Open vjp98982 opened 3 years ago
1) Version of linux on the teaching cluster:
Operating System: CentOS Linux 7 (Core)
Kernel: Linux 3.10.0-1127.13.1.el7.x86_64
2) Which directories are in your $PATH variable when you login to the teaching cluster?
~$ echo "${PATH//:/$'\n'}"
/home/vjp98982/miniconda/bin
/home/vjp98982/miniconda/condabin
/usr/local/bin
/usr/bin
/usr/local/sbin
/usr/sbin
/opt/puppetlabs/bin
/opt/apps/slurm/prod/bin
/usr/tools/bin
/home/vjp98982/miniconda/bin
/home/vjp98982/miniconda/bin
/home/vjp98982/.local/bin
/home/vjp98982/bin
3) Explain what the following BASH command is doing (explain what both executables do, what the option flags mean, and how data is being transferred at each step).
curl -s ftp://ftp.ensemblgenomes.org/pub/bacteria/release-37/gff3/bacteria_0_collection/escherichia_coli_str_k_12_substr_mg1655/Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.37.gff3.gz | gunzip -c > ecoli_MG1655.gff
What the executables are doing:
curl is an executable that transfers data from a URL. Here it downloads the compressed file stored at the ftp address and, because no output file is named, writes it to stdout.
The "-s" flag stands for silent mode. It stops curl from printing a progress meter or error messages as it pulls down the .gz file from the ftp address.
The pipe (|) between the two commands takes the stdout of curl (the compressed data that was pulled down) and sends it to the stdin of the gunzip program to be decompressed. No intermediate .gz file is written to disk.
gunzip -c writes the decompressed output to stdout, which the > operator then redirects to the file ecoli_MG1655.gff
We are essentially pulling down compressed data and uncompressing it on our machines into a file that we can work with.
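The data flow described above can be tried without network access. The following is only a minimal sketch: the small demo file and the temporary directory are hypothetical stand-ins, and cat plays the role of curl -s by writing compressed bytes to stdout.

```shell
#!/bin/bash
# Hypothetical stand-in for the remote file: a tiny gzip-compressed text file
tmpdir=$(mktemp -d)
printf 'line1\nline2\n' | gzip -c > "$tmpdir/demo.gff.gz"

# Streaming version: cat (standing in for curl -s) writes compressed bytes to stdout,
# the pipe feeds them to gunzip's stdin, -c sends the decompressed text to stdout,
# and > redirects that text into a file.
cat "$tmpdir/demo.gff.gz" | gunzip -c > "$tmpdir/streamed.gff"

# Two-step version: keep a compressed copy on disk, then decompress it
cp "$tmpdir/demo.gff.gz" "$tmpdir/copy.gff.gz"
gunzip -c "$tmpdir/copy.gff.gz" > "$tmpdir/two_step.gff"

# Both routes should produce identical files
cmp "$tmpdir/streamed.gff" "$tmpdir/two_step.gff" && echo "identical"
```

The streaming form avoids leaving a compressed copy on disk, which is why the one-line pipe is convenient for a download like this one.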
a) The number of CDS features: 4141
b) A URL to the location of the script on github: https://github.com/vjp98982/GENE8940/blob/8fb6f79af131a773b31ca7844221c23ab44d4a50/homework_1.sh
c) The git revision of the script used for this analysis. 8fb6f79af131a773b31ca7844221c23ab44d4a50
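For readers who want to see how a count like the one in a) could be obtained, here is one possible approach. It is a sketch only and not necessarily what homework_1.sh does: the sample file and its feature IDs are made up for illustration, and the count relies on the third tab-separated column of a GFF3 file holding the feature type.

```shell
#!/bin/bash
# Hypothetical GFF3 sample with one gene and two CDS rows (not the real annotation)
sample=$(mktemp)
printf '##gff-version 3\n' > "$sample"
printf 'Chromosome\tdemo\tgene\t190\t255\t.\t+\t.\tID=gene:demo1\n' >> "$sample"
printf 'Chromosome\tdemo\tCDS\t190\t255\t.\t+\t0\tID=CDS:demo1\n' >> "$sample"
printf 'Chromosome\tdemo\tCDS\t337\t2799\t.\t+\t0\tID=CDS:demo2\n' >> "$sample"

# Count rows whose third column (feature type) is exactly "CDS";
# n+0 prints 0 instead of an empty string when nothing matches
cds_count=$(awk -F'\t' '$3 == "CDS" {n++} END {print n+0}' "$sample")
echo "CDS features: $cds_count"
```

Matching the third column exactly, rather than grepping for "CDS" anywhere on the line, avoids counting rows that merely mention CDS in their attributes column.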
According to the script you posted, /home/vjp98982/homework_1 is your output directory. We usually save all scripts in the GitHub repo in the home directory (you should have a clone on the teaching cluster at /home/vjp98982/GENE8940), while all data and outputs go in /work/gene8940/vjp98982. Therefore, the preferred way to set OUTDIR is OUTDIR="/work/gene8940/vjp98982/homework_1".
@vjp98982 Perfect job! Please feel free to let me know if you have any other questions.
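The directory convention described in this comment might look like the following at the top of a script. This is a sketch under assumptions: WORK_BASE is a hypothetical variable introduced here so the snippet can also be tried off the cluster, and the mkdir check is an illustrative addition.

```shell
#!/bin/bash
# Outputs go under /work; WORK_BASE is a hypothetical override for off-cluster testing
WORK_BASE="${WORK_BASE:-/work/gene8940/vjp98982}"
OUTDIR="${WORK_BASE}/homework_1"

# Create the output directory if possible; report instead of failing when it is not
if mkdir -p "$OUTDIR" 2>/dev/null; then
    echo "Writing outputs to $OUTDIR"
else
    echo "Cannot create $OUTDIR (not on the cluster?)" >&2
fi
```

Keeping the scripts in the repo clone and the outputs under /work means the GitHub repo only tracks code, not data.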
Goals for this assignment: 1) gain experience working on the cluster and issuing simple bash commands; 2) make a script to determine the number of coding sequence features (for use on the cluster). Push and pull the script from GitHub and run it on the teaching cluster.