vincentpennetti / GENE8940


Homework_1 #1

Open vjp98982 opened 3 years ago

vjp98982 commented 3 years ago

Goals for this assignment: 1) gain experience working on the cluster and issuing simple bash commands; 2) write a script that determines the number of coding sequence (CDS) features, for use on the cluster. Push the script to GitHub, pull it onto the teaching cluster, and run it there.

vjp98982 commented 3 years ago

1) Version of Linux on the teaching cluster:

Operating System: CentOS Linux 7 (Core)
Kernel: Linux 3.10.0-1127.13.1.el7.x86_64
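The thread does not say which command produced these two lines; `hostnamectl` prints fields in this format on systemd-based systems. As a minimal sketch, the same two facts can also be read with `uname` and `/etc/os-release`. The presence of `/etc/os-release` is an assumption here, so the sketch guards for it.

```shell
# Kernel release, i.e. the part after "Linux" in the Kernel line above
kernel_release="$(uname -r)"
echo "Kernel: Linux ${kernel_release}"

# Distribution name; /etc/os-release is assumed to exist, so check first
if [ -r /etc/os-release ]; then
    . /etc/os-release
    echo "Operating System: ${PRETTY_NAME:-unknown}"
fi
```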

vjp98982 commented 3 years ago

2) Which directories are in your $PATH variable when you login to the teaching cluster?

~$ echo "${PATH//:/$'\n'}"
/home/vjp98982/miniconda/bin
/home/vjp98982/miniconda/condabin
/usr/local/bin
/usr/bin
/usr/local/sbin
/usr/sbin
/opt/puppetlabs/bin
/opt/apps/slurm/prod/bin
/usr/tools/bin
/home/vjp98982/miniconda/bin
/home/vjp98982/miniconda/bin
/home/vjp98982/.local/bin
/home/vjp98982/bin
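The listing above contains /home/vjp98982/miniconda/bin more than once. A small sketch for spotting such repeats is shown here; the function name `list_duplicate_path_entries` is hypothetical, and it uses `tr` rather than the bash-only `${PATH//:/...}` expansion so that it also runs under plain `sh`.

```shell
# Print every directory that appears more than once in a colon-separated list.
# $1 is the list to inspect (for example "$PATH").
list_duplicate_path_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | sort | uniq -d
}

# Inspect the PATH of the current shell; prints nothing if there are no repeats
list_duplicate_path_entries "$PATH"
```

Duplicate entries are usually harmless, since the shell uses the first match it finds, but they often mean a startup file is sourced more than once.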

vjp98982 commented 3 years ago

3) Explain what the following BASH command is doing (explain what both executables do, what the option flags mean, and how data is being transferred at each step).

curl -s ftp://ftp.ensemblgenomes.org/pub/bacteria/release-37/gff3/bacteria_0_collection/escherichia_coli_str_k_12_substr_mg1655/Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.37.gff3.gz | gunzip -c > ecoli_MG1655.gff

What the executables are doing:

curl transfers data to or from a URL. Here it downloads the gzip-compressed GFF3 file from the FTP address and, because no output file is given with -o, writes the downloaded bytes to stdout.

The "-s" flag stands for silent mode. It stops curl from printing a progress meter or error messages while it downloads the .gz file.

The pipe (|) connects curl's stdout to gunzip's stdin, so the compressed data streams straight into gunzip without an intermediate .gz file being saved to disk.

gunzip decompresses gzip data. The "-c" flag makes it write the decompressed result to stdout instead of to a file on disk, and the > then redirects that stdout into the file ecoli_MG1655.gff.

In short, we download compressed data and decompress it on the fly into a file that we can work with.
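The data flow described above can be tried without network access. In this sketch `cat` stands in for `curl -s URL` (both write the compressed bytes to stdout), and the tiny GFF3 file and its file names are made up for illustration.

```shell
# Build a small gzip-compressed file to stand in for the remote .gff3.gz
workdir="$(mktemp -d)"
printf '##gff-version 3\nchr1\tdemo\tgene\t1\t100\t.\t+\t.\tID=gene1\n' > "$workdir/demo.gff3"
gzip -c "$workdir/demo.gff3" > "$workdir/demo.gff3.gz"

# cat writes the compressed bytes to stdout (the role curl plays above);
# gunzip -c reads them from stdin and writes decompressed data to stdout;
# the shell redirects that stdout into the output file.
cat "$workdir/demo.gff3.gz" | gunzip -c > "$workdir/demo_roundtrip.gff"

# The decompressed copy should match the original byte for byte
cmp "$workdir/demo.gff3" "$workdir/demo_roundtrip.gff" && echo "identical"
```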

vjp98982 commented 3 years ago

a) The number of CDS features: 4141

b) A URL to the location of the script on GitHub: https://github.com/vjp98982/GENE8940/blob/8fb6f79af131a773b31ca7844221c23ab44d4a50/homework_1.sh

c) The git revision of the script used for this analysis: 8fb6f79af131a773b31ca7844221c23ab44d4a50
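The linked script is not reproduced in this thread, so the following is only a sketch of one way to count CDS features, not necessarily how homework_1.sh does it. It relies on column 3 of a tab-separated GFF3 record holding the feature type; the function name `count_cds_features` and the demo file are hypothetical.

```shell
# Count records whose feature type (column 3) is exactly "CDS".
# $1 is the path to a GFF3 file; lines starting with '#' are skipped.
count_cds_features() {
    awk -F '\t' '!/^#/ && $3 == "CDS" { n++ } END { print n + 0 }' "$1"
}

# Tiny made-up GFF3 file for illustration (not the real E. coli annotation)
demo_gff="$(mktemp)"
printf '##gff-version 3\n' > "$demo_gff"
printf 'chr1\tdemo\tgene\t1\t90\t.\t+\t.\tID=gene1\n' >> "$demo_gff"
printf 'chr1\tdemo\tCDS\t1\t90\t.\t+\t0\tID=cds1\n' >> "$demo_gff"
printf 'chr1\tdemo\tCDS\t200\t290\t.\t-\t0\tID=cds2\n' >> "$demo_gff"

count_cds_features "$demo_gff"   # prints 2 for the demo file
```

On the cluster the same function could be pointed at ecoli_MG1655.gff. Matching the whole third column avoids counting lines that merely contain the string "CDS" in an attribute.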

JingxuanChen7 commented 3 years ago
  1. Correct.
  2. Correct.
  3. Correct.
  4. Correct. Only one tiny suggestion here: according to the script you posted, you set /home/vjp98982/homework_1 as your output directory. We usually keep all scripts in the GitHub repo in the home directory (you should have a clone on the teaching cluster at /home/vjp98982/GENE8940), and all data and outputs in /work/gene8940/vjp98982. Therefore, the preferred way to set OUTDIR is OUTDIR="/work/gene8940/vjp98982/homework_1".
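A minimal sketch of how a script might set up OUTDIR along these lines is shown below. The helper name `prepare_outdir` and its optional argument are hypothetical additions so the sketch can be tried off the cluster; only the default path comes from the suggestion above.

```shell
# Create the output directory if needed and print its path.
# $1 is the per-user work directory; the default is the one named in the review.
prepare_outdir() {
    work_root="${1:-/work/gene8940/vjp98982}"
    outdir="${work_root}/homework_1"
    mkdir -p "$outdir" && printf '%s\n' "$outdir"
}

# On the teaching cluster:  OUTDIR="$(prepare_outdir)"
# Off the cluster, point it at a scratch directory to try it out:
scratch="$(mktemp -d)"
OUTDIR="$(prepare_outdir "$scratch")"
echo "$OUTDIR"
```

Using `mkdir -p` means the script can be rerun without failing when the directory already exists.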

@vjp98982 Perfect job! Please feel free to let me know if you have any other questions.