wodanaz / Assembling_viruses


Adds script to stage data and run pipeline #15

Closed johnbradley closed 3 years ago

johnbradley commented 3 years ago

Adds the run-dds-escape-variants.sh sbatch script, which will:

  1. download a DDS project with *.fastq.gz files in the base directory
  2. run the escape-variants-pipeline on the *.fastq.gz files
  3. upload the results to a new DDS project
  4. delete input data
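
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of run-dds-escape-variants.sh: the `run` helper, the `DRY_RUN` switch, the `stage_and_run` function and the `./escape-variants-pipeline.sh` name and flags are all hypothetical, and the `ddsclient download`/`upload` invocations reflect my understanding of its command line and should be checked against `ddsclient --help`.

```shell
#!/usr/bin/env bash
# Sketch of the four staging steps; not the actual run-dds-escape-variants.sh.

# Print each command instead of running it unless DRY_RUN=0
# (DRY_RUN is a hypothetical switch added for this sketch).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stage_and_run() {
    local genome="$1" project="$2" input_dir="$3" output_dir="$4"

    # 1. Download the DDS project holding the *.fastq.gz files.
    run ddsclient download -p "$project" "$input_dir" || return 1
    # 2. Run the pipeline (script name and flags are placeholders).
    run ./escape-variants-pipeline.sh -g "$genome" -i "$input_dir" -o "$output_dir" || return 1
    # 3. Upload the results to a project named <input project>_results.
    run ddsclient upload -p "${project}_results" "$output_dir" || return 1
    # 4. Delete the downloaded input data, only after the upload succeeded.
    run rm -rf "$input_dir"
}
```

By default the sketch only prints the commands it would run, for example `stage_and_run MT246667.fasta sars-cov2-example /tmp/in /tmp/out`. Each step returns early on failure so the input data is never deleted before the results have been uploaded.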

The output DDS project name is created by appending "_results" to the input DDS project name. Users of this script must first set up their ddsclient credentials.

Help for run-dds-escape-variants.sh looks like so:

Runs a Slurm pipeline determining escape variants in fastq.gz files, staging data from/to DDS.

usage: ./run-dds-escape-variants.sh -g genome -d datadir -i inputproject
options:
-g genome        *.fasta genome to use - required
-d datadir       directory used to hold input and output files - required
-i inputproject  project name to download - required

NOTE: The input genome must first be indexed by running ./setup-variants-pipeline.sh.
NOTE: The genome and datadir must be shared across the slurm cluster.
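
For readers unfamiliar with how such a usage line is typically implemented, the three required options could be parsed with bash's `getopts` builtin along these lines. This is a sketch under assumptions, not the script's actual code: the `parse_args` and `usage` functions and the `GENOME`, `DATADIR` and `INPUTPROJECT` variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of parsing -g, -d and -i with getopts; not the script's actual code.

usage() {
    echo "usage: $0 -g genome -d datadir -i inputproject" >&2
}

parse_args() {
    # A local OPTIND lets the function be called more than once.
    local opt OPTIND=1
    GENOME="" DATADIR="" INPUTPROJECT=""
    while getopts "g:d:i:" opt; do
        case "$opt" in
            g) GENOME="$OPTARG" ;;
            d) DATADIR="$OPTARG" ;;
            i) INPUTPROJECT="$OPTARG" ;;
            *) usage; return 1 ;;
        esac
    done
    # All three options are required.
    if [ -z "$GENOME" ] || [ -z "$DATADIR" ] || [ -z "$INPUTPROJECT" ]; then
        usage
        return 1
    fi
}
```

Calling `parse_args "$@"` at the top of a script would fill the three variables or print the usage line and return non-zero when an option is missing.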

The script creates input and output subdirectories inside datadir. The input subdirectory holds downloaded DDS projects in project-specific subdirectories. The output subdirectory holds results and logs in project-specific subdirectories.
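
The layout described above could be created as in the sketch below. The `make_project_dirs` function is hypothetical, and the sketch assumes that both subdirectories are named after the input project, which the description does not state explicitly.

```shell
#!/usr/bin/env bash
# Sketch of the datadir layout; not the script's actual code.
#
# <datadir>/
#   input/<project>/    downloaded DDS project
#   output/<project>/   results and logs

make_project_dirs() {
    local datadir="$1" project="$2"
    # mkdir -p creates missing parents and succeeds if the directories already exist.
    mkdir -p "$datadir/input/$project" "$datadir/output/$project"
}
```

Because `mkdir -p` is idempotent, running the sketch again for the same project leaves existing files in place.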

See #12 for more discussion.


The script can be run with sbatch like so:

sbatch run-dds-escape-variants.sh -g <genome.fasta> -d <assembling_results_dir> -i <ddsproject>

To be notified via email you can add the sbatch email flags like so:

sbatch --mail-type=END --mail-user=<email> run-dds-escape-variants.sh ...

Fixes #12

johnbradley commented 3 years ago

@wodanaz I created an example project in DDS named sars-cov2-example and gave you permissions. It contains a small fastq.gz file downloaded from https://registry.opendata.aws/ncbi-covid-19/. From this branch you could test it like so using the directory you mentioned in #12:

sbatch run-dds-escape-variants.sh -g MT246667.fasta -d /data/wraycompute/alejo/sars2_genotype/assembling_results -i sars-cov2-example

wodanaz commented 3 years ago

Ok... one question.

For this to work properly, do I have to clone the github repository to the following directory?

data/wraycompute/alejo/sars2_genotype/assembling_results

wodanaz commented 3 years ago

OK, I think it's working now.

johnbradley commented 3 years ago

For this to work properly, do I have to clone the github repository to the following directory? data/wraycompute/alejo/sars2_genotype/assembling_results

You should not need to.

wodanaz commented 3 years ago

Yeah, I didn't have the genome, but it seems to be working now.

Fantastic work!!! Thank you.

I have learned a lot about automation of scripts.

wodanaz commented 3 years ago

I think it's good for now. I will start cranking through some data soon.

Will let you know how it goes or if I have any questions!