wodanaz / Assembling_viruses

Provide instructions for running individual sbatch scripts #25

Closed: johnbradley closed this issue 3 years ago

johnbradley commented 3 years ago

Some of the sbatch scripts require environment variables to be set before they can be run directly. Document these requirements or make the variables optional.

johnbradley commented 3 years ago

Five things are required to rerun any single step from escape-variants-pipeline.sh.

Working Directory

All the scripts called from escape-variants-pipeline.sh are meant to be run from within an intermediate/temporary directory, consistent with the behavior described in Escape_Variants.md. So you must first cd into a directory that will hold your intermediate files.

When you run run-escape-variants.sh with the -d (debug) flag, the intermediate/temporary directory is kept even if the pipeline completes successfully. This directory has a random name such as tmp.XVSRFWFTFZ. For that directory you would cd like so:

cd tmp.XVSRFWFTFZ
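
If you are not reusing a directory saved by a debug run, you can create and enter a fresh one yourself. The sketch below uses `mktemp -d` with a tmp.XXXXXXXXXX template only to mirror the example name above; whether run-escape-variants.sh itself creates its directory this way is an assumption, not something this thread confirms.

```shell
#!/bin/sh
# Sketch: create a fresh intermediate directory for rerunning steps by hand.
# Assumption: naming it tmp.XXXXXXXXXX just mirrors the pipeline's example name.
set -eu

# Create a uniquely named directory under the current directory.
WORKDIR=$(mktemp -d "$PWD/tmp.XXXXXXXXXX")
echo "Intermediate files will go in: $WORKDIR"

# The pipeline steps expect to be run from inside this directory.
cd "$WORKDIR"
```

Creating the directory under the current directory keeps the intermediate files next to where you started, which makes them easy to find and delete afterwards.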

Environment Variables

The paths to the genome file, the fastq.gz directory, the scripts directory, and the logs directory must all be set in environment variables (EVSCRIPTS is built from EVBASEDIR, the repository checkout). Export them with something similar to:

export GENOME=/path/to/MT246667.fasta
export INPUTDIR=/path/to/inputfastqdir
export EVBASEDIR=/path/to/Assembling_viruses
export EVSCRIPTS=$EVBASEDIR/scripts/
export LOGDIR=/path/to/save/logs
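
Since the steps assume these variables are already set, it can help to check them before running anything. A minimal POSIX sh sketch follows; the function name check_ev_env is hypothetical and not part of the repository, while the variable names are the ones exported above.

```shell
#!/bin/sh
# Sketch: report every required variable that is unset or empty.
# check_ev_env is a hypothetical helper, not part of Assembling_viruses.

check_ev_env() {
    missing=""
    for var in GENOME INPUTDIR EVBASEDIR EVSCRIPTS LOGDIR; do
        # Indirect lookup that works in POSIX sh (bash's ${!var} is not portable).
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing environment variables:$missing" >&2
        return 1
    fi
    return 0
}
```

Usage: run `check_ev_env || exit 1` after the exports and before copying any pipeline commands. Collecting all missing names first means one run reports everything that still needs to be exported.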

Running Commands

You should then be able to run the lines from escape-variants-pipeline.sh directly. For example, see https://github.com/wodanaz/Assembling_viruses/blob/04d64e7a5ebdfa4408a716533fb6e576da2b2fd8/scripts/escape-variants-pipeline.sh#L25-L30 To run these steps, copy and paste the commands into your terminal like so:

ls ${INPUTDIR}/*.fastq.gz > reads.list
$EVSCRIPTS/sbatch-array.sh $EVSCRIPTS/remove-nextera-adapters.sh reads.list
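
If INPUTDIR contains no .fastq.gz files, the ls line above prints an error and leaves reads.list empty, so the next step has nothing to process. A hedged sketch of a stricter replacement for that first line; make_reads_list is a hypothetical name, not a repository script.

```shell
#!/bin/sh
# Sketch: build reads.list like the pipeline's first step, but stop with a
# clear message when INPUTDIR has no .fastq.gz files.
# make_reads_list is a hypothetical helper, not part of Assembling_viruses.

make_reads_list() {
    : > reads.list                      # start from an empty list
    for fq in "$INPUTDIR"/*.fastq.gz; do
        # If nothing matched, the unexpanded pattern is left in $fq; skip it.
        [ -e "$fq" ] || continue
        printf '%s\n' "$fq" >> reads.list
    done
    if [ ! -s reads.list ]; then
        echo "No .fastq.gz files found in $INPUTDIR" >&2
        return 1
    fi
}
```

Usage: `make_reads_list && $EVSCRIPTS/sbatch-array.sh $EVSCRIPTS/remove-nextera-adapters.sh reads.list`. Quoting "$INPUTDIR" also keeps paths containing spaces intact, which the unquoted ls form does not.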

johnbradley commented 3 years ago

@wodanaz The logic in the scripts requires running them from within an intermediate/temporary directory. This is why an environment variable like $EVSCRIPTS is needed: it lets the scripts determine the path to the rest of the scripts/perl files. We could simplify running individual steps by having the scripts assume they are run from the root directory of this repo. The scripts would then receive the intermediate/temporary directory as an argument or environment variable. Thoughts?
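
A rough sketch of how the proposal could look at the top of a step script. Both helper names and the EV_WORKDIR variable are hypothetical; the only repository detail relied on is that scripts/escape-variants-pipeline.sh exists relative to the repo root.

```shell
#!/bin/sh
# Sketch of the proposal: scripts run from the repo root and take the
# intermediate directory as an argument or a (hypothetical) EV_WORKDIR variable.

require_repo_root() {
    # Assumption: being at the repo root means ./scripts/ holds the pipeline.
    if [ ! -f scripts/escape-variants-pipeline.sh ]; then
        echo "Run this from the root of the Assembling_viruses repository" >&2
        return 1
    fi
}

resolve_workdir() {
    # Prefer the first argument, then fall back to EV_WORKDIR.
    dir=${1:-${EV_WORKDIR:-}}
    if [ -z "$dir" ]; then
        echo "usage: $0 WORKDIR (or set EV_WORKDIR)" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "Not a directory: $dir" >&2
        return 1
    fi
    printf '%s\n' "$dir"
}
```

A step script would then call `require_repo_root || exit 1` and `WORKDIR=$(resolve_workdir "$@") || exit 1`, refer to other scripts as scripts/..., and write its outputs under "$WORKDIR". With this layout EVSCRIPTS no longer needs to be exported.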