wodanaz / Assembling_viruses


Remove duplicated hard coded values #7

Closed: johnbradley closed this issue 3 years ago

johnbradley commented 3 years ago

The pipeline has some duplicated hard-coded values.

For example, MT246667 appears in many places, such as:

https://github.com/wodanaz/Assembling_viruses/blob/ebbb2bf0526d567a1bce766b6b32d8f236fd9f06/scripts/index-reference-genome.sh#L12

https://github.com/wodanaz/Assembling_viruses/blob/ebbb2bf0526d567a1bce766b6b32d8f236fd9f06/scripts/apply-bqsr.sh#L17

wodanaz commented 3 years ago

@johnbradley, yeah, this line refers to the genome reference, which could be defined once at the beginning of the script (in a variable or as a path). I was thinking of something like GENOME=MT246667.fasta, and then it would be referenced as $GENOME, right?
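A minimal sketch of that idea, assuming a bash pipeline: the reference file name is defined once, with an optional environment override, and a derived name (the accession without its extension) is computed from it rather than hard coded again. `GENOME_BASE` and the override behavior are illustrative assumptions, not taken from the repository.

```shell
#!/usr/bin/env bash
# Minimal sketch: define the reference genome once instead of hard coding it in every script.
# GENOME and GENOME_BASE are illustrative names, not taken from the repository.
set -euo pipefail

# Default reference; can be overridden from the environment,
# e.g. GENOME=other.fasta ./some-step.sh
GENOME="${GENOME:-MT246667.fasta}"

# Accession without the .fasta extension, for any file names that need it
GENOME_BASE="${GENOME%.fasta}"

echo "Reference FASTA: ${GENOME}"
echo "Reference base name: ${GENOME_BASE}"
```

Each script would then use `"$GENOME"` wherever the literal file name appeared before, so changing the reference means editing one line.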

johnbradley commented 3 years ago

That sounds like a good fix. Would you want to pass this in as an argument? Something like this:

./run-escape-variants.sh --genome=MT246667.fasta
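A minimal sketch of how a bash script could accept that option, assuming only the `--genome=FILE` form shown above. The `parse_args` function and its error handling are hypothetical and not necessarily how the fix in #11 was implemented.

```shell
#!/usr/bin/env bash
# Minimal sketch of parsing a --genome=FILE option in bash.
# The option name follows the example in this thread; parse_args is hypothetical.
set -euo pipefail

parse_args() {
  GENOME=""
  local arg
  for arg in "$@"; do
    case "$arg" in
      --genome=*)
        GENOME="${arg#--genome=}"   # strip the option prefix, keep the file name
        ;;
      *)
        echo "Unknown argument: $arg" >&2
        return 1
        ;;
    esac
  done
  if [ -z "$GENOME" ]; then
    echo "Usage: $0 --genome=REFERENCE.fasta" >&2
    return 1
  fi
}

# Demo call; a real script would use: parse_args "$@"
parse_args --genome=MT246667.fasta
echo "Using reference genome: $GENOME"
```

Requiring the option (rather than silently defaulting) makes it explicit which reference each run used, which matters when different experiments use different genomes.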

wodanaz commented 3 years ago

Yes, indeed! That would be a great argument, because depending on the experiment we may use a different genome reference.

wodanaz commented 3 years ago

One argument that would be nice to add at some point is a coverage threshold parameter. The file table.sort.tab contains a column reporting the percentage of the genome covered by at least one read. It would be nice to generate a consensus sequence only for mapped libraries with >90% coverage. But we can worry about this later, since Nico is testing a new kit that may increase genome coverage.
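A minimal sketch of such a threshold, assuming table.sort.tab is tab-separated with no header, the library name in column 1, and the coverage percentage in column 2. That column layout, `MIN_COVERAGE`, `COV_COL`, and `passing_samples` are all hypothetical; the real file should be checked before relying on this.

```shell
#!/usr/bin/env bash
# Minimal sketch of a coverage threshold filter.
# ASSUMPTIONS (hypothetical, not taken from the repository): the table is
# tab-separated with no header, the library name is in column 1, and the
# percentage of the genome covered by at least one read is in column COV_COL.
set -euo pipefail

MIN_COVERAGE="${MIN_COVERAGE:-90}"   # hypothetical threshold parameter (percent)
COV_COL="${COV_COL:-2}"              # hypothetical column holding the coverage percentage

# Print the library names whose coverage is strictly greater than MIN_COVERAGE.
passing_samples() {
  awk -F'\t' -v min="$MIN_COVERAGE" -v col="$COV_COL" \
    '$col + 0 > min + 0 { print $1 }' "$1"
}

# Demo input standing in for table.sort.tab
demo_table="$(mktemp)"
printf 'sampleA\t98.7\nsampleB\t85.2\nsampleC\t91.0\n' > "$demo_table"

passing_samples "$demo_table"   # prints sampleA and sampleC
rm -f "$demo_table"
```

The names printed could then select which libraries get a consensus sequence. Using a strict `>` matches the ">90%" rule, so a library at exactly 90% is excluded.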

johnbradley commented 3 years ago

This issue has been fixed by #11