Closed johnbradley closed 3 years ago
One difference to note is I made the spike filtering regular expression more rigorous. The regex from https://github.com/wodanaz/Assembling_viruses/issues/38#issuecomment-807288367 looks like this:
...grep -E '22812|22813|22917|23012|23063|23403|23592|23593'...
The code in this PR instead requires each position to start at the beginning of the line and to end at a word boundary:
https://github.com/wodanaz/Assembling_viruses/blob/b033628da4eeab7d80198c77b5563180083bd2e7/scripts/intersect-spike.sh#L20
For example, 1228121 would have been included in the original grep because it contains 22812. The current values in spike.bed prevent such a large value, so the regex difference just makes the check more robust. This filtering would likely be simpler to express in Python, but I wanted to limit the new Python changes to the spreadsheet generation for this PR.
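To illustrate the difference, here is a minimal Python sketch comparing an unanchored match with an anchored one. The anchored pattern below is an assumption about the general shape of the stricter check (start of line, then a word boundary); it is not copied from intersect-spike.sh, and the tab-separated sample lines are hypothetical.

```python
import re

# Positions taken from the original grep expression.
POSITIONS = ["22812", "22813", "22917", "23012", "23063", "23403", "23592", "23593"]

# Unanchored: matches the digits anywhere in the line, like the original grep -E.
loose = re.compile("|".join(POSITIONS))

# Anchored (assumed shape): position must start the line and end at a word boundary.
strict = re.compile(r"^(?:" + "|".join(POSITIONS) + r")\b")

# Hypothetical input lines with the position in the first column.
lines = ["22812\tA\tG", "1228121\tC\tT", "23403\tA\tG"]

loose_hits = [line for line in lines if loose.search(line)]
strict_hits = [line for line in lines if strict.search(line)]

print(len(loose_hits), len(strict_hits))
```

The unanchored pattern keeps the 1228121 line because it contains 22812 as a substring, while the anchored pattern rejects it.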
Adds a step to generate a supermetadata table and filter reads based on spike.bed. Adds -m mode and -D date.tab command line arguments, removing -s. Produces a spreadsheet joining the genotype and supermetadata tables. Adds new conda requirements to create an xlsx spreadsheet file.
Details
Supermetadata
New script supermetadata-modify-titles.sh generates a supermetadata table based on https://github.com/wodanaz/Assembling_viruses/issues/38#issue-839017413.
Spike Filtering
New scripts: intersect-spike.sh, run-spike-genotype-compiler.sh, and run-spike-depth-compiler.sh perform spike intersection filtering and genotype/depth compiling based on https://github.com/wodanaz/Assembling_viruses/issues/38#issuecomment-807288367.
The run-bcftools-query-alt-ad.sh, run-genotype-compiler.sh, and run-depth-compiler.sh scripts have been removed since their logic is now handled by the new *spike*.sh scripts.
Spreadsheet
New script: create-spreadsheet.py creates a spreadsheet that joins the genotype and supermetadata tables.
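As a rough illustration of the join step only, the sketch below merges two tab-separated tables on a shared key using the standard library. The column names (sample, position, alt, date), the sample data, and the join_tables helper are all hypothetical and are not taken from create-spreadsheet.py; writing the xlsx file is left out because it depends on the conda packages this PR adds.

```python
import csv
import io


def join_tables(genotype_tsv, metadata_tsv, key="sample"):
    """Left-join genotype rows with metadata rows on a shared key column (hypothetical helper)."""
    metadata = {
        row[key]: row
        for row in csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    }
    joined = []
    for row in csv.DictReader(io.StringIO(genotype_tsv), delimiter="\t"):
        # Genotype rows without matching metadata are kept as-is.
        extra = metadata.get(row[key], {})
        joined.append({**extra, **row})
    return joined


# Hypothetical example data.
genotype_tsv = "sample\tposition\talt\nS1\t23403\tG\nS2\t23063\tT\n"
metadata_tsv = "sample\tdate\nS1\t2021-03-01\n"

rows = join_tables(genotype_tsv, metadata_tsv)
for row in rows:
    print(row)
```

A left join is used here so that a genotype row is never dropped when its metadata is missing; whether the actual script behaves this way is not stated in this PR.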
Note
This PR includes a commit that raises the memory requested for some steps that were killed for using too much memory while testing this code.
Fixes #38