harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

Collect fastp stats fails for samples with multiple runs per BioSample #193

Closed: tsackton closed this issue 1 month ago

tsackton commented 1 month ago

We currently aggregate multi-run fastp output by simply concatenating the per-run files with cat. Since the switch to JSON output for fastp, this produces invalid JSON whenever a BioSample has multiple runs, because several JSON documents written back to back do not form a single valid JSON document.

We therefore need another way to aggregate the fastp output. This probably means rewriting the collectFastpStats function, but I'm open to other ideas.
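To illustrate the problem and one possible direction, here is a minimal sketch. It is not the snpArcher implementation: the helper names (`is_valid_json`, `sum_fastp_counts`) are hypothetical, and the report layout (`summary` -> `before_filtering` / `after_filtering` -> `total_reads`, `total_bases`) is an assumption about the fastp JSON report, trimmed to two fields. The idea is to parse each run's report separately and sum the counts, rather than concatenating the files.

```python
import json

# Two hypothetical per-run fastp reports, trimmed to a few assumed fields.
run1 = json.dumps({"summary": {
    "before_filtering": {"total_reads": 100, "total_bases": 15000},
    "after_filtering": {"total_reads": 90, "total_bases": 13000},
}})
run2 = json.dumps({"summary": {
    "before_filtering": {"total_reads": 150, "total_bases": 22500},
    "after_filtering": {"total_reads": 140, "total_bases": 20000},
}})


def is_valid_json(text):
    """Return True if `text` parses as a single JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def sum_fastp_counts(reports, fields=("total_reads", "total_bases")):
    """Sum the selected count fields across already-parsed per-run reports."""
    stages = ("before_filtering", "after_filtering")
    totals = {stage: dict.fromkeys(fields, 0) for stage in stages}
    for report in reports:
        for stage in stages:
            for field in fields:
                totals[stage][field] += report["summary"][stage][field]
    return totals


# What `cat run1.json run2.json` effectively produces: two documents in a row.
concatenated = run1 + "\n" + run2
cat_is_valid = is_valid_json(concatenated)

# Parsing each run on its own and summing gives one per-sample record.
totals = sum_fastp_counts([json.loads(run1), json.loads(run2)])
```

Only additive counts can be summed this way. Rates and means (for example a Q30 rate) would need to be recomputed from the summed counts rather than added or averaged directly.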

Thanks to @Erythroxylum (Dawson White) for uncovering this.

cademirch commented 1 month ago

Ah. That's an oversight on my part. I can take care of it.

Erythroxylum commented 1 month ago

Hi @cademirch and @tsackton, another error has occurred, this time in printBamSumStats:

RuleException:
TypeError in file /n/holyscratch01/davis_lab/dwhite/cocawgs/snpArcher/workflow/rules/common.smk, line 512:
string indices must be integers, not 'str'
  File "/n/holyscratch01/davis_lab/dwhite/cocawgs/snpArcher/workflow/rules/sumstats.smk", line 60, in __rule_collect_sumstats
  File "/n/holyscratch01/davis_lab/dwhite/cocawgs/snpArcher/workflow/rules/common.smk", line 512, in printBamSumStats

Log file attached. 32761527.log
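For readers unfamiliar with this message: Python raises it when a string is indexed with a string key, which typically happens when code expects a dict but receives a str. The sketch below is a generic illustration only. The function and key names are hypothetical, and it makes no claim about what line 512 of common.smk actually does.

```python
def lookup_mean_depth(stats):
    """Look up a hypothetical 'mean_depth' key, reporting a TypeError as text."""
    try:
        return stats["mean_depth"]
    except TypeError as err:
        # Reached when `stats` is a str rather than a dict.
        return f"TypeError: {err}"


ok = lookup_mean_depth({"mean_depth": 12.5})  # a dict works as expected
bad = lookup_mean_depth("mean_depth=12.5")    # a str triggers the TypeError
```

The exact wording of the message varies slightly between Python versions, but it always begins with "string indices must be integers".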

tsackton commented 1 month ago

@Erythroxylum can you pull the code I just merged and try again? I think this will fix both errors.