huishenlab / biscuit

BISulfite-seq CUI Toolkit

[Issue] Biscuit QC doesn't run / parallel error #32

Closed (semenko closed this 1 year ago)

semenko commented 1 year ago

I don't think the Biscuit QC script runs as is -- I tried patching this in #31 -- would you consider merging?

(Would love an updated bioconda release with your recent awesome changes as well 😄 )

jamorrison commented 1 year ago

Hi @semenko,

Thanks for your issue and bug fix. Just to confirm: did the QC script work for you with -j6, just not as advertised with 6 cores? If so, then I'll go ahead and merge your PR.

As for the updated release, you can expect a release sometime this month. I have a major change that I'm working through for the epiread subcommand and once that's ready (it's almost there), I'll go ahead and craft a release and push that to bioconda.

semenko commented 1 year ago

Thanks! No -- the QC script did not work for me with -j6 at all (using parallel version 20221122).

It looks like GNU parallel expects as many job arguments as the -j value (6 here) but only receives two, so it stalls or errors.

e.g. this works:

$ echo "whatever" | parallel -j2 -k --pipe --tee {} ::: \
    "echo 1" \
    "echo 2"
1
2

but this fails:

$ echo "whatever" | parallel -j3 -k --pipe --tee {} ::: \
    "echo 1" \
    "echo 2"
parallel: Error: --pipe/--pipepart must have a command to pipe into (e.g. 'cat').

semenko commented 1 year ago

(I wonder if this is the same issue PR #29 alludes to. Perhaps it makes sense to remove the GNU parallel dependency entirely, and just spawn those jobs to the background, plus a simple wait in QC.sh)
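
For illustration, here is a minimal sketch of what background jobs plus wait could look like without GNU parallel. It is not the code in QC.sh: the function name fan_out, the two placeholder jobs (a line count and an uppercase conversion), and the output file names are all hypothetical. It assumes bash and that each job only needs to read the same input stream, which is what --pipe --tee provides; named pipes stand in for that fan-out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# fan_out INPUT_FILE OUT_DIR
# Streams INPUT_FILE once and feeds the same bytes to two background jobs.
# Both jobs are hypothetical placeholders, not the real QC.sh commands.
fan_out() {
    local input=$1 outdir=$2
    local fifo1="$outdir/job1.fifo" fifo2="$outdir/job2.fifo"
    mkfifo "$fifo1" "$fifo2"

    # Start each consumer in the background; each blocks until data arrives.
    wc -l < "$fifo1" > "$outdir/lines.out" &
    local pid1=$!
    tr 'a-z' 'A-Z' < "$fifo2" > "$outdir/upper.out" &
    local pid2=$!

    # tee copies the stream into the first FIFO and passes it on to the second.
    tee "$fifo1" < "$input" > "$fifo2"

    # Waiting on each PID surfaces that job's exit status to the caller.
    wait "$pid1"
    wait "$pid2"
    rm -f "$fifo1" "$fifo2"
}

# Demo with throwaway data.
demo_dir=$(mktemp -d)
printf 'chr1\nchr2\nchr3\n' > "$demo_dir/input.txt"
fan_out "$demo_dir/input.txt" "$demo_dir"
cat "$demo_dir/lines.out" "$demo_dir/upper.out"
```

Waiting on each PID separately, rather than a bare wait, means a failing job stops the script under set -e instead of being silently ignored. The number of concurrent jobs here is simply the number of commands started, so there is no -j value to keep in sync.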

jamorrison commented 1 year ago

I confirmed this issue and merged your PR. I'd guess that was the issue alluded to in #29 also.

I've found this portion of QC.sh (depth stats) to be horribly slow and it easily takes the largest fraction of time when running the whole script. I've gone back and forth on whether to farm this out to another tool (mosdepth or something similar) or rework the current code to improve runtime. Ideally, either option would avoid using GNU parallel.