rhysnewell / aviary

A hybrid assembly and MAG recovery pipeline (and more!)
GNU General Public License v3.0

threads #174

Closed (wwood closed this issue 7 months ago)

wwood commented 7 months ago

Hey,

Just lost ~1 day's time not realising that -n doesn't imply changes to -p. Lesson learned, but I'm wondering how we could document this better for first-time users. It was a particular issue for aviary assemble, since I think the short_read_assembly rule is going to take the vast majority of the time.

Maybe add threads / mem into the example at the bottom of the help, and/or add more explicit links between -p and -n in the argparse help?

wwood commented 7 months ago

Actually even my comment above was wrong. -p is not a parameter - though strangely using it doesn't result in a command line parsing error.

If -t is greater than -n, then should -n be automatically bumped up to equal -t? That way (assuming -t >=16) the user can just specify -t only as a simple control on the threads.

rhysnewell commented 7 months ago

-p is pplacer threads. I thought it used to automatically do that, but I guess not anymore. No reason it shouldn't be implemented during arg parsing.
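A minimal sketch of what that could look like during argument parsing, assuming -t is the per-process thread limit and -n is the core count given to the whole workflow. The parser, the long option names, the default values and the helper `reconcile_threads` are all hypothetical stand-ins for illustration, not aviary's actual code:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser: only the two options discussed in this
    # thread are defined, and the defaults are assumptions.
    parser = argparse.ArgumentParser(prog="aviary-sketch")
    parser.add_argument("-t", "--max-threads", dest="max_threads", type=int,
                        default=8,
                        help="Maximum threads given to any single process")
    parser.add_argument("-n", "--n-cores", dest="n_cores", type=int,
                        default=16,
                        help="Maximum cores available to the whole workflow")
    return parser


def reconcile_threads(args: argparse.Namespace) -> argparse.Namespace:
    # If one process is allowed more threads than the workflow has cores,
    # raise the core count so that -t alone controls threading.
    if args.max_threads > args.n_cores:
        args.n_cores = args.max_threads
    return args


def parse_args(argv=None) -> argparse.Namespace:
    # Parse first, then apply the bump before anything else reads the values.
    return reconcile_threads(build_parser().parse_args(argv))


if __name__ == "__main__":
    args = parse_args(["-t", "32"])
    print(args.max_threads, args.n_cores)  # prints "32 32"
```

With this kind of check, passing only -t 32 would raise -n to 32, while an explicit -n larger than -t is left untouched.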

wwood commented 7 months ago

I thought it used to automatically do that,

You mean

If -t is greater than -n, then should -n be automatically bumped up to equal -t? That way (assuming -t >=16) the user can just specify -t only as a simple control on the threads.

?

aljazdzy commented 3 weeks ago

What exactly was the thread issue here? I've had aviary running for about 65 hours now and I see it's only using a single thread, even though I specified that it should use all available threads. Is there a way to alleviate this?

rhysnewell commented 3 weeks ago

This was to do with a confusing component of the CLI and adding in parallel processing of singlem and some rosella components. It sounds like you've hit a bottleneck though, can you provide a few more details?

Which step is aviary currently stuck on? What resources did you provide? Were you running assembly and binning?

aljazdzy commented 2 weeks ago

The only output it gave was QC for the assembly: NanoPlot, FastQC and polishing. (It did eventually trigger an error and fail; I believe it likely hit the RAM cap.) The assembly log indicates to me that it finished assembly but was never able to assess things like community abundance, as there is no file from singlem or anything beyond assembly/polishing. I was trying to assemble and bin. I gave it 16 CPUs and 240 GB of RAM, but aside from early on it only ever used a single core. I'm trying to find the exact command I gave, but it seems to have disappeared; I was trying to use the "complete" command. I might just try recover on its own (I already have assembled files, I just wanted to see if it could all go through one single workflow).
I'm going through the command list. Do you need to specify threads for each specific step? If so, that may have been where I erred.

aljazdzy commented 2 weeks ago

And I realize this is a separate issue, but it's a question I have: if I have a metagenome assembly as input for recover, does it have to be a scaffolded assembly? If I can avoid the issues with assembly it would save me time, but the only assembly input option appears to require a scaffolded assembly.