eperezv closed this issue 7 months ago.
It looks to me like the error points to another problem:
Caused by:
Process requirement exceeds available memory -- req: 128 GB; avail: 125.6 GB
Could you try "--max_memory 120.GB"?
I saw it, yes. I haven't tried "--max_memory 120.GB" yet, but I thought it made no sense to require 128 GB for that task with only a few thousand ASVs. I monitored RAM usage when it crashed, and it was not even using 50 GB. After removing a few (ca. 100) ASVs shorter than 50 nt, it works perfectly.
I have tried the "--max_memory 120.GB" option. Same error, except that it retried several times.
Sorry for the late reply; I wasn't working for a week.
Some info might be helpful for troubleshooting: the exact command that started ampliseq (if you use any configs with parameters, please supply those), the pipeline version, and the Nextflow version.
As far as I understand, ITSx generates sub-sequences of ASVs, so filtering ASVs by length before cutting the ITS region might not help here. You could test whether --its_partial 50 (50 is just an example!) helps, but that might have unwanted side effects, such as allowing more sequences to pass.
One step further would be to supply ITSx (i.e. process ITSX_CUTASV) with the exact parameters that you want it to have. Append -c itsx.config to the command you use to run the pipeline, where itsx.config contains:
process {
    withName: ITSX_CUTASV {
        ext.args = '-t all --preserve T --date F --positions F --graphical F --minlen 50'
    }
}
(These are the default arguments when using --cut_its full, plus --minlen 50, which should omit any short sequences and therefore might solve your issue.)
Please let me know how these approaches work for you in order to further improve pipeline reliability.
About:
I also have a feature suggestion, which is allowing the user to choose the taxonomic group in ITSx. I modified the code manually because I am only interested in amplicons related to fungi.
Forgot to mention: modifying the pipeline code invalidates all my troubleshooting, and I cannot support personal code-altered copies of this pipeline. I hope that isn't the problem here though (if it is, I made a mistake sinking time into this...). To do it properly, use a config as above to change -t all to the value you are interested in. Using configs is fine; altering the code makes me unable to help.
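For the fungi-only case, a minimal sketch of such a config could look like the one below. It assumes that F is the ITSx letter code for the Fungi profile set (as listed in the ITSx user guide; please verify with ITSx --help for your version). The other arguments are the defaults quoted above for --cut_its full, so only the -t value changes.

```groovy
process {
    withName: ITSX_CUTASV {
        // Assumption: '-t F' restricts ITSx to its Fungi profiles instead of 'all'.
        // The remaining flags are the --cut_its full defaults quoted in this thread.
        ext.args = '-t F --preserve T --date F --positions F --graphical F'
    }
}
```

Pass it with -c as above; this keeps the pipeline code unmodified, which is what makes the setup supportable.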
The problem was already there before any code modification. I edited the code later while trying to understand the pipeline and the issue.
Hi, I have finally tried what you proposed:
process {
    withName: ITSX_CUTASV {
        ext.args = '-t all --preserve T --date F --positions F --graphical F --minlen 50'
    }
}
Same error: I still see reads shorter than 50 nt that are not removed before taxonomic assignment.
Hi there, it is a pity that this wasn't the solution. Again, some questions that you didn't answer yet:
Some info might be helpful for troubleshooting: the exact command that started ampliseq (if you use any configs with parameters, please supply those), the pipeline version, and the Nextflow version.
Other than that, I am out of ideas; maybe @jtangrot has more?
Hi! Unfortunately, the --minlen option in ITSx will not have any effect in this case; it only applies when concatenating regions. The only option I know of to filter the output is --partial (--its_partial in ampliseq). However, as Daniel mentioned, that will also allow partial matches to the region, which might not be desirable. I'm a bit surprised that ITSx outputs such short ITS2 regions, but if that is the case, I wonder why the taxonomy assignment fails with short sequences. I guess that could in theory also happen with other ASVs?
I wonder why the taxonomy assignment fails with short sequences. I guess that could in theory also happen with other ASVs?
Yes, as far as I remember, short ASVs have stalled the DADA2 classifier before, but it was possible to circumvent the issue by filtering ASVs by length. That solution does not seem applicable in this case.
So how can we solve this? Would a filtering step be needed between the ITSx output and taxonomic classification? If so, should it be done directly in the process, or should we add a separate process?
After a little research, I found a GitHub issue for DADA2 where the threshold of 50 bp is indeed mentioned: https://github.com/benjjneb/dada2/issues/601
. https://github.com/benjjneb/dada2/issues/326
describes a similar issue for the reference taxonomy, where the minimum is set to 20 bp.
I have added a length filter after ITSx that removes any sequences shorter than 50 bp. The threshold of 50 can be changed with a config file, if desired. This change is in the dev branch and will be in the next release.
Hi, I'm sorry, I was busy and couldn't be active in the discussion. I have tried the latest version and it completely solves the issue. Thanks a lot!
Description of the bug
Hello, I was processing my ITS data using ampliseq and I found a problem. The log can be found below.
I think the problem is directly related to the option
I have checked many things, and I think the problem is caused by ITSx, which produces a few very short ASVs (<50 nt) that crash R during taxonomic assignment. I don't know why ITSx produces such short ASVs (I think it shouldn't, or there should be a way of filtering them out). I have tried to remove short ASVs with --min_len_asv, but it seems to act at another step, not right before taxonomic assignment.
My workaround for now is to run ampliseq and wait for it to crash, then manually remove the short ASVs and re-run ampliseq.
I also have a feature suggestion, which is allowing the user to choose the taxonomic group in ITSx. I modified the code manually because I am only interested in amplicons related to fungi.
This is the log:
Command used and terminal output
No response
Relevant files
No response
System information
No response