Closed hoelzer closed 3 years ago
Large input files (500 MB to 1 GB) are working with virsorter and pprmeta. I will test the other tools.
However, the r_plot process takes a long time and for some files does not seem to terminate at all. Besides, the visualization is not really useful for large input sets, so I will deactivate it in a separate branch for my test runs.
Update: Virfinder finished after 18h for one of the large input FASTA files (~500 MB).
Now testing Marvel
so you dropped all 29 metagenomes with > 1 million contigs each on it? :D okay, interesting... I'll try out a few things and report back
yeah... I thought the EBI cluster is huge so just go for it WtP! :D
At the moment I am just running one sample with the -resume option adding more and more tools (currently Marvel is running).
marvel is super difficult to implement here, as it's analysing "bins" by default. so i need to split each contig into a separate fasta file, and you have 1-2 million contigs per file
uff I see. Maybe skip Marvel if too many contigs are provided? I mean, it's just due to how Marvel is implemented and not really an issue of WtP
yep, i was thinking about an "autoconfig" depending on the "assemblystats" of the input, e.g. too many contigs -> deactivate tool x and y, contigs too large -> deactivate deepvirfinder etc.
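For illustration, the per-contig split described above could be sketched roughly like this. It is a minimal sketch and not how WtP does it: the function name `split_fasta_per_contig` and the `contig_<n>.fa` file naming are made up for this example.

```python
from pathlib import Path
from typing import Iterable, List


def split_fasta_per_contig(lines: Iterable[str], out_dir: Path) -> List[Path]:
    """Write each FASTA record to its own file (hypothetical helper).

    Files are named by running index (contig_1.fa, contig_2.fa, ...) so that
    unusual characters in contig headers cannot produce invalid file names.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    handle = None
    try:
        for line in lines:
            if line.startswith(">"):
                # A new header starts a new record: close the previous file.
                if handle is not None:
                    handle.close()
                path = out_dir / f"contig_{len(written) + 1}.fa"
                handle = path.open("w")
                written.append(path)
            # Lines before the first header are ignored.
            if handle is not None:
                handle.write(line.rstrip("\n") + "\n")
    finally:
        if handle is not None:
            handle.close()
    return written
```

An open file handle can be passed as `lines`, so the input is streamed rather than loaded into memory. With 1-2 million contigs this still creates 1-2 million small files in one directory, which is the part that makes the approach painful at this scale.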
I think that is a good idea and report back to the user what was deactivated and why.
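A rough sketch of what such an "autoconfig" with user feedback might look like. Everything here is an assumption for illustration: the `AssemblyStats` fields, the `autoconfig` function, and especially the threshold numbers, which are placeholders and not measured limits of any tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AssemblyStats:
    n_contigs: int        # number of contigs in the input FASTA
    longest_contig: int   # length of the longest contig in bp


# Placeholder thresholds (assumptions, not benchmarked values).
MAX_CONTIGS: Dict[str, int] = {"marvel": 100_000}
MAX_CONTIG_LENGTH: Dict[str, int] = {"deepvirfinder": 2_000_000}


def autoconfig(stats: AssemblyStats, tools: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Return the tools to run and, for each skipped tool, the reason."""
    enabled: List[str] = []
    skipped: Dict[str, str] = {}
    for tool in tools:
        contig_limit = MAX_CONTIGS.get(tool)
        length_limit = MAX_CONTIG_LENGTH.get(tool)
        if contig_limit is not None and stats.n_contigs > contig_limit:
            skipped[tool] = f"too many contigs ({stats.n_contigs} > {contig_limit})"
        elif length_limit is not None and stats.longest_contig > length_limit:
            skipped[tool] = f"contig too large ({stats.longest_contig} bp > {length_limit} bp)"
        else:
            enabled.append(tool)
    return enabled, skipped


if __name__ == "__main__":
    stats = AssemblyStats(n_contigs=1_500_000, longest_contig=50_000)
    enabled, skipped = autoconfig(stats, ["virsorter", "marvel", "deepvirfinder"])
    for tool, reason in skipped.items():
        print(f"deactivated {tool}: {reason}")
```

Returning the reasons alongside the enabled tools keeps the "what was deactivated and why" report in one place, so it can be printed at startup or written to the run summary.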
This issue information is for #47
This issue documents the behavior of WtP for large input files. Based on this, @replikation might implement FASTA chunking to speed up the pipeline.
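To make the chunking idea concrete, here is a minimal sketch of grouping FASTA records into fixed-size chunks that could then be processed in parallel. The helper names `read_records` and `chunk_records` and the chunk size are assumptions for this example, not part of WtP.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header line including ">", sequence)


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA lines, joining wrapped sequence lines."""
    header = None
    seq: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most chunk_size records each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk
```

Both functions are generators, so only one chunk is held in memory at a time. Each chunk could be written to its own file and handed to a tool as an independent job. Whether that actually helps depends on the tool, since some tools may have a fixed startup cost per run.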
case 1, aquadiva sample
(excluded metaphinder because of a previous issue)
started: Dec 31 12:50
Tools completed
Job was aborted by the cluster after 2.5 days for an unclear reason. No stats for deepvirfinder and marvel.