read_classification_wf:kraken2 fails on Mac

replikation / poreCov

SARS-CoV-2 workflow for nanopore sequence data

https://case-group.github.io/

GNU General Public License v3.0


read_classification_wf:kraken2 fails on Mac #200

Closed CSynodinos closed 2 years ago

CSynodinos commented 2 years ago

I have been running the pipeline on my Mac and everything works fine, including the download_database_kraken2 step, but it consistently fails on read_classification_wf:kraken2. The same issue has been reproduced on other Macs as well.

hoelzer commented 2 years ago

Hey @CSynodinos , thanks for your interest in the pipeline!

What's the error that you get on Mac? It is either printed to the terminal, or you can check the content of the .nextflow.log file in the corresponding work directory where the kraken2 process is started (see the terminal output on the left in the same row as the kraken2 process; something like [91/73rf32r3], and then check cat work/91/73rf32r3*/.nextflow.log).
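The lookup described above could be scripted roughly as follows. This is a minimal sketch, not part of poreCov: the helper names `find_task_dirs` and `read_task_files` are hypothetical, and the list of per-task files (`.command.sh`, `.command.log`, `.command.err`, `.exitcode`) is an assumption about what Nextflow usually leaves in a task directory. As far as I know, the main `.nextflow.log` normally sits in the directory the run was launched from.

```python
from pathlib import Path

# Files Nextflow usually leaves in a task work directory (assumed set).
TASK_FILES = (".command.sh", ".command.log", ".command.err", ".exitcode")


def find_task_dirs(work_root, task_hash):
    """Return work directories matching a short hash such as '[91/73rf32r3]'."""
    # The terminal shows '[prefix/rest]'; the directory is work/prefix/rest*.
    prefix, _, rest = task_hash.strip("[]").partition("/")
    candidates = Path(work_root, prefix).glob(rest + "*")
    return sorted(p for p in candidates if p.is_dir())


def read_task_files(task_dir):
    """Map each known task file name to its text, skipping missing files."""
    contents = {}
    for name in TASK_FILES:
        path = Path(task_dir) / name
        if path.is_file():
            contents[name] = path.read_text(errors="replace")
    return contents
```

For example, `read_task_files(find_task_dirs("work", "[91/73rf32r3]")[0])` would collect whatever the kraken2 task wrote before it stopped, assuming exactly one directory matches the hash.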

replikation commented 2 years ago

I assume that you don't have enough RAM to load the database. Especially on a normal desktop PC, Kraken will fail.
But we set the error to "ignore", so it still proceeds with the other, less demanding steps.
It could also be that you are creating too many processes in parallel and therefore consuming too much RAM.
As noted in the README, 8 GB of RAM was not enough for kraken2 to run.

So I think everything works as intended.

CSynodinos commented 2 years ago

Hi @replikation, I have 32 GB of RAM on my MacBook Pro. Also, I can run the pipeline to completion on an Ubuntu VM with half the RAM, so I don't think that's the issue. The message that I get is: NOTE: Process read_classification_wf:kraken2 (1) terminated with an error exit status (137) -- Error is ignored

CSynodinos commented 2 years ago

Another issue that I have is that after the error is ignored, the create_summary_report_wf:summary_report_default step doesn't run. Is that step dependent on the output of kraken2?

replikation commented 2 years ago

Exit 137 is related to not having enough RAM. What is poreCov telling you regarding cpus and max_cpus usage? It could be that you are running too many Kraken runs in parallel, thus demanding more RAM.
Otherwise, we need more info, as stated in the issue, or we are blind on our end.
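Some background on why 137 points at memory: by shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), and 137 - 128 = 9 is SIGKILL, which is what the kernel or container runtime typically sends when a process exceeds its memory limit. A small sketch that decodes such a status (the function name `describe_exit_status` is made up for illustration, and the signal names assume a POSIX system):

```python
import signal


def describe_exit_status(status):
    """Explain a shell-style exit status; values above 128 encode a signal."""
    if status > 128:
        number = status - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Not a signal number known on this platform.
            name = "unknown signal"
        return f"killed by {name} (signal {number})"
    return f"exited with code {status}"
```

On Linux or macOS, `describe_exit_status(137)` returns `"killed by SIGKILL (signal 9)"`. SIGKILL alone does not prove an out-of-memory kill, but combined with a large database load it is the usual suspect.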

CSynodinos commented 2 years ago

I'm currently running the pipeline through a Python script with the following command: subprocess.call([f"nextflow run replikation/poreCov --fastq_pass {fastq} -r 0.11.0 --medaka_model r941_min_fast_g507 --minLength 100 --primerV V1200 --output {cwd} --cores 4 --rapid TRUE -profile local,docker"], shell = True). The input fastq is a fastq.gz file that's 66.2 MB.

Parameters:
Medaka model: r941_min_fast_g507 [--medaka_model]
Min depth nucleotide: 20 [--min_depth]
Latest Pangolin/Nextclade?: false [--update]
CPUs to use: 4 [--cores]
Memory in GB: 12 [--memory]

I have also attached the log file: nextflow.log
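As a side note, the command from this comment can also be passed to `subprocess` as an argument list without `shell=True`, which avoids quoting problems when the FASTQ or output path contains spaces. The sketch below reuses only the flags quoted in the comment; the helper names `build_porecov_command` and `run_porecov` are hypothetical, and it is not how the pipeline itself is meant to be invoked.

```python
import subprocess


def build_porecov_command(fastq, output_dir, cores=4):
    """Build the nextflow call from the comment as a list of arguments."""
    return [
        "nextflow", "run", "replikation/poreCov",
        "--fastq_pass", str(fastq),
        "-r", "0.11.0",
        "--medaka_model", "r941_min_fast_g507",
        "--minLength", "100",
        "--primerV", "V1200",
        "--output", str(output_dir),
        "--cores", str(cores),
        "--rapid", "TRUE",
        "-profile", "local,docker",
    ]


def run_porecov(fastq, output_dir, cores=4):
    """Run the pipeline and return nextflow's exit code."""
    # No shell involved: each list element is passed as one argument.
    result = subprocess.run(build_porecov_command(fastq, output_dir, cores))
    return result.returncode
```

Note that nextflow's own exit code can be 0 even when kraken2 fails, because the thread says the error for that process is ignored; the per-process status still has to be read from the terminal output or logs.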

CSynodinos commented 2 years ago

I also ran the pipeline with the test dataset and got the same error on the Mac, but it was successful on Ubuntu.

replikation commented 2 years ago

I think it has something to do with Docker and Mac. Is Docker able to utilize all the RAM? Because if this is restricted or too low, you get the same not-enough-RAM errors. Docker runs pretty much natively on Ubuntu, which is why you don't have issues there: it's able to dynamically allocate RAM to the container. I don't know if it's the same Docker Desktop as on Windows, but for older versions you need to specify how much RAM a container can use.

Either way, it's an issue outside the scope of poreCov and related to Mac and Docker.

CSynodinos commented 2 years ago

I can confirm that it was Docker; it had a default RAM limit of 2 GB. Thank you for your help, everyone :)
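For anyone hitting the same problem, a quick pre-flight check along these lines might catch it before the run starts. It is a sketch under assumptions: it relies on `docker info --format '{{.MemTotal}}'` reporting the memory available to the Docker engine in bytes (which should hold for current Docker versions, but is not verified here for every release), and the function names are made up for illustration. The 12 GB figure is the `--memory` value shown earlier in the thread.

```python
import subprocess

GIB = 1024 ** 3  # bytes per GiB


def docker_memory_bytes():
    """Ask the Docker engine how much memory it can use, in bytes."""
    # Assumes the docker CLI is on PATH and the engine is running.
    result = subprocess.run(
        ["docker", "info", "--format", "{{.MemTotal}}"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def has_enough_memory(available_bytes, required_gb):
    """Return True if the available memory covers the requested amount."""
    return available_bytes >= required_gb * GIB
```

With Docker limited to 2 GB, `has_enough_memory(docker_memory_bytes(), 12)` would be false, which matches the exit 137 seen on the Mac. On Docker Desktop the limit is raised in the resource settings of the application rather than on the command line.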