Porechop step gets killed for big fastq.gz file dataset

iqbal-lab-org / Mykrobe_tb_workflow

A workflow for analysis and resistance profiling of Mycobacterium tuberculosis nanopore data with Mykrobe

Porechop step gets killed for big fastq.gz file dataset #5

Closed shashibioinfo143 closed 4 years ago

shashibioinfo143 commented 4 years ago

Dear Michael,

The porechop step gets killed about 2 hours after the analysis starts. I am trying to analyse a 28 GB fastq.gz file of M. tuberculosis reads on a PC with 16 GB RAM and 1 TB storage. I have reinstalled the pipeline and rerun the analysis 5 times, but it still shows the same error (please find the error below). Can you please help me resolve the issue?

Thank you

bash: line 1: 3889 Killed porechop --input data/basecalled/Test_15_pass.fastq.gz --output data/porechopped/Test_15_pass.fastq.gz --threads 1 --check_reads 25000 --extra_end_trim 10 --discard_middle --format fastq.gz > logs/porechop.log
[Mon Oct 21 16:52:12 2019]
Error in rule porechop:
    jobid: 7
    output: data/porechopped/Test_15_pass.fastq.gz
    log: logs/porechop.log (check log file(s) for error message)
    shell:
        porechop --input data/basecalled/Test_15_pass.fastq.gz --output data/porechopped/Test_15_pass.fastq.gz --threads 1 --check_reads 25000 --extra_end_trim 10 --discard_middle --format fastq.gz > logs/porechop.log
        (exited with non-zero exit code)

mbhall88 commented 4 years ago

Hi @shashibioinfo143. Yes, this is a known issue. Unfortunately, porechop is very RAM hungry. I still don't have a solution for this, sorry.
Hopefully, in the next couple of months, I will have a solution. You could try splitting your fastq up into smaller pieces, running porechop on the smaller pieces, and then combining them back together afterwards. It's not a great solution but it's the only thing I can think of.
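
A minimal sketch of that split-and-merge workaround, not something that ships with the pipeline. It assumes GNU coreutils `split`, FASTQ records of exactly 4 lines each (no wrapped sequences), and reuses the porechop options from the error message above. The function names and the `chunk_` file prefix are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: split a gzipped FASTQ into chunks, trim each chunk, then merge.
# Assumes GNU coreutils `split` and 4-line FASTQ records (no wrapped sequences).

# Split INPUT (fastq.gz) into uncompressed chunks of READS reads each,
# written as OUTDIR/chunk_0000.fastq, OUTDIR/chunk_0001.fastq, ...
split_fastq() {
    local input=$1 outdir=$2 reads=$3
    mkdir -p "$outdir"
    gzip -dc "$input" |
        split -l "$((reads * 4))" -d -a 4 --additional-suffix=.fastq - "$outdir/chunk_"
}

# Run porechop on every chunk with the same options the pipeline uses,
# then concatenate the gzipped results into OUTPUT.
trim_and_merge() {
    local chunkdir=$1 output=$2
    local chunk
    for chunk in "$chunkdir"/chunk_*.fastq; do
        porechop --input "$chunk" --output "${chunk%.fastq}.trimmed.fastq.gz" \
            --threads 1 --check_reads 25000 --extra_end_trim 10 \
            --discard_middle --format fastq.gz
    done
    # Concatenated gzip streams form a valid gzip file.
    cat "$chunkdir"/chunk_*.trimmed.fastq.gz > "$output"
}
```

For example, `split_fastq data/basecalled/Test_15_pass.fastq.gz chunks 500000` followed by `trim_and_merge chunks data/porechopped/Test_15_pass.fastq.gz`. The chunk size of 500000 reads is an arbitrary starting point; lower it if porechop still runs out of memory.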

shashibioinfo143 commented 4 years ago

Thanks for the quick reply. Can we run the Mykrobe analysis while skipping the porechop step?

mbhall88 commented 4 years ago

Is your sample multiplexed?

shashibioinfo143 commented 4 years ago

No, it's a singleplex run on a MinION nanopore sequencer.

mbhall88 commented 4 years ago

Ah ok. Well, then you could probably skip the porechop step. To do this you would need to copy your fastq file into a porechop directory to trick the snakemake pipeline into thinking that porechop ran successfully. From the root of the project directory, run the following:

mkdir -p data/porechopped
cp data/basecalled/Test_15_pass.fastq.gz data/porechopped/Test_15_pass.fastq.gz

Then try rerunning the pipeline. It should then start running minimap2 and doing some plots and stats.

shashibioinfo143 commented 4 years ago

I followed the above steps and ran the analysis. The porechop step was skipped, but the system froze at the pre-filtering step for 3 hrs and then the process was killed. Can you please help me with this? Thanks

mbhall88 commented 4 years ago

Hmmm, could you send me the output from snakemake prior to when it was killed?

mbhall88 commented 4 years ago

I should also mention: if it is only the mykrobe results you are interested in, you don't necessarily need to run this whole pipeline. You could just run mykrobe directly with your FASTQ file.
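
A hedged sketch of what running mykrobe directly might look like. The flag names below follow recent Mykrobe releases and are an assumption about the installed version; older releases used a positional form, so check `mykrobe predict --help` first. The wrapper function name, the sample name, and the output file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: run Mykrobe directly on the basecalled reads, bypassing the workflow.
# Assumption: flag names follow recent Mykrobe releases; verify them with
# `mykrobe predict --help` for the version you have installed.
run_mykrobe_tb() {
    local sample=$1 reads=$2 out=$3
    mykrobe predict \
        --sample "$sample" \
        --species tb \
        --ont \
        --seq "$reads" \
        --format json \
        --output "$out"
}
```

For example: `run_mykrobe_tb Test_15 data/basecalled/Test_15_pass.fastq.gz Test_15.mykrobe.json`. The `--ont` flag selects the nanopore settings, which matters for MinION data.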

shashibioinfo143 commented 4 years ago

[Screenshot: pistis]

shashibioinfo143 commented 4 years ago

Sorry, I'm sending a screenshot of the error because the PC freezes and shuts down.

shashibioinfo143 commented 4 years ago

Currently I'm trying to do the analysis with https://github.com/Mykrobe-tools/mykrobe

mbhall88 commented 4 years ago

Ah, so the problem in the above screenshot is the plotting is now using up a lot of memory. I have just added some changes to the pipeline so that the plotting should (hopefully) use a lot less memory now.

If you run git pull from inside the project directory that should bring in the latest changes for you.

shashibioinfo143 commented 4 years ago

Thanks for the update. I will let you know once I have finished the analysis. Currently I'm sequencing another sample on the MinION attached to the same workstation. Once that's done, I will run the analysis with the Mykrobe_tb_workflow tool.

shashibioinfo143 commented 4 years ago

Hi, I tried to reinstall it and run it, but the pipeline shows the error below:

TypeError in line 11 of /media/fmr/9af4a3cb-07fd-489a-85f8-9e2253be3d37/Mykrobe_tb_workflow-master/rules/reports.smk:
must be str, not int
File "/media/fmr/9af4a3cb-07fd-489a-85f8-9e2253be3d37/Mykrobe_tb_workflow-master/Snakefile", line 62, in
File "/media/fmr/9af4a3cb-07fd-489a-85f8-9e2253be3d37/Mykrobe_tb_workflow-master/rules/reports.smk", line 11, in

Thanks

mbhall88 commented 4 years ago

@shashibioinfo143 see #6

shashibioinfo143 commented 4 years ago

Thanks for the help.