Closed pawanchk closed 5 months ago
Hi @pawanchk, Please can you share the full command you used and the log output and system information?
Kind regards, Rich
Hi Rich,
Thank you for your response.
This is the command I used -
~/Dorado/dorado-0.5.2-linux-x64/bin/dorado trim -v sample.pass.bam > sample.pass.trimmed.bam
sample.pass.bam is obtained from the wf-basecalling workflow using the pod5 file as input; do let me know if you need more details on that.
This is the log of dorado trim -
[debug] > adapter/primer trimming threads 231, writer threads 25
[info] > starting adapter/primer trimming
[debug] Processed 0 reads
[debug] Processed 0 reads
[debug] Processed 0 reads
[debug] Processed 0 reads
/var/spool/pbs/mom_priv/jobs/6185766.pbs101.SC: line 15: 1289837 Segmentation fault (core dumped) ~/Dorado/dorado-0.5.2-linux-x64/bin/dorado trim -v sample.pass.bam > sample.pass.trimmed.bam
I used these settings for the PBS script -
#PBS -l select=1:ncpus=12:mem=128G
#PBS -l walltime=12:00:00
This is the system Info -
CPU : AMD EPYC 7713
OS : RHEL 8.4 (Ootpa)
Please let me know if any more information is needed.
Does your command run locally, i.e. not in your PBS cluster?
No, I ran it in the PBS cluster using these settings in my PBS script
#PBS -l select=1:ncpus=12:mem=128G
#PBS -l walltime=12:00:00
Can you run dorado trim on the data locally? I'm trying to deduce if the error lies with dorado, the data or the system.
Kind regards, Rich
Hi @HalfPhoton
I tried running it locally, but that also ends in a segmentation fault. Please see below for the command used and the log message -
$ ~/Dorado/dorado-0.5.2-linux-x64/bin/dorado trim -v sample.pass.bam > sample.pass.trimmed.bam
[2024-03-06 09:38:51.971] [debug] > adapter/primer trimming threads 116, writer threads 12
[2024-03-06 09:38:52.086] [info] > starting adapter/primer trimming
[2024-03-06 09:38:52.087] [debug] Processed 0 reads
[2024-03-06 09:38:52.087] [debug] Processed 0 reads
Segmentation fault (core dumped)
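Because the command above redirects stdout into the output file, the log lines arrive on stderr and are easy to lose in a batch job. A minimal sketch of a wrapper that keeps the log in a file and flags a segfault by its exit status follows. The function name run_logged and the file names are hypothetical; the status check assumes Linux, where a process killed by SIGSEGV (signal 11) is reported as 128 + 11 = 139.

```shell
# Hypothetical helper: run a command, save stdout and stderr separately,
# and report a segmentation fault explicitly.
# Usage: run_logged LOGFILE OUTFILE cmd [args...]
run_logged() {
    log=$1
    out=$2
    shift 2
    # stdout (the trimmed reads) goes to OUTFILE, stderr (the log) to LOGFILE
    "$@" > "$out" 2> "$log"
    status=$?
    if [ "$status" -eq 139 ]; then
        # Assumption: Linux signal numbering, SIGSEGV = 11, so 128 + 11 = 139
        echo "segmentation fault (exit status 139), log kept in $log" >&2
    fi
    return "$status"
}

# Example call, using the command from this thread:
# run_logged trim.log sample.pass.trimmed.bam \
#     ~/Dorado/dorado-0.5.2-linux-x64/bin/dorado trim -vv sample.pass.bam
```

The wrapper returns the original exit status, so a job script can still stop on failure while the saved log stays available for a bug report.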
Hi @pawanchk - can you run with -vv to get a more detailed log?
Hi @tijyojwad
I tried running with -vv; please see the log below. The only additional log line with this parameter is: Checking adapter/primer LSK109
$ ~/Dorado/dorado-0.5.2-linux-x64/bin/dorado trim -vv sample.pass.bam > sample.pass.trimmed.bam
[2024-03-19 10:53:41.742] [debug] > adapter/primer trimming threads 116, writer threads 12
[2024-03-19 10:53:41.902] [info] > starting adapter/primer trimming
[2024-03-19 10:53:41.902] [debug] Processed 0 reads
[2024-03-19 10:53:41.902] [debug] Processed 0 reads
[2024-03-19 10:53:41.902] [trace] Checking adapter/primer LSK109
Segmentation fault (core dumped)
Great, looks like you're able to reproduce this very easily. Can you share this sample.pass.bam file?
Hi @pawanchk are you able to share the file?
Hi @tijyojwad, sorry, I missed your message earlier.
I am not able to share the data file due to data privacy restrictions. However, I downloaded a test sample dataset from the Nanopore open datasets (https://labs.epi2me.io/tutorials/). I am going to try trimming this dataset and will let you know how it goes.
Hi @tijyojwad
Following up on my message from earlier today: I processed the open data (gm24385_2020.09) from EPI2ME Labs (https://labs.epi2me.io/tutorials/). I ran dorado trim the same way as for my own sample, and the same error persists. Please see below -
This is the file that I used - gm24385_2020.09/analysis/r9.4.1/20200914_1354_6B_PAF27096_e7c9eae6/guppy_v4.0.11_r9.4.1_hac_prom/align_unfiltered/calls2ref.bam
This is how I ran dorado trim -
~/Dorado/dorado-0.5.2-linux-x64/bin/dorado trim -v calls2ref.bam > calls2ref.trimmed.bam
This is the error log -
[2024-04-03 16:03:59.022] [debug] > adapter/primer trimming threads 231, writer threads 25
[2024-04-03 16:11:15.273] [info] > starting adapter/primer trimming
[2024-04-03 16:11:15.274] [debug] Processed 0 reads
[2024-04-03 16:11:15.274] [debug] Processed 0 reads
[2024-04-03 16:11:15.274] [debug] Processed 0 reads
[2024-04-03 16:11:15.274] [debug] Processed 0 reads
/var/spool/pbs/mom_priv/jobs/6563201.pbs101.SC: line 14: 2770327 Segmentation fault (core dumped)
Any insights on how to resolve this issue will be very helpful.
Hi @pawanchk, Do you also have this issue in Dorado 0.6.0 which was released this week?
Hi @HalfPhoton Thanks for your response.
I tried the open data (gm24385_2020.09) in Epi2Me labs (https://labs.epi2me.io/tutorials/) with the latest release of Dorado, v0.6.0, and it ran successfully without any error or segmentation fault.
This is the top and bottom part of the log -
[2024-04-04 11:06:05.794] [info] Running: "trim" "-v" "~/gm24385_2020.09/analysis/r9.4.1/20200914_1354_6B_PAF27096_e7c9eae6/guppy_v4.0.11_r9.4.1_hac_prom/align_unfiltered/calls2ref.bam"
[2024-04-04 11:06:05.794] [debug] > adapter/primer trimming threads 231, writer threads 25
[2024-04-04 11:16:58.588] [info] > starting adapter/primer trimming
[2024-04-04 11:17:09.466] [debug] Processed 50000 reads
[2024-04-04 11:17:22.027] [debug] Processed 100000 reads
[2024-04-04 11:17:34.914] [debug] Processed 150000 reads
[2024-04-04 11:17:40.062] [debug] Processed 200000 reads
[2024-04-04 11:17:42.442] [debug] Processed 250000 reads
[2024-04-04 11:17:46.432] [debug] Processed 300000 reads
[2024-04-04 11:17:52.207] [debug] Processed 350000 reads
. . . .
[2024-04-04 11:32:39.776] [debug] Processed 4600000 reads
[2024-04-04 11:32:43.772] [debug] Processed 4650000 reads
[2024-04-04 11:32:45.982] [debug] Processed 4700000 reads
[2024-04-04 11:32:49.879] [debug] Processed 4750000 reads
[2024-04-04 11:32:51.966] [debug] Processed 4800000 reads
[2024-04-04 11:32:56.254] [debug] Processed 4850000 reads
[2024-04-04 11:32:59.209] [debug] Processed 4900000 reads
[2024-04-04 11:33:01.794] [debug] Total reads processed: 4938711
[2024-04-04 11:33:01.935] [info] > Simplex reads basecalled: 3454633
[2024-04-04 11:33:01.935] [info] > finished adapter/primer trimming
It also worked successfully for the sample data I have.
But one surprising thing I noticed is the bam file size after trimming is much bigger (almost twice the size) - I observed this in both the open data and the sample data I have. I would expect the file size to be reduced since it trims part of the reads. Please let me know your thoughts on this.
Hi @pawanchk - that looks like a configuration bug in the dorado trim application. Instead of outputting BAM we're outputting SAM, and as a result the output size is larger. We'll get this fixed in the next release. In the meantime, if you run the output of trim through samtools view -b you should get a smaller file size. Sorry about that!
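The workaround above can be sketched as a small check-and-convert step. This is only an illustration: the function names is_bgzf and to_bam_if_needed are hypothetical, the check relies on BAM being BGZF-compressed and therefore starting with the gzip magic bytes 1f 8b (uncompressed SAM is plain text), and samtools is assumed to be on the PATH.

```shell
# Hypothetical helper: succeed if the file starts with the gzip/BGZF magic
# bytes (1f 8b), as a BAM file does; uncompressed SAM text will not match.
is_bgzf() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Hypothetical helper: write a BAM copy of IN to OUT, converting only when
# IN is not already compressed. Assumes samtools is installed.
to_bam_if_needed() {
    in=$1
    out=$2
    if is_bgzf "$in"; then
        cp "$in" "$out"
    else
        samtools view -b -o "$out" "$in"
    fi
}

# Example call on the trimmed output from this thread:
# to_bam_if_needed sample.pass.trimmed.bam sample.pass.trimmed.fixed.bam
```

The magic-byte test only says whether the file is compressed, not whether it is valid BAM, so it is a quick sanity check rather than a validator.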
Hi @pawanchk - a fix for the output to be BAM was merged a couple of weeks ago and is available in both v0.6.2 (released 2 weeks ago) and v0.7.0 (released this week).
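For scripts that need to know whether an installed Dorado already includes this fix, a version comparison along these lines may help. It is a sketch under assumptions: the function name version_at_least is hypothetical, the version string is taken to be in the form reported in this thread (for example 0.5.2+7969fab), and sort -V (version sort, available in GNU coreutils) is assumed to exist.

```shell
# Hypothetical helper: succeed if version HAVE is >= version WANT.
# Usage: version_at_least HAVE WANT
version_at_least() {
    have=${1%%+*}   # drop a build suffix such as "+7969fab"
    want=$2
    # With version sort, WANT comes first exactly when WANT <= HAVE
    lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# Example: the version from the original report predates the BAM output fix
if version_at_least "0.5.2+7969fab" "0.6.2"; then
    echo "BAM output fix should be present"
else
    echo "older than 0.6.2: convert trim output with samtools view -b"
fi
```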
Hi @tijyojwad
Thanks so much for the update, appreciate it a lot.
Hi,
I tried trimming adapters from the post-basecalling bam file using dorado trim, but it ends in this error - Segmentation fault (core dumped).
I tried Dorado version 0.5.2+7969fab. Can I please know how I can resolve this?