hasindu2008 / f5c

Ultra-fast methylation calling and event alignment tool for nanopore sequencing data (supports CUDA acceleration)
https://hasindu2008.github.io/f5c/docs/overview
MIT License
137 stars 28 forks

f5c eventalign out of memory message #166

Closed nickst85 closed 1 month ago

nickst85 commented 2 months ago

Hi,

I am trying to run eventalign on reads from RNA004 and I get the following error:

[slow5_idx_init::INFO] Index file not found. Creating an index at '/pathto/sample.blow5.idx'.
[slow5_idx_build::ERROR] Failed to allocate memory: Cannot allocate memory At src/slow5_idx.c:276
[sig_handler::ERROR] I regret to inform that a segmentation fault occurred. But at least it is better than a wrong answer.

Any idea why this is happening?

The .blow5 is 96.6GB

Thank you,

Nick

hasindu2008 commented 2 months ago

Hi @nickst85

Is this on a cluster you are running through a batch submission system or on a computer directly?
- If it is a batch system, how much RAM have you requested?
- If you are running on a computer directly, how much RAM do you have on this system?

Can you run slow5tools index /path_to_/sample.blow5 and see if the same error persists?

nickst85 commented 2 months ago

Hi Hasindu,

I use the HPC and I allocated 800 GB, but I get the same error with both f5c and slow5tools. The error from slow5tools is below:

[slow5_idx_build::ERROR] Failed to allocate memory: Cannot allocate memory At src/slow5_idx.c:189
[segv_handler::ERROR] I regret to inform that a segmentation fault occurred. But at least it is better than a wrong answer.
[src/main.c(segv_handler:77)::DEBUG] Here is the backtrace:
/lib64/libc.so.6(+0xcee73)[0x14bf42d38e73]
/lib64/libc.so.6(+0x91c91)[0x14bf42cfbc91]
/lib64/libc.so.6(fread+0x9b)[0x14bf42cf0d0b]
slow5tools(+0x42e30)[0x55a99156ae30]
slow5tools(slow5_idx_to+0x46)[0x55a99156be46]
slow5tools(slow5_idx_create+0x32)[0x55a991561992]
slow5tools(_Z10index_mainiPPcP12program_meta+0x37a)[0x55a99153c6fa]
slow5tools(main+0x7b7)[0x55a991534447]
/lib64/libc.so.6(__libc_start_main+0xe5)[0x14bf42ca4d85]
slow5tools(+0xc681)[0x55a991534681]

Let me know if you can understand what’s going on.

Best,

Nick

hasindu2008 commented 2 months ago

Even 16 GB should be enough for indexing, so it seems to be a bug or some error occurred when this BLOW5 file was created. Would you mind running slow5tools quickcheck and seeing if it complains that the file is not in good shape?

And to see if it is caused by some unaccounted-for difference in your system, could you also download the file https://gtgseq.s3.amazonaws.com/misc/sub/ont-r10-5khz-dna/PGXXXX230339_reads_20k.blow5 and see if slow5tools index succeeds on it? I am trying to narrow down where the problem lies.
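The two checks requested above could be scripted roughly as follows. This is a minimal sketch, not something taken from the slow5tools documentation: the helper name `run_logged` and the log file names are made up for illustration, and it assumes that `slow5tools` is on `PATH` and that a failing subcommand exits with a non-zero status.

```shell
#!/bin/sh
# Hypothetical helper: run a command, save its standard error to a log
# file, and report whether the command exited successfully.
run_logged() {
    log=$1; shift
    if "$@" 2> "$log"; then
        echo "OK: $*"
    else
        status=$?                      # exit status of the failed command
        echo "FAILED (exit $status): $*"
        return "$status"
    fi
}

# Usage (file names are placeholders; not executed here):
#   run_logged quickcheck.log slow5tools quickcheck sample.blow5
#   run_logged index.log slow5tools index PGXXXX230339_reads_20k.blow5
```

Keeping standard error in a file means the full message is available to attach to the issue if either step fails.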

nickst85 commented 2 months ago

Hi Hasindu,

The slow5tools quickcheck gave me the following:

[main] cmd: slow5tools quickcheck sample.blow5

[main] real time = 0.095 sec | CPU time = 0.011 sec | peak RAM = 0.006 GB

When I ran slow5tools index on the file you sent me, I got the following:

slow5tools index PGXXXX230339_reads_20k.blow5

[main] cmd: slow5tools index PGXXXX230339_reads_20k.blow5

[main] real time = 0.454 sec | CPU time = 0.382 sec | peak RAM = 0.010 GB

And it created the file PGXXXX230339_reads_20k.blow5.idx (1.1 MB).

Let me know if you have any more questions.

Best,

Nick

hasindu2008 commented 2 months ago

OK, so both of those commands have worked. May I know how this BLOW5 file was created? Is this a publicly available dataset that I can download and try to reproduce on my side?

nickst85 commented 2 months ago

Hi Hasindu,

This BLOW5 file was created after a nanopore direct RNA sequencing run. I ran blue-crab p2s pod5_files -d blow5_files, then concatenated all the individual BLOW5 files for the sample into a single merged sample.blow5 with the cat command. That BLOW5 file is 96.6 GB.

No, the data are not published yet.

Best,

Nick

nickst85 commented 2 months ago

Hi Hasindu,

I indexed another sample whose merged BLOW5 file was 85 GB. It indexed successfully and the f5c eventalign command ran without errors. I don’t know if that helps.

Best,

Nick

nickst85 commented 2 months ago

Hi Hasindu,

I merged the individual BLOW5 files with slow5tools merge and f5c indexed them. I have another question. Many reads were not included in the analysis because they didn’t reach the minimum mapping quality of 20, which is the preset value. I want the eventalign output for the m6Anet tool. Do you know how low I can set min-mapq?

Thank you in advance,

N

hasindu2008 commented 2 months ago

Hi @nickst85

Running successfully on another dataset shows that this is a very specific edge case with the first dataset.

Could you try slow5tools merge instead of slow5tools cat and see whether that reveals which of the files generated by blue-crab p2s is the problematic one? slow5tools cat simply concatenates files without parsing them, so if one of the files generated by blue-crab had an issue, it could have propagated to the large file. Could you please save the log (standard error) from slow5tools merge and send it to me? I want to see if there is any warning or other information that can help me pinpoint the issue. Do you also happen to have the logs generated by blue-crab and slow5tools cat?
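One way to look for the problematic file is to check each blue-crab output on its own before merging. The following is only a sketch under assumptions: the function name `find_failing` is hypothetical, `blow5_files` is the output directory mentioned earlier in the thread, and it relies on the checking command returning a non-zero exit status for a bad file.

```shell
#!/bin/sh
# Hypothetical helper: run a checker command on every .blow5 file in a
# directory and print the names of the files for which the checker fails.
find_failing() {
    dir=$1; shift
    for f in "$dir"/*.blow5; do
        [ -e "$f" ] || continue        # the directory has no .blow5 files
        if ! "$@" "$f" > /dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage (not executed here; assumes slow5tools is installed):
#   find_failing blow5_files slow5tools quickcheck
#   slow5tools merge blow5_files -o sample.blow5 2> merge.log
```

The second usage line keeps the standard error of the merge in `merge.log`, which is the log requested above.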

About --min-mapq: you can lower it all the way to 0. The m6Anet authors tested this and confirmed that it works well with a mapping quality of 0 (see their comment at https://github.com/hasindu2008/f5c/issues/154#issuecomment-1960747253).
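As an illustration, an eventalign call with the threshold lowered to 0 might look like the sketch below. All file names are placeholders, the function name `eventalign_cmd` is hypothetical, and any additional options that m6Anet or RNA004 data may require are deliberately left out because they are not discussed in this thread. The script only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Placeholder inputs; replace these with real paths.
BAM=reads.sorted.bam
REF=transcriptome.fa
READS=reads.fastq
BLOW5=sample.blow5
MIN_MAPQ=0    # the preset value is 20, as noted in the thread

# Hypothetical helper: print the f5c command line instead of running it.
eventalign_cmd() {
    echo f5c eventalign --rna -b "$BAM" -g "$REF" -r "$READS" \
        --slow5 "$BLOW5" --min-mapq "$MIN_MAPQ"
}

eventalign_cmd
```

Once the printed line looks right, it can be run with its output redirected to a file of your choice.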

hasindu2008 commented 2 months ago

@nickst85 Did you have any luck with the merge?

hasindu2008 commented 1 month ago

I am closing this issue as there isn't a response. Feel free to reopen if you need any further help with it.