philres / ngmlr

NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) reads to a reference genome, with a focus on reads that span structural variations.
MIT License

segmentation fault using ngmlr 0.2.7 #94

Open lentil824 opened 3 years ago

lentil824 commented 3 years ago

Hi there,

I installed the precompiled ngmlr 0.2.7, copied the demo command for a Nanopore run, and substituted my own ref.fa and whole-genome seq.fastq.gz files. It immediately gave a "segmentation fault" error.

This is not a permissions issue (#26), and my ngmlr version is not the old one (#10). I even decompressed the gz file with zcat and copied ref.fa to the current directory, but neither helped. Always the same error. Did I miss something?

Thanks, Susan

Chriswinefield commented 3 years ago

Hi Susan, I have also just installed this version of ngmlr and am experiencing the same issues - I was just wondering if you have made any progress or have moved to another approach in the meantime?

Regards Chris

lentil824 commented 3 years ago

Hi Chris,

Sorry for the late reply! Unfortunately I couldn't figure this out and didn't get a reply from the developer, so I switched to other mappers (https://github.com/lh3/minimap2 and https://github.com/marbl/Winnowmap). Both minimap2 and winnowmap are easy to install and use, so you can give them a try.

Regards, Susan

Chriswinefield commented 3 years ago

Thanks Susan,

I'm in the same boat and have gone to minimap2. Other issues have arisen with my collaborators with them providing the PacBio reads in a format that is now causing other issues - I really love bioinformatics at times 😐. Thanks for the note - Much appreciated

Kind Regards Chris

Chris Winefield Associate Professor in Plant Genomics and Molecular Biology

Department of Wine Food and Molecular Biosciences Faculty of Agriculture and Life Sciences

RFH, room 062 Engineering Drive P O Box 85084 Lincoln University Lincoln 7647 Christchurch New Zealand

p +64 3 4230630 | m +64 021 0238 4476 | e @.*** | w www.lincoln.ac.nz

Lincoln University, Te Whare Wānaka o Aoraki, New Zealand's Specialist Land-Based University. The NZ Plant Transposon Team: http://www.mobilekiwi.org/




"The contents of this e-mail (including any attachments) may be confidential and/or subject to copyright. Any unauthorised use, distribution, or copying of the contents is expressly prohibited. If you have received this e-mail in error, please advise the sender by return e-mail or telephone and then delete this e-mail together with all attachments from your system."

fritzsedlazeck commented 3 years ago

Sorry guys for the super late reply. Can you describe the error that you see in a bit more detail, along with the output that you get from NGMLR when running it on the command line? Thanks Fritz

Chriswinefield commented 3 years ago

Hi Fritz,

Not to worry.

I have been working with our national HPC cluster. I have installed the precompiled version of ngmlr into my project space and point the SLURM scheduler to it from my scratch/nobackup space (where the data sits as well).

The SLURM script to launch the analysis is:

#!/bin/bash -e
#SBATCH --job-name=Nbenth_ngmlr_align
#SBATCH --output=AW_%j.out
#SBATCH --error=AW_%j.err
#SBATCH @.***
#SBATCH --mail-type=END
#SBATCH --time=02:00:00
#SBATCH --mem=80G
#SBATCH --partition=bigmem
#SBATCH --ntasks=10
#SBATCH --profile=task

/nesi/project/lincoln03032/ngmlr/ngmlr/bin/ngmlr-0.2.8/ngmlr -t 20 -r\
/nesi/nobackup/lincoln03032/TE_analysis/nbenth_analysis/data/NbQld082.genome.fasta -q\
/nesi/nobackup/lincoln03032/TE_analysis/nbenth_analysis/data/LABallPacBio.fasta -o\
/nesi/nobackup/lincoln03032/TE_analysis/nbenth_analysis/data/ngmlr_QLD_aligned_with_LAB.sam

The .err file contains the following after about 12 seconds to 3 minutes, depending on the run:

@. data]$ cat AW_20467022.err
ngmlr 0.2.8 (build: Jun 16 2021 22:04:17, start: 2021-06-16.22:27:39)
Contact: @., @.***
Opening for output (SAM): QLD_aligned_with_LAB.sam
Encoding reference sequence.
Size of reference genome 2912 Mbp (max. 68719 Mbp)
0 reference sequences were skipped (length < 10).
Writing encoded reference to NbQld082.genome.fasta-enc.2.ngm
Writing to disk took 1.34s
Building reference index #0 (kmer length: 13, reference skip: 2)
81228 prefixes were ignored due to the frequency cutoff (1000)
Overall time for creating RefTable: 386.50s
Writing reference index to NbQld082.genome.fasta-ht-13-2.2.ngm
Writing to disk took 5.83s
Opening query file LABallPacBio.fasta
Mapping reads...
/var/spool/slurm/job20467022/slurm_script: line 13: 37953 Segmentation fault (core dumped) /nesi/project/lincoln03032/ngmlr/ngmlr/bin/ngmlr-0.2.8/ngmlr -t 4 -r NbQld082.genome.fasta -q LABallPacBio.fasta -o QLD_aligned_with_LAB.sam

So the package loads and appears to start OK then gives a segmentation fault and core dump as the mapping begins.

A couple of thoughts: should I have installed the ngmlr programme in the same folder as the data and where the output will end up? The other thought was that I hadn't set the number of threads in SLURM to match the -t value for ngmlr. I tried this a day or so ago and it didn't appear to help.
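On the thread-count point, one way to rule out a mismatch is to derive the value passed to -t from what the scheduler actually allocated. This is only a minimal sketch, not something taken from the NGMLR documentation: `pick_threads` is a hypothetical helper, and it assumes the job requests cores with `--cpus-per-task`, in which case SLURM sets `SLURM_CPUS_PER_TASK`.

```shell
# Hypothetical helper: print the thread count to pass to `ngmlr -t`.
# Uses SLURM_CPUS_PER_TASK when SLURM has set it, otherwise falls back to 1.
pick_threads() {
    n="${SLURM_CPUS_PER_TASK:-1}"
    # Guard against empty or non-numeric values.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

# Example use inside a job script (file names are placeholders):
# ngmlr -t "$(pick_threads)" -r ref.fasta -q reads.fasta -o out.sam
```

With this approach the SLURM request and the ngmlr thread count cannot drift apart, although, as noted above, matching them by hand did not fix the crash here.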

However, I have since found that the PacBio data I was given isn't complete. The fasta file I have to work with is the stripped version of the raw PacBio BAM file, so it only contains the read ID and sequence. One of the files also mixes in ONT data, which was frustrating to find out! I am wondering if this might be causing the error, as I expect ngmlr will be looking for read quality metrics associated with the sequencing data type (PacBio/ONT/Illumina).
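To see how much ONT data is mixed into a FASTA file, the headers can be tallied by naming style. This is a rough sketch under stated assumptions: `count_platforms` is a hypothetical helper, and the two patterns are heuristics taken only from the header styles quoted in this thread (ONT headers carrying `runid=`, PacBio subread headers of the form movie/ZMW/start_end). It also assumes every header line starts with `>`.

```shell
# Hypothetical helper: tally FASTA headers by naming style.
# Heuristic only:
#   ONT-like:    header contains "runid="
#   PacBio-like: header looks like ">m<movie>/<zmw>/<start>_<end>"
count_platforms() {
    awk '
        /^>/ {
            if ($0 ~ /runid=/) ont++
            else if ($0 ~ "^>m[^/]*/[0-9]+/[0-9]+_[0-9]+") pacbio++
            else other++
        }
        END { printf "ont=%d pacbio=%d other=%d\n", ont, pacbio, other }
    ' "$1"
}

# Example use (file name is a placeholder):
# count_platforms reads.fasta
```

A non-zero count in more than one category would suggest splitting the file before mapping, so that each subset can be run with the matching preset.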

As an aside would ngmlr accept the raw PacBio BAM file format?

Thanks for the help with this

Regards Chris


fritzsedlazeck commented 3 years ago

That should not happen. I would suggest taking the test data from this repo and running NGMLR on it once. test_1, for example, should show whether you get the error there or not.

Thx Fritz

Chriswinefield commented 3 years ago

Hi Fritz,

I will grab the data from the repo and give this a whirl and see where we land.

Back to you shortly

Regards Chris


Chriswinefield commented 3 years ago

Hi Fritz,

With the test_1 data it appears to work fine.

ngmlr 0.2.8 (build: Jun 16 2021 22:04:17, start: 2021-06-23.22:19:56)
Contact: @., @.
Opening for output (SAM): /nesi/nobackup/lincoln03032/TE_analysis/nbenth_analysis/data/test1_ngmlr.sam
Encoding reference sequence.
Size of reference genome 0 Mbp (max. 68719 Mbp)
0 reference sequences were skipped (length < 10).
Writing encoded reference to /nesi/project/lincoln03032/ngmlr/ngmlr/test/data/test_1/ref_chr6_140kb.fa-enc.2.ngm
Writing to disk took 0.10s
Building reference index #0 (kmer length: 13, reference skip: 2)
0 prefixes were ignored due to the frequency cutoff (1000)
Overall time for creating RefTable: 2.88s
Writing reference index to /nesi/project/lincoln03032/ngmlr/ngmlr/test/data/test_1/ref_chr6_140kb.fa-ht-13-2.2.ngm
Writing to disk took 0.21s
Opening query file /nesi/project/lincoln03032/ngmlr/ngmlr/test/data/test_1/long_name.fa
Mapping reads...
Processed: 3 (1.00), R/S: 1.50, RL: 3933, Time: 3.00 7.00 34.50, Align: 1.00, 309, 1.00
Done (3 reads mapped (100.00%), 0 reads not mapped, 3 lines written)(elapsed: 0m, 0 r/s)

So I am wondering if the fault I am having is a result of the format of the read/seq files? I had a look at your test data and the data I have been trying to map, and there is quite a difference (no surprises there, as the data I have been working with is essentially a fasta extracted from the originating BAM).

My data:

ba0503ce-f9cb-41aa-8a75-ff8942291e67 runid=e8bded7f079909caf709756860f919e80edb7f0d sampleid=L1 read=127 ch=2174 start_time=2018-07-02T22:49:03Z
ATGATGTGCGCTTCGTTCAGTTACACCATCAGATTGTGTTAGTCTTTTTTTTTGGAATTTTTGAATTTTTGCCAACCTCTGCCGTTTGCCGTGCATATCGGTCACGAACAGTCAATTGCAAACTGGTAACCTGGATTTGTTCTATCAGTAATCGACCTTGTCCCTAATTAAATCGAATAAATCCTTA

your test data:

m140612_020550_42156_c100652082550000001823118110071460_s1_p0/49/0_11124>m140612_020550_42156_c100652082550000001823118110071460_s1_p0/66/1019_13505>m140612_020550_42156_c100652082550000001823118110071460_s1_p0/66/13545_26663>m140612_020550_42156_c100652082550000001823118110071460_s1_p0/66/26699_39808>m140612_020550_42156_c100652082550000001823118110071460_s1_p0/157671/0_4333>m140612_020550_42156_c100652082550000001823118110071460_s1_p0/157674/0_6303>m140612_020550_42156_c100652082550000001823118110071460_s1_p0/157676/0_14740>m140612_020550_42156_c100652082550000001823118110071460_s1_p0/157677/3585_9506>m140612_020550_42156_c100652082550000001823118110071460_s1_p0/157664/5106_6916>m140612_020550_42156_c100652082550000001823118110071460_s1_p0/157668/1089_6689
AAAATATCAGTAAAATCAGATTGACCTCAAACCCTGTATCTTTCAAAAAAGACATAATTTTGTTTTTCAAGCA

I will get the chaps in Australia to provide the raw files and I will try again.
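Before re-running on new files, a quick structural check of the FASTA input can rule out the simplest format problems. This is a minimal sketch, not part of NGMLR: `fasta_check` is a hypothetical helper that only verifies that the first non-blank line is a `>` header, counts records, and counts sequence lines containing anything other than letters, `*` or `-` (stray spaces or Windows line endings, for example). It says nothing about whether NGMLR itself would accept the file.

```shell
# Hypothetical helper: basic structural check of a FASTA file.
# Prints "not-fasta" (exit status 1) if the first non-blank line is not a header,
# otherwise prints the record count and the number of suspicious sequence lines.
fasta_check() {
    awk '
        NF == 0 { next }                                   # skip blank lines
        !seen { seen = 1; if ($0 !~ /^>/) { bad = 1; exit } }
        /^>/ { records++; next }                           # header line
        /[^A-Za-z*-]/ { badseq++ }                         # unexpected character in sequence
        END {
            if (bad || !seen) { print "not-fasta"; exit 1 }
            printf "records=%d bad_sequence_lines=%d\n", records, badseq
        }
    ' "$1"
}

# Example use (file name is a placeholder):
# fasta_check reads.fasta
```

If this reports a clean file and the crash persists, the input format becomes a less likely explanation than the binary or the data content itself.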

Regards Chris


fritzsedlazeck commented 3 years ago

Thanks. Feel free to keep me posted. Fritz

lentil824 commented 3 years ago

Hi Fritz,

Thanks for the suggestion! The same "segmentation fault" error appeared when using the test_1 data with the precompiled ngmlr 0.2.7. I noticed Chris was using an ngmlr 0.2.8 build, so I decided to build ngmlr from source. Surprisingly, ngmlr 0.2.8 built from source worked well with the test_1 data and with my own data (it is still running on my large data file). So I now think there might be an issue with the precompiled version, but I will leave this to you.

Regards, Susan

lentil824 commented 3 years ago

Btw, Chris, I have ONT data in fasta, so it is pretty straightforward: I just add -x ont. Your data is a bit more complicated, from what you described. With help from Fritz, I am sure you will get it sorted out.

Regards, Susan

fritzsedlazeck commented 3 years ago

Thanks Susan! Fritz

Chriswinefield commented 3 years ago

Hi Fritz,

Sorry for the delay in getting back to you - it has taken a while to get the raw data files from my collaborators.

I have tried a few different things, including moving the install location to the same partition as the raw data. I have just tried mapping the fasta packaged with the PacBio tarball I received.

The run fails again with a segmentation fault, but this time it generated some mapped reads before reporting an invalid pointer and memory corruption.

The output from the .err file is as follows:

ngmlr 0.2.8 (build: Jul 1 2021 04:45:58, start: 2021-07-07.03:52:21)
Contact: @., @.
Opening for output (SAM): /nesi/nobackup/lincoln03032/TE_analysis/nbenth_analysis/data/ngmlr_LAB_aligned_with_QLD_data.sam
Encoding reference sequence.
Size of reference genome 2836 Mbp (max. 68719 Mbp)
0 reference sequences were skipped (length < 10).
Writing encoded reference to /nesi/nobackup/lincoln03032/TE_analysis/nbenth_analysis/data/NbLab330.genome.fasta-enc.2.ngm
Writing to disk took 0.46s
Building reference index #0 (kmer length: 13, reference skip: 2)
82438 prefixes were ignored due to the frequency cutoff (1000)
Overall time for creating RefTable: 324.03s
Writing reference index to /nesi/nobackup/lincoln03032/TE_analysis/nbenth_analysis/data/NbLab330.genome.fasta-ht-13-2.2.ngm
Writing to disk took 1.19s
Opening query file /nesi/nobackup/lincoln03032/TE_analysis/nbenth_analysis/data/QLD_Benth_1/m54105_190325_053810.fasta
Mapping reads...
Processed: 7 (0.57), R/S: 3.50, RL: 681342, Time: 0.00 0.00 0.00, Align: 1.00, 1211, 0.90
Error in `ngmlr': free(): invalid pointer: 0x00002aad2c0bb6f0
Error in `ngmlr': malloc(): memory corruption: 0x00002aad2c0bb730
/var/spool/slurm/job20732151/slurm_script: line 13: 15624 Segmentation fault (core dumped) /nesi/nobackup/lincoln03032/TE_analysis/nbenth_analysis/data/ngmlr/bin/ngmlr-0.2.8/ngmlr -t 20 -r /nesi/nobackup/lincoln03032/TE_analysis/nbenth_analysis/data/NbLab330.genome.fasta -q /nesi/nobackup/lincoln03032/TE_analysis/nbenth_analysis/data/QLD_Benth_1/m54105_190325_053810.fasta -o /nesi/nobackup/lincoln03032/TE_analysis/nbenth_analysis/data/ngmlr_LAB_aligned_with_QLD_data.sam

Any thoughts of where to go from here would be fantastic.

Kind Regards Chris
