lh3 closed this issue 2 years ago
Sorry, but I cannot access the FTP site you noted. It would be most convenient for me if you could send me, by any means, the particular amino acid sequence(s) that lead to the segmentation fault.
Osamu,
I am using reference genome from this link:
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
The protein sequence that causes the problem is:
>ENSMUSP00000074009.7
MWRADRWAPLLLFLLQSALGRPRLAPPRNVTLFSQNFTVYLTWLPGLGSPPNVTYFVTYQSYIKTGWRPVEHCAGIKALV
CPLMCLKKLNLYSKFKGRVQAASAHGRSPRVESRYLEYLFDVELAPPTLVLTQMEKILRVNATYQLPPCMPSLELKYQVE
FWKEGLGSKTLFPDTPYGQPVQIPLQQGASRRHCLSARTVYTLIDIKYSQFSEPSCIFLEAPGDKRAVLAMPSLLLLLIA
AVAAGVAWKIMKGNPWFQGVKTPRALDFSEYRYPVATFQPSGPEFSDDLILCPQKELTIRNRPAPQVRNPATLQAGPERD
STEDEDEDTDYDDDGDSVQPYLERPLFISEKPRVMEHSETDESGVDSGGPWTSPVGSDGSSAWDSSDRSWSSTGDSSYKD
EVGSSSCLDRKEPDQAPCGDWLQEALPCLEFSEDLGTVEEPLKDGLSGWRISGSLSSKRDLAPVEPPVSLQTLTFCWVNN
PEGEEEQEDEEEEEEEEEEEDWESEPKGSNAGCWGTSSVQRTEVRGRMLGDYLVR
I get a crash on my end with
src/spaln -Q7 -t1 -O0 -Thomosapi -dhs38.bkp protein.fa
I don't see the crash without -T.
Dear Heng,
Thank you very much for sending me the example that caused the segmentation fault. I reproduced the problem on my side as well, and I will try to fix it as soon as possible.
By the way, no segmentation fault occurred with this sequence when I used the -yX2 option, which is tuned for remote homologs. However, the result showed that spaln failed to map the query correctly onto the genome, so the resulting alignment was meaningless. The incorrect mapping is probably due to non-specific matches to repetitive elements. This hints at a direction in which spaln should be improved: it should accept soft-masked sequences, using the masked sequence in the mapping phase and the unmasked sequence in the alignment phase. I want to incorporate this feature in a future release.
Osamu,
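The soft-masking idea described above can be illustrated with a minimal Python sketch. It is not spaln code and does not reflect how spaln would implement the feature; it only assumes the usual convention that lowercase letters mark repeats. The function name split_soft_masked and the choice of N as the masking character are hypothetical.

```python
def split_soft_masked(seq, mask_char="N"):
    """Return (mapping_view, alignment_view) of a soft-masked DNA sequence.

    Assumption: lowercase letters mark repeats (the usual soft-masking
    convention). This is an illustration, not spaln's implementation.
    """
    # Mapping view: repeats are hidden so they cannot seed spurious hits.
    mapping_view = "".join(mask_char if c.islower() else c for c in seq)
    # Alignment view: the full sequence, upper-cased, repeats included.
    alignment_view = seq.upper()
    return mapping_view, alignment_view


mapping, alignment = split_soft_masked("ACGTacgtacgtTTGA")
print(mapping)    # ACGTNNNNNNNNTTGA
print(alignment)  # ACGTACGTACGTTTGA
```

Both views keep the same length, so coordinates found during mapping stay valid when the unmasked view is used for alignment.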
Thank you for maintaining spaln all these years. It is a great tool. I dug a little further into this protein, ENSMUSP00000074009.7. It is Ifnlr1 according to the Ensembl annotation. Here is another possible alignment of this protein on GRCh38:
chr1 miniprot mRNA 24157130 24187248 1574 - . ID=MP000001;Identity=0.5933;Positive=0.7015;Target=ENSMUSP00000074009.7 1 535
chr1 miniprot CDS 24187191 24187248 0 - 0 Parent=MP000001;Target=ENSMUSP00000074009.7 1 19
chr1 miniprot CDS 24180731 24180854 0 - 2 Parent=MP000001;Target=ENSMUSP00000074009.7 20 60
chr1 miniprot CDS 24169417 24169598 0 - 1 Parent=MP000001;Target=ENSMUSP00000074009.7 61 121
chr1 miniprot CDS 24161542 24161684 0 - 2 Parent=MP000001;Target=ENSMUSP00000074009.7 122 169
chr1 miniprot CDS 24159474 24159633 0 - 0 Parent=MP000001;Target=ENSMUSP00000074009.7 170 222
chr1 miniprot CDS 24159052 24159182 0 - 2 Parent=MP000001;Target=ENSMUSP00000074009.7 223 266
chr1 miniprot CDS 24157133 24157891 0 - 0 Parent=MP000001;Target=ENSMUSP00000074009.7 267 535
chr1 miniprot stop_codon 24157130 24157132 0 - 0 Parent=MP000001
The identity is below 60%, which is uncommon for human-mouse orthologs. Nonetheless, this region contains the human IFNLR1 gene. The above might be a real alignment (EDIT: I see a non-canonical splicing. Not sure if that is correct).
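As a quick sanity check on the miniprot records above, the following Python sketch sums the CDS lengths and counts the introns. The coordinates are copied from the CDS lines; the helper name cds_and_intron_lengths is hypothetical, and the intervals are assumed to be 1-based and inclusive, as is usual for GFF.

```python
# CDS coordinates copied from the miniprot records above (minus strand).
cds = [
    (24187191, 24187248),
    (24180731, 24180854),
    (24169417, 24169598),
    (24161542, 24161684),
    (24159474, 24159633),
    (24159052, 24159182),
    (24157133, 24157891),
]


def cds_and_intron_lengths(intervals):
    """Return (exon_lengths, intron_lengths) for 1-based inclusive intervals."""
    ordered = sorted(intervals)
    exon_lengths = [end - start + 1 for start, end in ordered]
    # An intron is the gap between the end of one CDS and the start of the next.
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return exon_lengths, intron_lengths


exons, introns = cds_and_intron_lengths(cds)
print(sum(exons), len(introns))  # 1557 6
```

The 1557 aligned nucleotides correspond to 519 codons against a 535-residue query, which suggests that the alignment contains gaps; the records alone do not say where.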
Dear Heng,
I have just uploaded the new version, spaln2.4.13. I replaced several files with the corresponding ones under revision. The new version runs without trouble on the sequence you provided and produces a seemingly correct alignment. In this regard, I was wrong to anticipate that the mapping phase might be problematic. I also tried mapping and aligning all mouse RefSeq NP_* protein sequences (excluding predicted ones) onto the human genome. Although I have not yet evaluated the results, at least I encountered no crash during execution.
I would like to thank you again for your valuable comments, and I am very glad to have the opportunity to communicate with you.
Osamu,
Dear Osamu,
Thank you for the fast response. I got another crash and was trying to identify the sequence. I think the following zebrafish protein is the culprit:
>ENSDARP00000108510.1
MIYLILLSAFSGAVVTLLLQLLLLYRRSPEPVARTVQYVKVVPDPALKDYFSSQQADSAPQQPDSPSPVSKQPEAASPKQ
QETPVPGSSPKQQPSSPPPPSLGDPQHSSKAETCDFLNAIILFLFRELRDTPVVRHWITKKIKVEFEELLQTKTAGRLLE
GLSLRDVSLGNSVPVFKTARLMKPVAVNEDNMPEELNFEVDLEYNGGFHLAIDVDLVFGKSAYLFVKMTRVAGRLKLQFT
RMPFTHWSFSFLEDPLIDFEVKSQFEGRPLPQLTSIIVNQLKRVIKRKHTLPNYKIRYKPFFPFQVQPPLMSSCDLDISI
RDTLLVEGRLRVTLVECSRLFILGSYERETYVHCTFELSSDEWREKTRSSIKETEVIKGPSGSVGMTFRHVPASDGDTVH
VSIETVTPNSPAALADLQRGDRLIAIGGVKVTSSVQVPKLLKQAGERVIVLYERPVRHHVPTGGLGMLQETLGPMEEPSY
LPQPGGYEEDPAPITTMDISENKDNDSEFEELNVESKTAAAPVTIDTKEDFLLSVNQSPKKTVANLAKPLGSISPILNRR
LNLQSPLKTQPKESPKPPTLKNAEPSEQPQRPTVPPPPPPARPPVPPRPHIKVTSASSEAQSLVEGNEPTVEKSPEKTQP
NTGNGEKTVEKIPVKPPEPKPVSKHPEPTEDILNIPATNKQDSAKDKISESSSNTRDSVDEQGLWESSETMYRNRTARWN
KASVIFEVESNHKFLNVALWCKNPFKLGSLLCLGHVSLRLEHLALECISTSSAEYQSTFRLCAPEPRASVSRTALRSLST
HKGFNEKLCYGDVTLNFTYLADGESDLSSGLTERERKGSLQEEDLKDREKEREQVLMVTRDEPIYSGMQIGEMRHNFQDT
QFQNPTYCEYCKKKVWTKAASQCMICSYVCHKKCQEKCLLEHPYCVAASDRRGADPEAKSTINRATTGLTRHIINTSSRL
LNLRQVPKARLAEQVADMGSGVVEPSPKHTPNTSDNESSDTETYTGASPSKQPAGSSGSKLVRKEGGLDDSVFIAVKEIG
RDLYRGLPTDERSQKLELMLDKLQQEIDQELEHNNSLSTEERDTIDSRRKTLITAALAKSGERLQALTLLMIHYRAGIED
LESVESTSPSEQHGFPKAKSEGLEEALMGTEVYDSDMCSPVDVQMLDEITEEQICVEALP
On my end, if I run spaln on this sequence alone, I do not see a crash. However, if I mix it with a few other sequences, I see a segmentation fault. AddressSanitizer reported the following (I modified the Makefile by adding -fsanitize=address -g and removing -O3):
hli@scorpius spaln-1$ src/spaln -Q7 -t1 -O0 -Thomosapi -dhs38.bkp 0-5.fa
=================================================================
==2737==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61700000005c at pc 0x000000585a4d bp 0x7fe94ab91350 sp 0x7fe94ab91348
WRITE of size 4 at 0x61700000005c thread T2
#0 0x585a4c in Aln2h1::initH_ng(RVPD**, WINDOW const&, RANGE const*) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:140
#1 0x58a80b in Aln2h1::forwardH_ng(int*, WINDOW const&, bool, RANGE const*) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:305
#2 0x5ae4ec in Aln2h1::trcbkalignH_ng(WINDOW const&, bool, RANGE const*) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:1964
#3 0x5af578 in Aln2h1::lspH_ng(WINDOW const&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2034
#4 0x5affd0 in Aln2h1::lspH_ng(WINDOW const&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2087
#5 0x5c3611 in Aln2h1::interpolateH(unsigned int, int, JUXT const*, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2973
#6 0x5c672f in Aln2h1::seededH_ng(unsigned int, int, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3118
#7 0x5c26fb in Aln2h1::interpolateH(unsigned int, int, JUXT const*, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2947
#8 0x5c672f in Aln2h1::seededH_ng(unsigned int, int, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3118
#9 0x5c26fb in Aln2h1::interpolateH(unsigned int, int, JUXT const*, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2947
#10 0x5c5edf in Aln2h1::seededH_ng(unsigned int, int, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3096
#11 0x5c7035 in Aln2h1::globalH_ng(int*, WINDOW const&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3139
#12 0x5c7739 in alignH_ng(Seq const**, PwdB const*, int*) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3166
#13 0x4cd0da in spalign2 /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:647
#14 0x4d00d3 in blkaln /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:829
#15 0x4d3c27 in quick4 /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1031
#16 0x4d48d9 in spaln_job /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1072
#17 0x4d7064 in worker_func /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1289
#18 0x7fea2a8c1ea4 in start_thread (/lib64/libpthread.so.0+0x7ea4)
#19 0x7fea29eceb0c in clone (/lib64/libc.so.6+0xfeb0c)
0x61700000005c is located 36 bytes to the left of 672-byte region [0x617000000080,0x617000000320)
allocated by thread T2 here:
#0 0x4a465f in malloc (/hlilab/hli/miniprot/spaln-1/src/spaln+0x4a465f)
#1 0x6a9e64 in operator new(unsigned long) (/hlilab/hli/miniprot/spaln-1/src/spaln+0x6a9e64)
#2 0x5ae4ec in Aln2h1::trcbkalignH_ng(WINDOW const&, bool, RANGE const*) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:1964
#3 0x5af578 in Aln2h1::lspH_ng(WINDOW const&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2034
#4 0x5affd0 in Aln2h1::lspH_ng(WINDOW const&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2087
#5 0x5c3611 in Aln2h1::interpolateH(unsigned int, int, JUXT const*, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2973
#6 0x5c672f in Aln2h1::seededH_ng(unsigned int, int, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3118
#7 0x5c26fb in Aln2h1::interpolateH(unsigned int, int, JUXT const*, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2947
#8 0x5c672f in Aln2h1::seededH_ng(unsigned int, int, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3118
#9 0x5c26fb in Aln2h1::interpolateH(unsigned int, int, JUXT const*, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2947
#10 0x5c5edf in Aln2h1::seededH_ng(unsigned int, int, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3096
#11 0x5c7035 in Aln2h1::globalH_ng(int*, WINDOW const&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3139
#12 0x5c7739 in alignH_ng(Seq const**, PwdB const*, int*) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3166
#13 0x4cd0da in spalign2 /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:647
#14 0x4d00d3 in blkaln /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:829
#15 0x4d3c27 in quick4 /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1031
#16 0x4d48d9 in spaln_job /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1072
#17 0x4d7064 in worker_func /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1289
#18 0x7fea2a8c1ea4 in start_thread (/lib64/libpthread.so.0+0x7ea4)
Thread T2 created by T0 here:
#0 0x450322 in pthread_create (/hlilab/hli/miniprot/spaln-1/src/spaln+0x450322)
#1 0x4d820b in MasterWorker /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1341
#2 0x4da18f in main /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1492
#3 0x7fea29df2554 in __libc_start_main (/lib64/libc.so.6+0x22554)
Hope this helps.
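Because the crash above appears only when the sequence is mixed with others, one way to narrow down a failing input is to drop records greedily while the failure persists. The Python sketch below is an illustration, not the procedure actually used in this thread. The helpers parse_fasta, shrink and spaln_crashes are hypothetical names; the command line is the one shown above, and treating any non-zero exit status as a crash is an assumption.

```python
import os
import subprocess
import tempfile


def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def shrink(records, crashes):
    """Greedily drop records while `crashes(subset)` stays true."""
    kept = list(records)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]
        if trial and crashes(trial):
            kept = trial   # record i is not needed to trigger the failure
        else:
            i += 1         # record i is needed; keep it and move on
    return kept


def spaln_crashes(records,
                  cmd=("src/spaln", "-Q7", "-t1", "-O0", "-Thomosapi", "-dhs38.bkp")):
    """Hypothetical predicate: run the command from this thread on `records`.

    Assumption: any non-zero exit status counts as a crash.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "subset.fa")
        with open(path, "w") as fh:
            for header, seq in records:
                fh.write(f">{header}\n{seq}\n")
        result = subprocess.run([*cmd, path],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    return result.returncode != 0
```

A call such as shrink(parse_fasta(open("0-5.fa").read()), spaln_crashes) would then return a subset from which no single record can be removed without losing the failure. The predicate is passed in as a parameter so that the shrinking logic can be tried without spaln installed.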
Dear Heng,
As you suggested, ENSDARP00000108510.1 was indeed responsible for the crash. Thank you again for your help. In addition, I found another bug that can cause similar trouble. I have just uploaded a revised version of spaln (spaln2.4.13a). Using all Ensembl zebrafish proteins as queries against the human genome, this version finished successfully both with and without the -yX2 option. Although this does not guarantee the correctness of the implementation, the robustness of the program has certainly improved. I welcome your further comments and suggestions.
Osamu,
Dear Osamu,
Thanks a lot for the update. Spaln is indeed more robust. It now produces alignments under several settings that previously crashed. However, I found a new case that causes a segmentation fault:
>ENSDARP00000141901.2
TEKLLLKRLSSTIIKMAFIKEETEDLKIEQVFTLKREDHEEQTDLTLLKEEIQELNDVKEEEDPKAQNTPQKHFKRHYCG
RGFTEKRNLTVHSRVHTGETRFSCKKCGESFNKKDLFEKHKEIHLAVICRHCGRQFTQKYIKTHMRIHTGERPFRCGQCG
KSFAQRSTLDTHVITHTGERPYACSHCGNGFTTKASLDCHMRIHTGEKPFTCEQCGKSFSEKGSLTIHMRFHTGERPFVC
YQCGKGFVIKGNLDRHMIVHSGEKPYSCPQCGKGFKHKARIGVHMMIHSGEKPFACDQCGKSFSTKVHLESHKRVHLKDN
RVKCHQCGMSFPDGSQLKDHVQTHIGQKPFMCPECGRSCSKKPSLKIHMRSHAAEKPFTCKQCGKSYCVRGVLNVHMRIH
TGEKPYTCKQCGKSFLYQSDLKRHSKTHSGQED
The command line and the AddressSanitizer report are given below (adding -yX2 leads to the same crash):
$ ../spaln-1/src/spaln -Q7 -t1 -O0 -Thomosapi -dhs38.bkp -LS this-protein.fa
=================================================================
==196113==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x2aab82fd0abe at pc 0x00000058bedf bp 0x2aab8a71c420 sp 0x2aab8a71c418
READ of size 2 at 0x2aab82fd0abe thread T2
#0 0x58bede in Aln2h1::forwardH_ng(int*, WINDOW const&, bool, RANGE const*) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:349
#1 0x5ae54a in Aln2h1::trcbkalignH_ng(WINDOW const&, bool, RANGE const*) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:1962
#2 0x5af5d6 in Aln2h1::lspH_ng(WINDOW const&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2034
#3 0x5b002e in Aln2h1::lspH_ng(WINDOW const&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2088
#4 0x5c366f in Aln2h1::interpolateH(unsigned int, int, JUXT const*, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2974
#5 0x5c678d in Aln2h1::seededH_ng(unsigned int, int, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3119
#6 0x5c2759 in Aln2h1::interpolateH(unsigned int, int, JUXT const*, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2948
#7 0x5c678d in Aln2h1::seededH_ng(unsigned int, int, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3119
#8 0x5c2759 in Aln2h1::interpolateH(unsigned int, int, JUXT const*, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:2948
#9 0x5c5f3d in Aln2h1::seededH_ng(unsigned int, int, BOUND&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3097
#10 0x5c7093 in Aln2h1::globalH_ng(int*, WINDOW const&) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3140
#11 0x5c7797 in alignH_ng(Seq const**, PwdB const*, int*) /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:3167
#12 0x4ccf74 in spalign2 /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:641
#13 0x4cff6d in blkaln /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:823
#14 0x4d3ac1 in quick4 /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1025
#15 0x4d4773 in spaln_job /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1066
#16 0x4d6efe in worker_func /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1283
#17 0x2aaaaacd6ea4 in start_thread (/lib64/libpthread.so.0+0x7ea4)
#18 0x2aaaab7058dc in __clone (/lib64/libc.so.6+0xfe8dc)
0x2aab82fd0abe is located 3394 bytes to the left of 8464442-byte region [0x2aab82fd1800,0x2aab837e403a)
allocated by thread T2 here:
#0 0x4a465f in malloc (/hlilab/hli/miniprot/spaln-1/src/spaln+0x4a465f)
#1 0x6a9ec4 in operator new(unsigned long) (/hlilab/hli/miniprot/spaln-1/src/spaln+0x6a9ec4)
#2 0x4d405b in genomicseq /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1049
#3 0x4ccf3e in spalign2 /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:638
#4 0x4cff6d in blkaln /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:823
#5 0x4d3ac1 in quick4 /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1025
#6 0x4d4773 in spaln_job /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1066
#7 0x4d6efe in worker_func /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1283
#8 0x2aaaaacd6ea4 in start_thread (/lib64/libpthread.so.0+0x7ea4)
Thread T2 created by T0 here:
#0 0x450322 in pthread_create (/hlilab/hli/miniprot/spaln-1/src/spaln+0x450322)
#1 0x4d80a5 in MasterWorker /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1335
#2 0x4da029 in main /homes6/hli/hli1/miniprot/spaln-1/src/spaln.cc:1486
#3 0x2aaaab629554 in __libc_start_main (/lib64/libc.so.6+0x22554)
SUMMARY: AddressSanitizer: heap-buffer-overflow /homes6/hli/hli1/miniprot/spaln-1/src/fwd2h1.cc:349 in Aln2h1::forwardH_ng(int*, WINDOW const&, bool, RANGE const*)
Thank you,
Heng
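The "bytes to the left of" figures in the two AddressSanitizer reports in this thread can be reproduced from the addresses they print. The short Python sketch below does that arithmetic; the function name asan_left_offset is hypothetical, and the region end is taken as exclusive, as in the bracket notation of the reports.

```python
def asan_left_offset(bad_addr, region_start, region_end):
    """Return (bytes_before_region, region_size) for an access below a heap region."""
    return region_start - bad_addr, region_end - region_start


# Addresses copied from the two reports in this thread.
print(asan_left_offset(0x61700000005c, 0x617000000080, 0x617000000320))
# (36, 672)
print(asan_left_offset(0x2aab82fd0abe, 0x2aab82fd1800, 0x2aab837e403a))
# (3394, 8464442)
```

In both cases the faulting address lies below the start of the allocated buffer, which is what one would expect from an index or offset that has gone negative. The reports alone do not establish the root cause.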
Dear Heng,
The crash was caused by a simple mistake: the start site of the query was not copied in the local alignment. I also fixed several errors in the Hirschberg method for DNA queries. The fixed version has been uploaded as spaln.2.4.13b.
Osamu,
Dear Osamu,
On my end, it seems that v2.4.13b still segfaults on the same input sequence (ENSDARP00000141901.2 in my last post):
hli@node01 spaln$ ../spaln-new/src/spaln -Q7 -t1 -O0 -dhs38.bkp -LS 4-3.fa
Segmentation fault (core dumped)
Could you check whether this happens on your machine?
Thanks,
Heng
Dear Heng,
Although spaln worked normally with the human genomic sequence that I use locally, it did fail with GCA_000001405.15_GRCh38_no_alt_analysis_set.fna. I found that the segmentation fault was due to incomplete initialization of alignment variables. After the correction, it produced the following output, in which the fourth alignment is problematic.
$ spaln -Q7 -d homosa38_g -T homosapi -O0 -LS -M ENSDARP00000141901.2
chr19 ALN gene 44100807 44157711 980 + . ID=gene00001;Name=chr19_44129
chr19 ALN mRNA 44100807 44157711 980 + . ID=mRNA00001;Parent=gene00001;Name=chr19_44129
chr19 ALN cds 44100807 44100946 107 + 0 ID=cds00001;Parent=mRNA00001;Name=chr19_44129;Target=ENSDARP00000141901.2 9 55 +
chr19 ALN cds 44106802 44107252 327 + 1 ID=cds00002;Parent=mRNA00001;Name=chr19_44129;Target=ENSDARP00000141901.2 56 201 +
chr19 ALN cds 44131284 44131358 115 + 0 ID=cds00003;Parent=mRNA00001;Name=chr19_44129;Target=ENSDARP00000141901.2 202 226 +
chr19 ALN cds 44157097 44157711 600 + 0 ID=cds00004;Parent=mRNA00001;Name=chr19_44129;Target=ENSDARP00000141901.2 227 431 +
chr19 ALN gene 43719342 44107267 915 + . ID=gene00002;Name=chr19_43913
chr19 ALN mRNA 43719342 44107267 915 + . ID=mRNA00002;Parent=gene00002;Name=chr19_43913
chr19 ALN cds 43719342 43719512 86 + 0 ID=cds00005;Parent=mRNA00002;Name=chr19_43913;Target=ENSDARP00000141901.2 1 45 +
chr19 ALN cds 43719717 43719794 56 + 0 ID=cds00006;Parent=mRNA00002;Name=chr19_43913;Target=ENSDARP00000141901.2 46 67 +
chr19 ALN cds 43996527 43996946 329 + 0 ID=cds00007;Parent=mRNA00002;Name=chr19_43913;Target=ENSDARP00000141901.2 68 201 +
chr19 ALN cds 44086145 44086399 280 + 0 ID=cds00008;Parent=mRNA00002;Name=chr19_43913;Target=ENSDARP00000141901.2 202 286 +
chr19 ALN cds 44106830 44107267 405 + 0 ID=cds00009;Parent=mRNA00002;Name=chr19_43913;Target=ENSDARP00000141901.2 287 431 +
chr19 ALN gene 56622134 58067960 953 + . ID=gene00003;Name=chr19_57345
chr19 ALN mRNA 56622134 58067960 953 + . ID=mRNA00003;Parent=gene00003;Name=chr19_57345
chr19 ALN cds 56622134 56622276 94 + 0 ID=cds00010;Parent=mRNA00003;Name=chr19_57345;Target=ENSDARP00000141901.2 1 49 +
chr19 ALN cds 56622358 56622603 168 + 1 ID=cds00011;Parent=mRNA00003;Name=chr19_57345;Target=ENSDARP00000141901.2 50 130 +
chr19 ALN cds 57420911 57421083 191 + 1 ID=cds00012;Parent=mRNA00003;Name=chr19_57345;Target=ENSDARP00000141901.2 131 185 +
chr19 ALN cds 58067197 58067960 709 + 2 ID=cds00013;Parent=mRNA00003;Name=chr19_57345;Target=ENSDARP00000141901.2 186 432 +
chr19 ALN gene 12014935 12319160 716 - . ID=gene00004;Name=chr19_12167
chr19 ALN mRNA 12014935 12319160 716 - . ID=mRNA00004;Parent=gene00004;Name=chr19_12167
chr19 ALN cds 12318669 12319160 364 - 0 ID=cds00014;Parent=mRNA00004;Name=chr19_12167;Target=ENSDARP00000141901.2 82 241 +
chr19 ALN cds 12273300 12273822 455 - 0 ID=cds00015;Parent=mRNA00004;Name=chr19_12167;Target=ENSDARP00000141901.2 242 415 +
chr19 ALN cds 12014935 12014990 37 - 2 ID=cds00016;Parent=mRNA00004;Name=chr19_12167;Target=ENSDARP00000141901.2 416 432 +
The corrected version has been uploaded as spaln.2.4.13c.
Osamu,
Dear Osamu,
Thank you so much for the fix. v2.4.13c can now align all zebrafish proteins without a segfault on my end as well. I will close this issue.
Heng
I am using version 2.4.12, checked out on August 14, 2022. I compiled from source code on CentOS 7. Here are the command lines:
There are ~20k "canonical" mouse proteins in mm39.canon.fa. All three crashes happened after aligning ~10k proteins. The system error message looks like:
I have put the input sequences at:
I am not sure if the crash can be reproduced on your end, though.