ogotoh / spaln

Genome mapping and spliced alignment of cDNA or amino acid sequences
GNU General Public License v2.0

Segmentation fault (core dumped) #23

Open binlu1981 opened 5 years ago

binlu1981 commented 5 years ago

I used spaln to annotate a gene structure in a de novo assembly using a homologous protein, but spaln reported an error (Segmentation fault (core dumped)). The command was as follows: spaln -Q3 -O0 -KP -C1 -M100 genome.fasta protein.fa

I installed spaln through bioconda. How can I resolve this? Thanks

ogotoh commented 5 years ago

Spaln can be run in several different modes, but your command does not fit any of them.

I guess that ‘genome.fasta’ represents nearly the entire genomic sequence. If so, you must first format genome.fasta in the ‘seqdb’ directory.

1) Change the extension from fasta to mfa or gf: % mv genome.fasta genome.mfa (or genome.gf)
2) Format genome.mfa: % makeidx.pl -ip genome.mfa
3) Then run spaln: % spaln -Q7 -d genome -O0 [other options] protein.fa

Note that the argument to the -M option sets the maximum number of outputs (e.g., paralogs) per query. Unless you expect your genome to contain a large number of paralogs closely homologous to protein.fa and you want to retrieve most of them, the argument should be a small number (typically < 5).

I also advise using the -T option to specify the most suitable parameter set for your genome. You can find it by looking in table/gnmtab.
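The three steps above can be scripted. The following is a minimal Python sketch, not part of Spaln: the helper name `build_spaln_commands` and its parameters are hypothetical, and it only assembles the command lines quoted in this comment (`makeidx.pl -ip`, `spaln -Q7 -d ... -O0`) without running them. Attaching the value directly to `-M` follows the `-M100` style of the original command; doing the same for `-T` is an assumption.

```python
from pathlib import Path


def build_spaln_commands(genome_fasta, protein_fasta, max_hits=4, species=None):
    """Return the shell steps from the comment above as argv lists.

    Nothing is executed here. The caller is expected to run the commands
    inside Spaln's 'seqdb' directory, as the comment advises.
    """
    genome = Path(genome_fasta)
    formatted = genome.with_suffix(".mfa")   # step 1: rename .fasta -> .mfa
    db_name = formatted.stem                 # 'genome' for genome.mfa

    spaln = ["spaln", "-Q7", "-d", db_name, "-O0", f"-M{max_hits}"]
    if species is not None:
        # Assumption: the species id from table/gnmtab is attached to -T.
        spaln.append(f"-T{species}")
    spaln.append(str(protein_fasta))

    return [
        ["mv", str(genome), str(formatted)],
        ["makeidx.pl", "-ip", str(formatted)],   # step 2: format the genome
        spaln,                                   # step 3: run the alignment
    ]


commands = build_spaln_commands("genome.fasta", "protein.fa", max_hits=4)
for argv in commands:
    print(" ".join(argv))
```

Returning argument lists rather than one shell string keeps the sketch safe to pass to `subprocess.run` later without quoting issues.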

binlu1981 commented 5 years ago

Thanks! I cannot find makeidx.pl in the spaln installed from conda. Should I download spaln again from GitHub and compile it? It would be convenient if makeidx.pl could be integrated into spaln.

ogotoh commented 5 years ago

Dear Bin,

I am not familiar with the details of Conda. You can probably get the latest, complete version of spaln from GitHub or from http://www.genome.ist.i.kyoto-u.ac.jp/~aln_user/spaln/.

Osamu,



binlu1981 commented 5 years ago

I installed spaln from GitHub and it worked well. If I want to obtain output in different formats (such as gff and exon sequences), can I use the -O option multiple times?

ogotoh commented 5 years ago

No, the current implementation cannot produce multiple formats in a single run.

I have received similar questions and requests from others, so I am now modifying Spaln to generate output in multiple formats. However, I do not have enough time to devote to this work at the moment. I expect it to be finished in a few weeks.

Osamu,



ogotoh commented 5 years ago

Please show your complete command. The human gene corresponding to OPSB_HUMAN has only 5 exons, while your output indicates that your gene is split into 14 exons. An excessively fragmented gene structure is a sign that an unsuitable species-specific parameter set was used. Confirm that you are using the right -T option.

The show-alignment option (-O1) is sometimes helpful for seeing what is happening.

When the first exon is shorter than 3 nucleotides (one codon), unexpected output can be produced with the -O0 (gff) or -O3 (bed) option (Issues #1 and #17). You may try the -O6 option to yield the CDS directly, without intermediate files.

Osamu,


From: Bin Lu notifications@github.com Sent: October 16, 2019 23:58 To: ogotoh/spaln spaln@noreply.github.com CC: 後藤修 o.gotoh@aist.go.jp; Comment comment@noreply.github.com Subject: Re: [ogotoh/spaln] Segmentation fault (core dumped) (#23)

I used -Q6 to get the BED12 file and then extracted the CDS region using bedtools. But bedtools reported the error "#Error: cannot construct subsequence with negative offset or length < 1", which indicates that start < 0 or exon length < 1. I checked the output BED file and found that column 11 has a block size of 0 on some lines (0,3,3,3 ...).

chr2 204358417 204432985 OPSB_HUMAN 1000 - 204358417 204432985 0,255,255 14 0,3,3,3,108,240,166,169,330,3,4,2,18,10, 0,119,221,312,5515,8682,11719,13733,16332,72550,72744,72838,73989,74558

Why are there zero-size exons? The default minimum exon size is 2. Any suggestions? Thanks!
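The zero-length blocks reported here can be located before the file is passed to bedtools. Below is a minimal Python sketch (the function name `short_blocks` is hypothetical, not a Spaln or bedtools feature) that reads one BED12 line, relying on the standard BED12 meaning of columns 2, 10, 11 and 12 (chromStart, blockCount, blockSizes, blockStarts), and lists every block shorter than a chosen length.

```python
def short_blocks(bed_line, min_len=1):
    """Return (block_index, genomic_start, size) for blocks shorter than min_len."""
    fields = bed_line.split()                 # tabs or spaces both work
    chrom_start = int(fields[1])
    # BED12 lists may end with a trailing comma, so drop empty items.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != int(fields[9]) or len(starts) != len(sizes):
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    return [
        (i, chrom_start + offset, size)
        for i, (size, offset) in enumerate(zip(sizes, starts))
        if size < min_len
    ]


# The line quoted in this comment.
line = (
    "chr2 204358417 204432985 OPSB_HUMAN 1000 - 204358417 204432985 0,255,255 14 "
    "0,3,3,3,108,240,166,169,330,3,4,2,18,10, "
    "0,119,221,312,5515,8682,11719,13733,16332,72550,72744,72838,73989,74558"
)
print(short_blocks(line))             # blocks of length 0
print(short_blocks(line, min_len=3))  # blocks shorter than one codon
```

Block indices are 0-based and follow the order of column 11, so index 0 is the first block listed on the line.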


binlu1981 commented 5 years ago

Can I set the minimum exon length to 3 in spaln to avoid this case? You are right, I had set the wrong option. Now I get the right result. Thanks for your reply!

ogotoh commented 5 years ago

See my comment in Issue #19.

binlu1981 commented 5 years ago

Dear Osamu, I found that some sequence headers contain "M Deleted 1 chars at 1952597". What does that mean? Is it a frameshift? But I cannot find a stop codon in the CDS and peptide sequences. How can I get the frameshifted sequences? Thanks

ogotoh commented 5 years ago

Yes, each ;M line indicates the presence of a frameshift. Upon translation, one or two nucleotides inserted in the genomic sequence are simply deleted. An incomplete codon harboring a single-nucleotide deletion is translated according to the tron code of its second position, whereas one harboring a two-nucleotide deletion is not translated. Although more sophisticated alignment algorithms might be applied, Spaln favors speed over a small improvement in accuracy for rare events.

Currently, the output with the -O7 option shows only the genomic coordinates of frameshifts; the locations of the affected amino acids are not shown. This may be modified in a future revision. Please use the -O1 option to identify the locations of frameshifts and how they are translated.

The output of the -O6 option is a concatenation of predicted coding exons that are copies of the genomic sequence. This means that the nucleotides that cause frameshifts are also copied, and no nucleotide is added to compensate for a deletion frameshift.
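To collect the frameshift positions from such output, the M comment lines can be scanned. This is a minimal Python sketch that assumes the line format is exactly the one quoted in this thread (`M Deleted 1 chars at 1952597`, optionally prefixed with `;`); any other wording that Spaln may emit is not handled, and the function name `frameshift_sites` is hypothetical.

```python
import re

# Matches only the wording quoted in this thread; the leading ';' is optional.
FRAMESHIFT_RE = re.compile(r"^;?\s*M\s+Deleted\s+(\d+)\s+chars?\s+at\s+(\d+)")


def frameshift_sites(lines):
    """Return (deleted_nucleotides, genomic_position) for each matching line."""
    sites = []
    for line in lines:
        match = FRAMESHIFT_RE.match(line.strip())
        if match:
            sites.append((int(match.group(1)), int(match.group(2))))
    return sites


example = [
    ";M Deleted 1 chars at 1952597",
    ">some_sequence_header",          # hypothetical non-matching line
]
print(frameshift_sites(example))
```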

Osamu,



binlu1981 commented 5 years ago

Thanks. The BED output looks like this:

chr4 10741027 10742890 ARRS_RAT 697 - 10741027 10742890 0,255,255 11 79,138,73,85,136,68,60,194,45,61,60, 0,4776,5105,6812,9839,11412,12819,13008,14229,14398,18573
chr4 10739910 10742890 ARRS_XENLA 988 - 10739910 10742890 0,255,255 15 100,10,56,24,78,138,73,85,136,68,60,194,45,61,60, 0,8138,8425,9325,11174,15949,16278,17985,21012,22585,23992,24181,25402,25571,29746

What does the 5th column represent? Is it the score? And what does 0,255,255 mean? Is the BED output sorted (best hit first)?

Bin

ogotoh commented 5 years ago

Dear Bin,

I don't recommend using the BED format, because some information is lost. For example, we cannot distinguish a short intron from an ordinary gap, or determine the precise location of an intron within a gapped region.

Anyway, the 5th column denotes the alignment score normalized to 1000.

0,255,255 means a color code, which is not relevant for ordinary use.
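As a small illustration of these columns, the sketch below (plain Python; the `Hit` record and `parse_hit` helper are hypothetical names) pulls the name, score, strand and color out of the two lines quoted above and orders the hits by score. Whether Spaln itself writes hits best-first is not stated in this thread, so the sort is done explicitly.

```python
from typing import NamedTuple, Tuple


class Hit(NamedTuple):
    name: str                    # column 4: query name
    score: int                   # column 5: alignment score normalized to 1000
    strand: str                  # column 6
    rgb: Tuple[int, int, int]    # column 9: display color only


def parse_hit(bed_line: str) -> Hit:
    f = bed_line.split()
    r, g, b = (int(x) for x in f[8].split(","))
    return Hit(name=f[3], score=int(f[4]), strand=f[5], rgb=(r, g, b))


# Leading columns of the two lines quoted above (block columns omitted).
lines = [
    "chr4 10741027 10742890 ARRS_RAT 697 - 10741027 10742890 0,255,255 11",
    "chr4 10739910 10742890 ARRS_XENLA 988 - 10739910 10742890 0,255,255 15",
]
hits = sorted((parse_hit(l) for l in lines), key=lambda h: h.score, reverse=True)
for h in hits:
    print(h.name, h.score, h.strand, h.rgb)
```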

Osamu,




binlu1981 commented 4 years ago

Dear Osamu, when I use the -M option with a value greater than 1, the strand information ("+" or "-") seems to be missing from the BED file, so I cannot determine the strand direction. Could you check whether this is a bug? Thanks!

Bin