wrpearson / fasta36

Git repository for FASTA36 sequence comparison software
Apache License 2.0

Warning - unrecognized residue #19

Closed TimothyStephens closed 4 years ago

TimothyStephens commented 4 years ago

Hi,

I get the error below when aligning two repetitive protein sequences. It occurs with both versions 36.3.8 and 36.3.8h_11-Feb-2020.

~/PROGRAMS/fasta-36.3.8g/bin/fasta 7082-1db84fc9-b9020b0d-966581850.372822.seq1 7082-1db84fc9-b9020b0d-966581850.372822.seq2
# /home/ts942/PROGRAMS/fasta-36.3.8g/bin/fasta 7082-1db84fc9-b9020b0d-966581850.372822.seq1 7082-1db84fc9-b9020b0d-966581850.372822.seq2
FASTA searches a protein or DNA sequence data bank
 version 36.3.8g Oct, 2018
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

*** Warning - unrecognized residue at 2:I - 73
*** Warning - unrecognized residue at 120:L - 76
*** Warning - unrecognized residue at 121:E - 69
 *** error [initfa.c:2916] (validate_params) - aa0[122] = [25 > 17] out of range
 *** error [comp_lib9.c:921] - validate_params() failed:
 --  /home/ts942/PROGRAMS/fasta-36.3.8g/bin/fasta 7082-1db84fc9-b9020b0d-966581850.372822.seq1 7082-1db84fc9-b9020b0d-966581850.372822.seq2

cat 7082-1db84fc9-b9020b0d-966581850.372822.seq1 
>seq1
MYIYDTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTVS
LE*
cat 7082-1db84fc9-b9020b0d-966581850.372822.seq2
>seq2
MTLVLALVLALVLALVLALVLALVLALVLALALVLVLVLALALALVLVLALVLALVLALV
LVLVLALALVLALVLALVLALVLALVLALVLALVLALVLVLALVLALVLALVLAPCRLSS

Thanks, Tim.

wrpearson commented 4 years ago

When you have a "protein" sequence that looks like a DNA sequence (>80% ACGT), FASTA treats it as DNA, so you need to tell FASTA explicitly that it is a protein by using the "-p" option.

fasta36 -p query.file library.file

Bill Pearson
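
For readers who want to check their own queries before running a search, here is a minimal Python sketch of the kind of test described above. It is not FASTA's actual code: the function names, the choice to count only the letters A, C, G and T, and the strict "greater than 0.8" comparison are assumptions based solely on the ">80% ACGT" wording in the reply.

```python
# Illustrative sketch only; not taken from the FASTA source code.
# Assumption: a sequence "looks like DNA" when more than 80% of its
# letters are A, C, G or T, as the reply above describes.

DNA_LETTERS = set("ACGT")


def acgt_fraction(sequence: str) -> float:
    """Return the fraction of alphabetic characters that are A, C, G or T."""
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(1 for ch in letters if ch in DNA_LETTERS)
    return hits / len(letters)


def looks_like_dna(sequence: str, threshold: float = 0.8) -> bool:
    """Hypothetical helper: True when the ACGT fraction exceeds the threshold."""
    return acgt_fraction(sequence) > threshold


if __name__ == "__main__":
    # A shortened stand-in for seq1 from this issue: a protein made
    # almost entirely of T and G residues.
    query = "MYIYD" + "TG" * 50 + "VSLE"
    if looks_like_dna(query):
        print("Query resembles DNA; pass -p if it is really a protein.")
    else:
        print("Query does not resemble DNA.")
```

A protein dominated by Thr and Gly, like seq1, passes this check, which is why the "-p" option is needed in this case.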

TimothyStephens commented 4 years ago

Thank you for the super quick reply.

I will keep this in mind and use the "-p" option from now on.

Thanks, Tim.