richelbilderbeek / reports

I write quite some bug reports. I keep them here

ClustalOmega: suggestion for better error message #7

Open richelbilderbeek opened 4 years ago

richelbilderbeek commented 4 years ago

22 September 2020 08:28 9 KB From: Richel Bilderbeek To: clustalw@ucd.ie

Dear Clustal developers,

I am contacting you with a suggestion to make ClustalOmega's error messages even more helpful.

First, thanks for writing and/or maintaining Clustal!

Second, here is my suggestion to help ClustalOmega give a better error message:

For a FASTA file like this (also attached):

>p1
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV
>p2
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCXXXXXXXXXXXXXXXSRVKNLNSSRVPDLLV
>p3
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

and running it with clustalo -i error.fasta, one gets the following error message:

HHalignWrapper:hhalign_wrapper.c:1419: problem in alignment (profile sizes: 1 + 2) (p3 + p1), forcing Viterbi
hh-error-code=4 (mac-ram=8000)
hhalign:hhalign.cpp:961: Problem Reading/Preparing profiles (len(q)=75/len(t)=0)
HHalignWrapper:hhalign_wrapper.c:1447: problem in alignment, Viterbi did not work
hh-error-code=4 (mac-ram=64000)
hhalign:hhalign.cpp:961: Problem Reading/Preparing profiles (len(q)=75/len(t)=0)
FATAL: could not perform alignment -- bailing out

I would expect the error message to be something like:

Cannot align sequence with only unknown amino acids
FATAL: could not perform alignment -- bailing out

Sure, this is a low-priority issue (I can check for this myself as well), but it would have saved me quite some debugging. The sequences are from SARS-CoV-2, the E (from Envelope) protein: apparently one sequence went sideways :-).
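
The self-check mentioned above could look roughly like the following Python sketch, run on the FASTA file before calling clustalo. It is not part of Clustal Omega: the function names `parse_fasta` and `find_all_unknown` are hypothetical, and it assumes that `X` is the only symbol used for an unknown amino acid.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}; wrapped sequence lines are joined."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def find_all_unknown(records: Dict[str, str], unknown: str = "X") -> List[str]:
    """Return the names of sequences made up solely of the unknown-residue symbol.

    Assumption: 'X' is the only placeholder for an unknown amino acid.
    """
    return [
        name
        for name, seq in records.items()
        if seq and set(seq.upper()) == {unknown}
    ]


# The three sequences from the report above.
example = """>p1
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV
>p2
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCXXXXXXXXXXXXXXXSRVKNLNSSRVPDLLV
>p3
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
"""

for name in find_all_unknown(parse_fasta(example)):
    print(f"Cannot align sequence with only unknown amino acids: {name}")
```

Only `p3` is reported here; `p2` contains a stretch of unknown residues but also known ones, so it is left alone.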

I hope you will take my suggestion into account.

Thanks again for writing and/or maintaining Clustal, and cheers, Richel Bilderbeek

Attachments

MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILXXXRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV
MYSFVSEETGTLIVNSVLLXXXXXXXXXXXXXXXXXXXXXXYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLXX
MYSFVSEETGTLIXXXXXXXXXXXXXLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYXXXXXXXXXXXXXFYXYSRVKNLNSSRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSXXXXSRVKNLNSSRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYXXXXXXXXXXXXXXXXXXXXXXXXXSRVPDLXXX
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYXXXXXXXXXXXXXXXXYSRVKNLNXXRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYXXXXXXXXNXSRXXDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPFFYVYSRVKNLNSSRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLXXXXXXXXXXXXXXXXXXVYSRVKNLNSSRVPDLLV
MYSFVSEETGTLIVNXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSFRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVLDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTXLRLCAYCCNIVNVXXXXXXXXXXXXXXXXXXSRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
MYSFVSEETGTLIVNSVLLFXXXXXXXXXXXXXLTALRLCAYCCNIVNVSLVXXXXXXXXXXXXXXXXXXXXXXXX
MYSLVSEETGXXXXXXXXXXXXXXXXXXXXXXXXTALRLCAYWCNIVNVSLVIPSLYVYSRVKNLNSSRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYXXXXXXXSLVKPSFYVYSRVKNLNXXXXXXXXXX
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSXXXXXXX
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVXXXXXXXXXXXXXXVKNLNSSRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPYLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSCRVPDLLV
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXLNSSRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNXXXVSLVKPSFYVYXXXXXXXXXXXXXXXXX
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTXXXXXXXXXXXXXXXXXXXXFYVYSRVKNLNSSRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNXNSSRVPDLLV
MYSFVSEETGTLIVNSXXXXXAFVVFLLXTLAILTALRXCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYXXXXXXXXXXXXXXXXXV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVXLVKPSFYVYSRVKNLNSSRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYXXNIVNVSLVKPSFYVYSRVKNLNSSXXXXXXXX
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSLVKNLNSSRVPDLLV
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCXXXXXXXXXXXXXXXSRVKNLNSSRVPDLLV
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX