mchaisso opened this issue 2 years ago
Hi! If the content you pasted is really what is in your FASTA file, the sequence names are missing the leading ">" symbol. Please check that you have a proper FASTA file:
```
>Orca_ADGRG2
AGCTAGCTACGTACGAC...
>Vaquita_ADGRG2
GTACGTAGCTACGACGAA...
```
In addition, each sequence should contain a correct open reading frame (ORF) so that the translation into amino acids works properly.
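A quick way to catch the missing-header problem is to parse the file and fail as soon as sequence data appears before any ">" line. This is a minimal Python sketch; `parse_fasta` is a hypothetical helper, not part of PoSeiDon, and the sequences are toy examples rather than the real ADGRG2 data.

```python
def parse_fasta(text):
    """Parse FASTA text into a {header: sequence} dict.

    Raises ValueError if sequence data appears before the first '>' header,
    which is the symptom of headers that lost their leading '>'.
    """
    records = {}
    name = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()  # drop surrounding whitespace
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is None:
            raise ValueError(f"line {lineno}: sequence data before any '>' header")
        else:
            records[name] += line  # multi-line sequences are concatenated
    return records


# Toy sequences for illustration only.
good = ">Orca_ADGRG2\nATGAAA\n>Vaquita_ADGRG2\nATG\nCCC\n"
print(parse_fasta(good))  # {'Orca_ADGRG2': 'ATGAAA', 'Vaquita_ADGRG2': 'ATGCCC'}

try:
    parse_fasta("Orca_ADGRG2\nATGAAA\n")  # header without its leading '>'
except ValueError as err:
    print(err)  # line 1: sequence data before any '>' header
```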
Shoot, that was me not pasting it as code, so GitHub interpreted ">" as a quote. The file does have the ">" symbols, as in a normal FASTA file.
Some are missing start codons, but I'm fixing that separately.
Ah, I see ;) Yes, I think PoSeiDon also checks for a proper start codon. Most importantly, though, the length of each sequence must be divisible by 3 and the sequence must present a proper ORF. Also, in obvious cases PoSeiDon should give a better error message so you can figure out why certain sequences are sorted out.
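The length and start-codon conditions can be checked locally before running PoSeiDon. The following is a rough Python sketch that assumes the standard genetic code's stop codons (TAA, TAG, TGA) and assumes that a "proper ORF" also means no in-frame stop codon before the last codon. `orf_problems` is a hypothetical helper; the authoritative checks are the ones in PoSeiDon's fasta_format_checker.rb.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def orf_problems(seq):
    """Return reasons why seq does not look like a single clean coding ORF.

    Approximation only: PoSeiDon's real rules live in fasta_format_checker.rb.
    """
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not divisible by 3")
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into complete codons in frame 1; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Assumption: a stop codon anywhere before the last codon is a premature stop.
    for index, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {index}")
            break  # report only the first one
    return problems


# Toy sequences for illustration only.
print(orf_problems("ATGAAATAA"))  # []
print(orf_problems("ATGAAAT"))    # ['length 7 is not divisible by 3']
print(orf_problems("GCCTAAATG"))  # ['does not start with ATG', 'premature stop codon TAA at codon 2']
```

The sketch deliberately neither requires nor forbids a terminal stop codon, since whether PoSeiDon expects one is not stated in this thread.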
See the comments here for the checks that are performed: https://github.com/hoelzer/poseidon/blob/master/bin/fasta_format_checker.rb
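In the checker output posted further down in this thread, every record is removed for "bad characters in the sequence", which suggests a problem common to the whole file rather than to individual sequences. Possible culprits include ambiguity codes such as N, alignment gaps (-), or invisible characters such as Windows carriage returns. The Python sketch below lists, per record, any character outside an assumed A/C/G/T alphabet; `bad_characters`, `report_bad_characters`, and the `ALLOWED` set are hypothetical and only approximate what PoSeiDon actually accepts.

```python
# Assumption: only unambiguous nucleotides (either case) count as valid.
# PoSeiDon's actual alphabet may differ (e.g. it may accept N or reject
# lowercase); fasta_format_checker.rb has the real rule.
ALLOWED = set("ACGTacgt")


def bad_characters(seq):
    """Return the distinct characters in seq that are outside ALLOWED, sorted."""
    return sorted({ch for ch in seq if ch not in ALLOWED})


def report_bad_characters(records):
    """Map each record name to its offending characters; clean records are omitted."""
    report = {}
    for name, seq in records.items():
        bad = bad_characters(seq)
        if bad:
            report[name] = bad
    return report


# Toy records for illustration only; repr() makes invisible characters such as '\r' visible.
records = {"Orca_ADGRG2": "ATGAAA", "Vaquita_ADGRG2": "ATGNNA-A\r"}
for name, bad in report_bad_characters(records).items():
    print(name, " ".join(repr(ch) for ch in bad))  # prints: Vaquita_ADGRG2 '\r' '-' 'N'
```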
This is the output of the FASTA check:
```
BlueWhale_ADGRG2 HybridCattle_ADGRG2 WildYak_ADGRG2 BelugaWhale_ADGRG2 LongFinnedPilotWhale_ADGRG2 WhiteSidedDolphin_ADGRG2 YangtzeRiverDolphin_ADGRG2 Narwhal_ADGRG2 Porpoise_ADGRG2 Orca_ADGRG2 Vaquita_ADGRG2
Remove BLUEWHALE_ADGRG2 from the input because of bad characters in the sequence.
Remove HYBRIDCATTLE_ADGRG2 from the input because of bad characters in the sequence.
Remove WILDYAK_ADGRG2 from the input because of bad characters in the sequence.
Remove BELUGAWHALE_ADGRG2 from the input because of bad characters in the sequence.
Remove LONGFINNEDPILOTWHALE_ADGRG2 from the input because of bad characters in the sequence.
Remove WHITESIDEDDOLPHIN_ADGRG2 from the input because of bad characters in the sequence.
Remove YANGTZERIVERDOLPHIN_ADGRG2 from the input because of bad characters in the sequence.
Remove NARWHAL_ADGRG2 from the input because of bad characters in the sequence.
Remove PORPOISE_ADGRG2 from the input because of bad characters in the sequence.
Remove ORCA_ADGRG2 from the input because of bad characters in the sequence.
Remove VAQUITA_ADGRG2 from the input because of bad characters in the sequence.
PoSeiDon needs at least 3 homologous sequences as input. Your FASTA contains 0 sequence entries. Please add more sequences and try again.
There was a problem with the FASTA file (ADGRG2.fasta), stop.
```
However, the contents of the FASTA file are: