omnicoders / bio-geolocation

Looks up the location of sequences in GenBank and adds it to a FASTA file.

Species names with more than just two words are truncated #3

Open AlanRockefeller opened 6 years ago

AlanRockefeller commented 6 years ago

Most species names are just two words: the genus and the species. However, when we know the genus but the species is a guess, the abbreviation cf. or aff. is inserted between them to indicate that it's not necessarily that species.

Here is an example FASTA record whose species name gets truncated:

X>MG827096.1 Amanita cf. pantherina voucher MushroomObserver.org/306343 internal transcribed spacer 1 and 5.8S ribosomal RNA gene, partial sequence
AAACTCAGGTAGGGGGGGAGGTGGTTGTAGCTGGCCCCCTAGTAAGGGCATGTGCACACTGTCTCTTTCTCTTGCTTGTTTTTTTCATTCTTTCCACTTGTGCACTGCTTGTAGGCAGCCTGGCATTGTTCAGGTTGTCTATGATTTTCTTTACATACATGAATAATCGTTGTACAGAATGTAATGAAAAAAAAAGTAATACAACTTTCAACAACGGATCTCTTGGCTCTCGCATC

Remove the X before the > - I only put it there so GitHub wouldn't interpret the > as a quote marker.
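
For illustration, here is a rough sketch of how the name could be pulled out of the header without dropping the qualifier. This is not the project's actual code: `extract_species_name` and `QUALIFIERS` are hypothetical names, and it assumes the header has the form `>ACCESSION Genus [cf.|aff.] epithet ...`, as in the record above.

```python
# Qualifiers that can sit between genus and epithet.
# Assumed list for this sketch; only cf. and aff. are covered.
QUALIFIERS = {"cf.", "aff."}


def extract_species_name(header: str) -> str:
    """Return the organism name from a GenBank-style FASTA header.

    Hypothetical helper. Assumes the first word is the accession and
    the organism name starts at the second word.
    """
    words = header.lstrip(">").split()
    # Default: two-word name (genus + epithet).
    name_words = words[1:3]
    # If the word after the genus is a qualifier, keep one more word
    # so the epithet is not cut off.
    if len(words) > 3 and words[2] in QUALIFIERS:
        name_words = words[1:4]
    return " ".join(name_words)


header = (
    ">MG827096.1 Amanita cf. pantherina voucher "
    "MushroomObserver.org/306343 internal transcribed spacer 1"
)
print(extract_species_name(header))
```

With the header from this issue, the sketch keeps all three words of the name instead of stopping after "cf.". Names with other qualifiers or extra ranks (var., subsp.) would need additional handling.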