Truncated names in RefSeq organism results

Hi Sam,

Just a small bug: I've noticed that for some organisms, the beginning of the name is cut off in the organism results.

You can see this in some of your sample files, for example: https://github.com/transcript/samsa2/blob/master/sample_files_paired-end/6_RefSeq_org_results/control_1_TINY.RefSeq_annot_organism.tsv

Line 13 should be 'Prevotella sp. HMSC073D09', line 17 should be 'Bacteroidales bacterium KA00344', and so on. Fortunately I haven't run into many of these in my own results, so I can just grep for the truncated part and get the full name manually from the database header.

This happens with every organism name containing "sp.", but also with some others, so I can't quite work out why it's grabbing from the middle of the [organism name] in the RefSeq header (is it expecting only two words?).
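To illustrate the two-word guess, here is a minimal Python sketch. It is not samsa2's actual code. It assumes RefSeq headers of the form `>ACCESSION description [Organism name]`, and both function names (`organism_from_header`, `naive_two_word_name`) are hypothetical. The naive variant keeps only the last two words before the closing bracket, which would reproduce the truncation seen above. The regex variant returns everything inside the trailing brackets.

```python
import re

# Assumption: the organism name is the last [...] group at the end of the header line.
# Nested brackets (e.g. "[Candidatus [X] y]") are not handled; those headers return None.
_ORG_PATTERN = re.compile(r"\[([^\[\]]+)\]\s*$")


def organism_from_header(header):
    """Return the full text inside the trailing [...] of a header, or None."""
    match = _ORG_PATTERN.search(header)
    return match.group(1).strip() if match else None


def naive_two_word_name(header):
    """Hypothetical buggy variant: keeps only the last two words before ']'."""
    inner = header.rstrip().rstrip("]")
    return " ".join(inner.split()[-2:])


if __name__ == "__main__":
    # Made-up example header; the accession is a placeholder.
    header = ">WP_000000000.1 hypothetical protein [Prevotella sp. HMSC073D09]"
    print(naive_two_word_name(header))   # sp. HMSC073D09
    print(organism_from_header(header))  # Prevotella sp. HMSC073D09
```

Anchoring on the brackets rather than counting words means names with three or more words ("Bacteroidales bacterium KA00344") come through whole.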

Cheers, Rachael

transcript/samsa2 issue #16