Right now (as of PR #5), when a sequence is loaded via drag and drop, the first line is ignored, which matches the FASTA data format. This will not work for all formats. The default should probably be to compare the whole file, with auto-cleaning applied only when we know the format.
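A rough sketch of that default, in Python for illustration only (it is not the project's code, and the function names `looks_like_fasta` and `prepare_for_comparison` are hypothetical). It assumes a FASTA file can be recognised by its leading `>` header line and, like the current behaviour, only handles the first header; anything else is passed through untouched.

```python
def looks_like_fasta(text: str) -> bool:
    """Heuristic check: a FASTA record begins with a '>' header line."""
    return text.lstrip().startswith(">")


def prepare_for_comparison(text: str) -> str:
    """Return the data to compare.

    Known format (FASTA): drop the first (header) line.
    Unknown format: return the whole file unchanged.
    """
    if looks_like_fasta(text):
        # Split off the header line; keep everything after the first newline.
        _header, _, body = text.lstrip().partition("\n")
        return body
    return text
```

With this shape, supporting another format would mean adding one more detection branch rather than changing the default.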
From @rudi-cilibrasi on Discord:
some small notes on FASTA specifically the mitochondrial full genome we are fetching from GenBank:
1. we already strip off the first line. this is good. we also need to
2. convert everything to lowercase and
3. remove all characters that are not in the set {a,c,g,t}
4. throw away any sequences that are < 10k or > 20k in size after these transformations.
the reason for step 4 is because some sequences are misfiled and uncorrected in GenBank. they are filed as full genome but actually not full mito. so it is a little runtime data cleaning after the fetch
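The cleaning steps quoted above could look roughly like the following, again as a Python sketch rather than the project's implementation. The name `clean_mito_fasta` and the constants are hypothetical, and "10k" / "20k" are assumed to mean 10,000 and 20,000 characters of cleaned sequence.

```python
import re
from typing import Optional

# Assumed interpretation of "10k" and "20k": counts of cleaned bases.
MIN_LEN = 10_000
MAX_LEN = 20_000

# Matches every character that is not one of a, c, g, t.
_NON_ACGT = re.compile(r"[^acgt]")


def clean_mito_fasta(text: str) -> Optional[str]:
    """Clean one FASTA record; return None if it should be thrown away."""
    # Step 1: strip off the first (header) line.
    _header, _, body = text.partition("\n")
    # Step 2: convert everything to lowercase.
    body = body.lower()
    # Step 3: remove all characters outside {a, c, g, t}
    # (this also removes newlines and ambiguity codes such as 'n').
    seq = _NON_ACGT.sub("", body)
    # Step 4: discard sequences shorter than 10k or longer than 20k.
    if len(seq) < MIN_LEN or len(seq) > MAX_LEN:
        return None
    return seq
```

Returning `None` for out-of-range records lets the caller skip misfiled GenBank entries without raising an error during the fetch.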