Smart sequence file auto-cleaning

From @rudi-cilibrasi on Discord: some small notes on FASTA, specifically the full mitochondrial genomes we are fetching from GenBank:

1. Strip off the first line (the FASTA header line). We already do this, which is good.
2. Convert everything to lowercase.
3. Remove all characters that are not in the set {a,c,g,t}.
4. Throw away any sequences that are < 10k or > 20k in size after these transformations.

The reason for step 4 is that some sequences are misfiled and uncorrected in GenBank: they are filed as full genomes but are not actually full mitochondrial genomes. So this is a little runtime data cleaning after the fetch.
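
The cleaning steps described here could be sketched roughly as follows. This is only an illustration in Python, not code from the project: the function name `clean_fasta` and the constants `MIN_LEN` / `MAX_LEN` are hypothetical, "10k" and "20k" are assumed to mean 10,000 and 20,000 characters, and the input is assumed to hold a single FASTA record.

```python
from typing import Optional

# Assumed thresholds: "10k" and "20k" read as character counts after cleaning.
MIN_LEN = 10_000
MAX_LEN = 20_000


def clean_fasta(text: str, min_len: int = MIN_LEN, max_len: int = MAX_LEN) -> Optional[str]:
    """Return the cleaned sequence, or None if it should be thrown away.

    Hypothetical helper; assumes `text` contains a single FASTA record.
    """
    lines = text.splitlines()

    # Strip the first line if it is a FASTA header (starts with '>').
    # The check makes this safe to call on input whose header is already gone.
    if lines and lines[0].startswith(">"):
        lines = lines[1:]

    # Join the remaining lines and convert everything to lowercase.
    seq = "".join(lines).lower()

    # Remove every character that is not in the set {a, c, g, t}.
    # This also drops whitespace and ambiguity codes such as 'n'.
    seq = "".join(ch for ch in seq if ch in "acgt")

    # Discard sequences shorter than min_len or longer than max_len.
    if len(seq) < min_len or len(seq) > max_len:
        return None
    return seq
```

A caller would skip any record for which `clean_fasta` returns `None`. Note that the length check runs after the filtering, so a record padded with many non-{a,c,g,t} characters can still fall below the lower bound.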

joyhughes / libqsearch-clean