Closed wjwei-handsome closed 1 year ago
Hi, I have fixed the clippy check warnings.
Hi @wjwei-handsome, thanks for your contribution. The role of the needletail library is limited to parsing fastx files and generating kmers. Interpretation of the sequences in those files is best left to the user (or another library) due to the diversity of encoding choices used in bioinformatics. For example, encoding gaps as '-' is just a convention, and 'N' can mean "any base" or "Asparagine" (at least if you're dealing with IUPAC standards).
Also, I suggest using a regular expression or some other means to count characters in the sequence rather than iterating over the sequence twice, as this can be inefficient, especially for large sequences.
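For illustration, here is a minimal sketch of counting in a single pass over the raw sequence bytes, done on the user side rather than inside needletail. The names `count_gaps` and `count_bytes_in` are hypothetical and are not part of needletail's API. Treating 'n'/'N' as gaps is an assumption taken from this PR, not a general rule.

```rust
/// Hypothetical helper (not part of needletail): counts bytes equal to
/// 'N' or 'n' in one pass over the sequence.
pub fn count_gaps(seq: &[u8]) -> usize {
    seq.iter().filter(|&&b| b == b'N' || b == b'n').count()
}

/// Hypothetical variant that lets the caller choose which bytes count as
/// gaps, since conventions such as '-' or 'N' differ between datasets.
pub fn count_bytes_in(seq: &[u8], gap_bytes: &[u8]) -> usize {
    seq.iter().filter(|&&b| gap_bytes.contains(&b)).count()
}
```

A caller who also wants '-' treated as a gap could call `count_bytes_in(seq, b"Nn-")`, which keeps the interpretation of the alphabet outside the parser.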
Thanks, I overlooked that. My fault!
Looking forward to the update :)
Hi, for some statistics on fast[a,q] files, it can be necessary to count the gaps in a sequence ('n' or 'N'; maybe also '-', but that is not considered here).
So I simply added such a function, along with a test case.
Best wishes.