mikelchtermans closed this issue 6 months ago
Michaël,
Thank you for your diligent work! The kmer you identified is also found (as an artifact from a BAC, I imagine) in two different human reference genome chromosome sequences. I am surprised this kmer survived the final merging process, and I will take a closer look to ensure there isn't a problem. Meanwhile, I can remove that kmer from the human_filter db, and then you could update your db. Does that seem reasonable to you?
Respectfully, Ken
Hi Ken, thank you for the quick response! Your proposed actions do seem reasonable to me :) In the meantime I have also found 2 human kmers in the Neisseria meningitidis reference genome (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_008330805.1/): TAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGC and TTGCTTGCTTGCTTGCTTGCTTGCTTGCTTTC. As far as I can see, these kmers are not present in its cgMLST schema.
On the bright side, I also checked a few other bacterial reference genomes of common pathogens and did not find any human kmers in them.
Kind regards, Michaël
Michaël,
Again, thank you for such diligent work! I have updated the db to the new version, 20231218v2.
Works like a charm now, thank you!
Hi,
I used the compiled aligns_to tool standalone to scrub the Escherichia coli reference genome (https://www.ncbi.nlm.nih.gov/nuccore/NC_002695.2/) and, using the -print_kmers_only flag, found one supposedly human kmer in the reference genome, namely CACCACCATTACCACCACCATCACCACCACCA. This kmer is also present in several alleles of a gene in Enterobase's E. coli cgMLST scheme, specifically in locus b0001 (https://enterobase.warwick.ac.uk/schemes/Escherichia.cgMLSTv1/b0001.fasta.gz), e.g. allele 2. This will lead to comparability issues if I decide to scrub raw reads before running a cgMLST analysis.
I wonder how it is possible that this kmer from a non-eukaryotic RefSeq genome is present in the human database, because according to the documentation on how the human kmer database is built, it should have been subtracted.
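For anyone who wants to reproduce the allele check, here is a minimal Python sketch of how a reported kmer could be looked up in a FASTA file such as the b0001 allele file. It is only an exact string search on both strands, not how aligns_to works internally. The function names (`reverse_complement`, `parse_fasta`, `open_fasta`, `find_kmer`) and the local file name in the usage note are my own assumptions for illustration.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple

# Translation table for complementing DNA bases (ambiguity codes are left as-is).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def open_fasta(path: str):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt")


def find_kmer(records: Iterable[Tuple[str, str]], kmer: str) -> List[Tuple[str, int, str]]:
    """Return (header, 0-based position, strand) for every exact kmer match."""
    forward = kmer.upper()
    reverse = reverse_complement(forward)
    queries = [("+", forward)]
    if reverse != forward:  # avoid reporting palindromic kmers twice
        queries.append(("-", reverse))
    hits = []
    for header, seq in records:
        seq = seq.upper()
        for strand, query in queries:
            pos = seq.find(query)
            while pos != -1:
                hits.append((header, pos, strand))
                pos = seq.find(query, pos + 1)
    return hits
```

With the allele file downloaded locally (the path `b0001.fasta.gz` is assumed here), the lookup would be something like `with open_fasta("b0001.fasta.gz") as fh: hits = find_kmer(parse_fasta(fh), "CACCACCATTACCACCACCATCACCACCACCA")`. Each hit names the allele header, so it is easy to see which alleles would be affected by scrubbing.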
Kind regards, Michaël