Hi! I'm using sourmash signature kmers to extract the matching k-mers and sequences (as FASTA) from a signature of hashes of interest and the original FASTA file.
My command:
sourmash sig kmers --signatures <sig file of matches> --sequences <fasta file> --save-sequences <output name> --save-kmers <output name2>
I got an error when sourmash came across a k-mer containing an N:
Traceback (most recent call last):
File "/home/jupyter-jessica/.local/bin/sourmash", line 8, in <module>
sys.exit(main())
File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/__main__.py", line 13, in main
return mainmethod(args)
File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/cli/sig/kmers.py", line 91, in main
return sourmash.sig.__main__.kmers(args)
File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/sig/__main__.py", line 1148, in kmers
for kmer, hashval in kh_iter:
File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/minhash.py", line 387, in kmers_and_hashes
hashvals = self.seq_to_hashes(sequence,
File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/minhash.py", line 360, in seq_to_hashes
hashes_ptr = self._methodcall(lib.kmerminhash_seq_to_hashes, to_bytes(sequence), len(sequence), force, bad_kmers_as_zeroes, is_protein, size)
File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/utils.py", line 25, in _methodcall
return rustcall(func, self._get_objptr(), *args)
File "/home/jupyter-jessica/.local/lib/python3.8/site-packages/sourmash/utils.py", line 78, in rustcall
raise exc
ValueError: invalid DNA character in input k-mer: <kmer with an N>
The documentation for sourmash sig kmers says: By default, sig kmers ignores bad k-mers (e.g. non-ACGT characters in DNA). If --check-sequence is provided, sig kmers will error exit on the first bad k-mer.
Docs: https://sourmash.readthedocs.io/en/latest/command-line.html#sourmash-signature-kmers-extract-k-mers-and-or-sequences-that-match-to-signatures
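So the observed behavior is the opposite of what the docs describe: I did not pass --check-sequence, yet sig kmers exited on the first k-mer with a non-ACGT character. The docs should be updated to say that, by default, non-ACGT characters cause sig kmers to exit with an error.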
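
For anyone trying to reproduce this, here is a minimal sketch in plain Python for finding which records in a FASTA file contain k-mers with non-ACGT characters. It does not use the sourmash API or reproduce its internals. The names `read_fasta` and `bad_kmers`, the `ksize` default of 31, and the file name in the usage comment are all assumptions for illustration; set `ksize` to match the signature.

```python
VALID_DNA = set("ACGT")


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def bad_kmers(sequence, ksize=31):
    """Yield (position, kmer) for each k-mer with a non-ACGT character.

    ksize=31 is an assumed default, not read from any signature.
    """
    sequence = sequence.upper()
    for i in range(len(sequence) - ksize + 1):
        kmer = sequence[i:i + ksize]
        if not set(kmer) <= VALID_DNA:
            yield i, kmer


# Hypothetical usage (the file name is a placeholder):
# with open("input.fa") as handle:
#     for name, seq in read_fasta(handle):
#         for pos, kmer in bad_kmers(seq, ksize=31):
#             print(name, pos, kmer)
```

Running this on the input FASTA should show which sequences contain the N that triggers the ValueError above.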