Open Naturalist1986 opened 4 months ago
If I understand this correctly, this is a nucleotide-vs-protein search (translated search), right?
I think we have a bug for the SAM backtraces for this mode, where either the back trace should be converted to the nucleotide match count (not codon match count), or the nucleotide sequence should be translated to a protein sequence.
Do you expect this search to return a nucleotide sequence or a protein sequence?
@milot-mirdita, we are also interested in the SAM formatted output for translated searches. In our use case, we would like to get a nucleotide output. Many thanks!
I'm searching a metagenome (nucleotides) against a protein catalogue. I need to know how many reads mapped to each protein from the catalogue for functional analysis. I guess I expect the search to return the matches for each read?
I pushed a fix that should correct some of the weirdness with SAM. Please check if this is what you expected as we don't use much sam internally.
Thanks! Do I need to download and recompile mmseqs? or can I just replace with the fixed file?
You can download precompiled binaries from: https://mmseqs.com/latest/
you don’t need to compile them yourself
Expected Behavior
Hi,
I'm trying to convert a SAM file created from a search result but I get this error:
(mmseqs) moshea@shannon:/mnt/scratch$ samtools view -Sb SRR5234505.sam > SRR5234505.bam [E::sam_parse1] CIGAR and query sequence are of different length [W::sam_read1_sam] Parse error at line 363193 samtools view: error reading file "SRR5234505.sam"
This is the part of the file where the error refers to:
(mmseqs) moshea@shannon:/mnt/scratch$ sed -n '363193p' SRR5234505.sam
SRR5234505.241048 16 NZ_SMKV01000006.1_67 26 254 35M 0 0 GGTCTTACCTGCGCTCCACAGGCTGTCACCGCCAGCCCGGCGGCCTTCGCGACGTTGACCGCCGCGTTGATGTCCCGGTCCAGGTACGTCCCGCACGACCCGCAC AS:i:98 NM:i:12
Maybe the sam file is not formatted properly?
I used:
mmseqs convertalis SRR5234505_DB curated_protein_catalogue SRR5234505_Result SRR5234505 --format-mode 1
Thanks for your help!