Closed valentynbez closed 1 year ago
Hi @valentynbez ,
Thanks for reporting! It is not a filesize limit issue.
Could you print line 88742 of the SAM file? For example:
$ sed '88742q;d' strobealignment.sam
It seems that your reads (chunked_kmers.fa.gz) are in FASTA format, so they should not have quality values (unlike the FASTQ format). Are you sure your processed reads are in FASTA format? If they are in FASTQ format, the quality value string needs to be the same length as the read (100 characters in your case).
Best, Kristoffer
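For anyone wanting to double-check the input format question above, here is a minimal Python sketch. It assumes the usual convention that FASTA records start with '>' and FASTQ records start with '@'. The function name sniff_read_format is made up for illustration and is not part of strobealign or pysam.

```python
import gzip


def sniff_read_format(path):
    """Guess FASTA vs. FASTQ from the first non-empty line of a reads file."""
    # Open gzip-compressed files transparently, based on the file extension.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            return "unknown"
    return "empty"
```

Calling sniff_read_format("chunked_kmers.fa.gz") should return "fasta" if the file is what its name suggests. This only inspects the first record, so it will not catch a file that mixes formats.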
It appears we don’t set the QUAL field correctly for FASTA input. Here’s how to reproduce the problem using our own test data only:
$ echo -e '>test\nACGT' | build/strobealign tests/phix.fasta - | samtools view -o out.bam
...
[E::sam_parse1] SEQ and QUAL are of different length
[W::sam_read1_sam] Parse error at line 4
samtools view: error reading file "-"
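To locate the offending records without going through samtools, a rough Python sketch along these lines may help. It relies only on the SAM layout (tab-separated fields, SEQ in column 10, QUAL in column 11, QUAL either '*' or the same length as SEQ). The function name find_seq_qual_mismatches is hypothetical, and this is not how samtools or pysam perform the check internally.

```python
def find_seq_qual_mismatches(sam_lines):
    """Yield (line_number, seq_len, qual_len) for suspicious SAM records.

    A record is reported when QUAL is neither '*' nor the same length as SEQ,
    or when the record has fewer than the 11 mandatory fields.
    """
    for number, line in enumerate(sam_lines, start=1):
        # Header lines start with '@'; blank lines carry no record.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            seq_len = len(fields[9]) if len(fields) > 9 else 0
            yield number, seq_len, 0
            continue
        seq, qual = fields[9], fields[10]
        if qual != "*" and len(qual) != len(seq):
            yield number, len(seq), len(qual)
```

Usage would be something like opening strobealignment.sam in text mode and iterating over find_seq_qual_mismatches(handle). The first reported line number is the one to compare against the line number in the parser's error message.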
@ksahlin thanks for the prompt answer!
Yes, chunked_kmers.fa.gz is definitely a FASTA file and there are no quality scores. It was created by concatenating all kmers from larger FASTA metagenomic assemblies.
Line 88742 is the one with @PG:
@SQ SN:FENG15-1_SAMEA3136635_MAG_00000011-scaffold_42 LN:21674
@PG ID:strobealign PN:strobealign VN:0.9.0 CL:strobealign
THOM19-1_SAMN08814025_METAG-scaffold_73_phage_2_mvirs_12600-12699 16 FENG15-1_SAMEA3136632_MAG_00000064-scaffold_13 19657 60 13=1X51=1X5=1X15=1X12= * 0 0 ATCAAGGCATTGTTAATAATAATATAGACTCCAGTGTAAATATGGAAGCTGTAAGCCGGACATTGCCAAGATTGGTGCTCATTAAAGGCAACAAATCAAG NM:i:4 AS:i:180
Best, Valentyn
Thanks @valentynbez, @marcelm has identified the problem and already proposed a fix (https://github.com/ksahlin/strobealign/pull/288) that will be merged into main soon.
Thanks again for reporting.
Hello, I want to align kmers of length 100 (simulating small reads) to reference genomes. I ran the following command on a file:
However, when I try to parse the output in pysam:
When I tested on smaller files (1 kmer vs the same file) the results were parsable. Do you know what might be the issue? Is there a limit to the filesize?