samtools / htslib

C library for high-throughput sequencing data formats

Permit fastq output to create empty FASTQ records for seq "*". #1576

Closed jkbonfield closed 1 year ago

jkbonfield commented 1 year ago

This is rather questionable, but htslib can already read empty FASTQ records, generating "*" for SEQ and QUAL. With this change the reverse also holds: a record with "*" for SEQ is written out as an empty FASTQ record. E.g.:

name    4   *   0   0   *   *   0   0   *   *

becomes

@name

+

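As a rough illustration of the mapping above (not htslib's actual code), here is a minimal Python sketch. The function name `sam_to_fastq` is hypothetical, and so is the `"!"` placeholder used when SEQ is present but QUAL is "*"; the source only describes the case where both are "*".

```python
def sam_to_fastq(sam_line: str) -> str:
    """Format one tab-separated SAM record as a four-line FASTQ record.

    Illustrative sketch only: a SEQ of "*" becomes an empty sequence line,
    and a QUAL of "*" becomes an empty quality line when SEQ is empty too.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM record needs at least 11 mandatory fields")

    qname, seq, qual = fields[0], fields[9], fields[10]

    # "*" means "no sequence stored"; emit an empty line rather than
    # dropping the record.
    if seq == "*":
        seq = ""

    if qual == "*":
        # Assumption for this sketch: pad with a placeholder quality so the
        # two lines stay the same length. For an empty SEQ this yields "".
        qual = "!" * len(seq)

    return f"@{qname}\n{seq}\n+\n{qual}\n"
```

Called on the record shown above (with tab separators), this returns `"@name\n\n+\n\n"`, i.e. a header line, an empty sequence line, the `+` separator, and an empty quality line.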
bwa mem and minimap2 can both read these FASTQ entries too, although their SAM output is buggy: it contains an empty field instead of "*" for SEQ. (Note that htslib can still read this and interprets it the same way.)
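For anyone post-processing such aligner output with a stricter parser, a small hedged sketch of the equivalent normalisation follows. The helper name `normalise_empty_seq` is hypothetical and this is not how htslib does it internally; it only rewrites an empty SEQ column (field 10) to "*".

```python
def normalise_empty_seq(sam_line: str) -> str:
    """Return the SAM record with an empty SEQ field replaced by "*".

    Illustrative sketch only: header lines (starting with "@") are passed
    through untouched, and all other fields are left as they are.
    """
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line

    fields = line.split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM record needs at least 11 mandatory fields")

    # Field 10 (index 9) is SEQ; an empty string is treated like "*".
    if fields[9] == "":
        fields[9] = "*"

    return "\t".join(fields)
```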

Potential reasons for accepting this:

Potential reason to reject:

Fixes samtools/samtools#1799

daviesrob commented 1 year ago

On discussion, we decided that avoiding mis-paired reads was preferable to avoiding empty sequences. I suspect this rarely happens, but I guess we may find out if anyone notices...