zavolanlab / htsinfer

Infer metadata for your downstream analysis straight from your RNA-seq data
Apache License 2.0

fix: seqid with dash not matching regex #165

Closed balajtimate closed 1 month ago

balajtimate commented 2 months ago

Describe the bug Some paired-end FASTQ files (mainly those received directly from a sequencing facility rather than downloaded from SRA) contain a - in their sequence IDs, probably in the flowcell part, e.g. @M02861:265:000000000-DMNF3:1:1101:12765:1724 1:N:0:GCGTCAAT. Such IDs do not match the regex for the Casava >=1.8 format. As a result, the samples are not recognized as pairs, which leads to incorrect library type and read orientation inference.

Expected behavior The regex should be updated so that sequence IDs containing a - match the Casava >=1.8 format.
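
To illustrate the mismatch, here is a minimal Python sketch. Both patterns are hypothetical and written only for this example; they are not the regexes used in htsinfer. STRICT assumes the instrument and flowcell fields are matched with \w+, which excludes the dash, while RELAXED allows a dash in those fields. The second read header (mate 2) is constructed for illustration from the ID quoted above.

```python
import re

# Hypothetical patterns for Casava >=1.8 headers (NOT htsinfer's actual regexes).
# STRICT: flowcell field limited to word characters, so a dash breaks the match.
STRICT = re.compile(r"^@\w+:\d+:\w+:\d+:\d+:\d+:\d+ [12]:[YN]:\d+:\S*$")

# RELAXED: instrument and flowcell fields may also contain a dash.
RELAXED = re.compile(
    r"^@(?P<prefix>[\w-]+:\d+:[\w-]+:\d+:\d+:\d+:\d+)"
    r" (?P<mate>[12]):[YN]:\d+:(?P<index>\S*)$"
)


def parse_seq_id(line):
    """Return (prefix, mate) for a matching header line, else None."""
    match = RELAXED.match(line.strip())
    if match is None:
        return None
    return match.group("prefix"), int(match.group("mate"))


# Header from the report, plus a mate-2 header constructed for illustration.
header_1 = "@M02861:265:000000000-DMNF3:1:1101:12765:1724 1:N:0:GCGTCAAT"
header_2 = "@M02861:265:000000000-DMNF3:1:1101:12765:1724 2:N:0:GCGTCAAT"

strict_hit = STRICT.match(header_1)  # None: the dash is not a word character
first = parse_seq_id(header_1)
second = parse_seq_id(header_2)

# Two reads form a pair when their prefixes agree and their mates are 1 and 2.
is_pair = (
    first is not None
    and second is not None
    and first[0] == second[0]
    and {first[1], second[1]} == {1, 2}
)
```

With the relaxed character class, both headers share the same prefix and differ only in the mate field, so they can be recognized as a pair.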