Closed bounlu closed 11 months ago
Thanks for reporting, @bounlu.
I wasn't aware underscores were being used as a definition separator. 690481e6d7db0e01c782779c4c2cb246720ba2b8 adds a --record-definition-separator option to the lint command to override the default separators (/ and space). E.g., in your command:
$ fq lint \
--record-definition-separator _ \
Auto_C1_1_val_1.fq.gz_unmapped_reads_1.fq.gz \
Auto_C1_2_val_2.fq.gz_unmapped_reads_2.fq.gz
Thanks for the quick fix, but this is not generalizable. I want to apply the same code to all sorts of FASTQ files without specifying the separator. Otherwise, this needs extra code on the user side to check the separator in the input file.
Can we adapt this to check the first line of the FASTQ file and determine the separator automatically?
The option as currently implemented seems to be the appropriate solution, as there is no way to determine the separator with a heuristic. For example, "sq 1|extra" is ambiguous.
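To make the ambiguity concrete, here is a minimal Python sketch, not fq's actual implementation. The helper `split_name` is hypothetical, and it assumes the name is everything before the first separator character. The same definition line yields two different, equally plausible names depending on which separator is chosen, so nothing in the line itself says which one is right.

```python
def split_name(definition: str, separators: str) -> str:
    """Return the part of a definition line before the first separator.

    Hypothetical helper for illustration only; splitting at the first
    occurrence is an assumption, not a statement about fq's internals.
    """
    for i, ch in enumerate(definition):
        if ch in separators:
            return definition[:i]
    # No separator found: the whole line is treated as the name.
    return definition


definition = "sq 1|extra"

# Both candidates look reasonable, so a heuristic cannot pick one reliably.
name_if_space = split_name(definition, " ")
name_if_pipe = split_name(definition, "|")

print(repr(name_if_space))
print(repr(name_if_pipe))
```

A caller who knows how the reads were produced can resolve this; the file alone cannot, which is why an explicit option is used.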
FASTQ does not have a standard specification, and whitespace is the de facto definition separator. We also include the forward slash (/) because fq was originally built strictly for Illumina read names.
Hi,
I get an unexpected error for NamesValidator:
This is because the underscore _ in the read name is not recognized; it is added by Bismark during processing, as explained here. The validator parses the read name up to the first space, and in this case the space has been replaced with an underscore, so the names don't match between R1 and R2.
I believe the code needs to be fixed to handle such cases.
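The mismatch can be reproduced with a short Python sketch. This is an illustration under stated assumptions, not fq's code: the read names are made up, the helpers `read_name` and `names_match` are hypothetical, and the name is assumed to end at the first separator character, as described above.

```python
DEFAULT_SEPARATORS = " /"  # space and forward slash, per the thread


def read_name(definition: str, separators: str = DEFAULT_SEPARATORS) -> str:
    """Return the text before the first separator (hypothetical helper)."""
    for i, ch in enumerate(definition):
        if ch in separators:
            return definition[:i]
    return definition


def names_match(r1: str, r2: str, separators: str = DEFAULT_SEPARATORS) -> bool:
    """Compare the names of a pair of definition lines."""
    return read_name(r1, separators) == read_name(r2, separators)


# Made-up definition lines: the original pair, then the same pair after
# the space has been replaced with an underscore.
original_r1, original_r2 = "@read.1 1:N:0:1", "@read.1 2:N:0:1"
rewritten_r1, rewritten_r2 = "@read.1_1:N:0:1", "@read.1_2:N:0:1"

print(names_match(original_r1, original_r2))         # space separates name from the rest
print(names_match(rewritten_r1, rewritten_r2))       # no space or slash left to split on
print(names_match(rewritten_r1, rewritten_r2, "_"))  # underscore given explicitly
```

With the default separators, the rewritten lines contain nothing to split on, so each whole line is compared and the mate number makes them differ.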