d-cameron closed this issue 3 months ago
noodles-fastq does already support definitions with descriptions separated by a space ( ), e.g.,
That is, the ASCII from immediately after the @ till the first whitespace.
I updated the reader to split on either a space or horizontal tab (\t) in fcc9767878e308cb76b95814ad7600f494095285. Is this sufficient, or do you expect the separator to be any locale-specific whitespace (described by isspace(3))?
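To make the two options in the question concrete, here is a minimal sketch using only the Rust standard library. It is not the noodles-fastq implementation. Both function names are hypothetical, the input is assumed to be the definition with the leading @ already removed, and `char::is_whitespace` (Unicode White_Space) is only a stand-in for broader whitespace handling, since it is not locale-aware the way isspace(3) is.

```rust
/// Hypothetical helper: the read name is everything before the first
/// space or horizontal tab.
fn name_space_or_tab(definition: &str) -> &str {
    definition
        .split(|c: char| c == ' ' || c == '\t')
        .next()
        .unwrap_or(definition)
}

/// Hypothetical helper: the read name is everything before the first
/// character with the Unicode White_Space property (an approximation of
/// "any whitespace", not a locale-aware isspace(3)).
fn name_any_whitespace(definition: &str) -> &str {
    definition
        .split(char::is_whitespace)
        .next()
        .unwrap_or(definition)
}
```

The two functions only differ on input that contains a whitespace character other than space or tab, such as a vertical tab, before the first space or tab.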
The writer now supports custom definition separators as well (08f0a3fc0d65b5b5f2005028a1de7db132787822).
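As an illustration of what a configurable definition separator means on the writing side, the following sketch uses only the standard library. `write_definition` and its parameters are hypothetical and are not the noodles-fastq writer API.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes "@name", then the separator and the
/// description (only if the description is non-empty), then a newline.
fn write_definition<W: Write>(
    writer: &mut W,
    name: &str,
    description: &str,
    separator: u8,
) -> io::Result<()> {
    writer.write_all(b"@")?;
    writer.write_all(name.as_bytes())?;

    if !description.is_empty() {
        writer.write_all(&[separator])?;
        writer.write_all(description.as_bytes())?;
    }

    writer.write_all(b"\n")
}
```

Passing `b'\t'` as the separator would produce tab-separated headers, while `b' '` gives the more common space-separated form.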
That is sufficient. I haven't encountered any separator other than tab or space.
This behavior is now available in noodles 0.79.0 / noodles-fastq 0.14.0.
Currently the noodles fastq API only supports returning the full Name/header/Sequence identifier line for each record. It would be very useful if the noodles API supported extracting the read name from the header line. That is, the ASCII from immediately after the @ till the first whitespace.
Comments/metadata/additional information after the read name are extremely common. Examples include:
Illumina BaseSpace: https://help.basespace.illumina.com/files-used-by-basespace/fastq-files
SRA
SPAdes (error correction)
BFC (uses a TAB instead of SPACE)
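The requested behavior can be sketched with the Rust standard library alone. This is a minimal illustration, not the noodles-fastq API: `split_header` is a hypothetical helper, and it assumes the separator is a single space or horizontal tab.

```rust
/// Hypothetical helper: splits a FASTQ header line into the read name and
/// an optional description.
///
/// Returns `None` if the line does not start with '@'. The read name is the
/// text after '@' up to the first space or tab; everything after that
/// separator is returned as the description.
fn split_header(line: &str) -> Option<(&str, Option<&str>)> {
    let rest = line.strip_prefix('@')?;

    match rest.split_once(|c: char| c == ' ' || c == '\t') {
        Some((name, description)) => Some((name, Some(description))),
        None => Some((rest, None)),
    }
}
```

For a header such as `@r1 LN:4` this yields the name `r1` and the description `LN:4`. A tab-separated header is split the same way, and a header without a separator yields only the name.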