Closed luispedro closed 4 years ago
My first impression is that NGLess is correct here. A FastQ file that contains empty lines should be considered malformed.
Is it normal to see this "in the wild"? Is it really just the special case where the empty lines are at the very end of the file?
I've seen cases where empty lines appear in the middle of the file if the sequence has length 0. Not really a common (or useful) practice, but some software can produce this as part of quality control/trimming.
The (python) FastQ libraries I used at the time managed to parse this as an empty sequence.
@unode: Good point! As long as you still have a header, then an empty line, then +, then another empty line, this should indeed be a well-formed empty sequence. A bit strange, but I can see how it would emerge, and it makes sense to parse it correctly.
@waakanni: is this the issue you are observing?
@luispedro Yes. The case I observed, however, involves an empty line at the very end of the fastq.gz file.
OK, is it just one empty line? Where do these samples come from?
I'm not against special casing this particular thing and issuing just a warning.
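To make the two cases discussed above concrete, here is a minimal Python sketch. It is not NGLess's implementation (NGLess itself is not written in Python); the function name `parse_fastq_lenient` and the warning text are made up for illustration. It assumes strict four-line records, accepts a zero-length record (header, empty line, +, empty line), and treats blank lines as acceptable only when nothing but blank lines follows them at a record boundary.

```python
import io
import warnings


def parse_fastq_lenient(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FastQ text stream.

    Hypothetical helper: trailing blank lines produce a warning,
    blank lines between records are an error.
    """
    while True:
        header = handle.readline()
        if header == "":  # true end of file
            return
        if header.strip() == "":
            # A blank line where a header is expected is tolerated only
            # if every remaining line is blank as well.
            if any(line.strip() for line in handle):
                raise ValueError("empty line between FastQ records")
            warnings.warn("ignoring empty line(s) at end of FastQ input")
            return
        header = header.rstrip("\r\n")
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FastQ record at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at {header!r}")
        # An empty seq with an empty qual passes both checks: a valid empty record.
        yield header[1:], seq, qual


# One normal record, one zero-length record, then a single trailing empty line.
sample = "@r1\nACGT\n+\nIIII\n@r2\n\n+\n\n\n"
print(list(parse_fastq_lenient(io.StringIO(sample))))
# prints [('r1', 'ACGT', 'IIII'), ('r2', '', '')] and emits one UserWarning
```

The blank-line check happens only where a header is expected. Stripping trailing empty lines before parsing would be wrong, because it would also remove the empty quality line of a final zero-length record. For a fastq.gz file, the same function should work on a handle opened with `gzip.open(path, "rt")`.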
For my samples it is just the single line at the end of the file.
I can't say for definite the origin of the samples as I am not responsible for them.
Sorry for the delayed response. To follow up on your question, @luispedro:
I have confirmed with Micheal that these files were generated a long time ago by MOCAT and that we don't really expect more files with the empty line in the future.
Thank you
I think this can be closed. It's not clear that we should really consider it a bug.
Originally posted on https://github.com/ngless-toolkit/ngless2018benchmark/issues/2
NGLess load_mocat_sample does not ignore empty lines at the end of a fastq.gz file. Not sure if this is a bug or just something it was not meant to do. This is happening on both versions 1.0.1 and 0.7.1.