zaeleus / noodles

Bioinformatics I/O libraries in Rust
MIT License

`noodles_fasta::reader::records::Records` returns an error on empty lines #189

Closed: jch-13 closed this issue 1 year ago

jch-13 commented 1 year ago

I'm using the noodles_fasta::reader::records::Records Iterator to read the hg19 reference FASTA file provided by the 1KGP (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz). Right after the MT sequence, this file contains an empty line, which causes the Iterator to fail while parsing the Definition and return an io::Error with ErrorKind::InvalidData.

Is returning errors on empty lines the intended behavior of this Iterator? And what's the proper way to ignore this specific error without also ignoring other InvalidData cases?
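Because the failure surfaces as ErrorKind::InvalidData, which (as the question notes) other parse errors also use, matching on the error kind alone can't single out the empty-line case. One possible workaround, which is an assumption and not something suggested in this thread, is to drop blank lines before the FASTA reader ever sees them. The sketch uses only the standard library; `copy_without_empty_lines` is a hypothetical helper, not part of noodles. The cleaned output can be written to a file or buffer and then read with the FASTA reader as usual.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper (not a noodles API): copies `reader` to `writer`
/// line by line, dropping lines that are empty or contain only ASCII
/// whitespace. Returns the number of lines that were dropped.
fn copy_without_empty_lines<R: BufRead, W: Write>(
    mut reader: R,
    mut writer: W,
) -> io::Result<usize> {
    let mut buf = Vec::new();
    let mut dropped = 0;
    loop {
        buf.clear();
        // Read raw bytes up to and including '\n', so non-UTF-8 input is fine.
        let n = reader.read_until(b'\n', &mut buf)?;
        if n == 0 {
            break; // end of input
        }
        // A line made only of whitespace (including "\n" or "\r\n") is blank.
        if buf.iter().all(|b| b.is_ascii_whitespace()) {
            dropped += 1;
            continue;
        }
        writer.write_all(&buf)?;
    }
    writer.flush()?;
    Ok(dropped)
}

fn main() -> io::Result<()> {
    // Illustrative input: an MT record, a stray blank line, then another contig.
    let input: &[u8] = b">MT\nACGT\nGGCC\n\n>GL000207.1\nNNNN\n";
    let mut output = Vec::new();
    let dropped = copy_without_empty_lines(input, &mut output)?;
    assert_eq!(dropped, 1);
    assert_eq!(output, b">MT\nACGT\nGGCC\n>GL000207.1\nNNNN\n".to_vec());
    println!("dropped {} empty line(s)", dropped);
    Ok(())
}
```

Streaming into any `Write` target, rather than collecting into a `String`, avoids holding an entire multi-gigabyte reference in memory.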

zaeleus commented 1 year ago

Thanks for reporting this example. There's evidence that the file was poorly joined from multiple sources, and I'm surprised existing FASTA parsers accept empty lines.

I'll change the sequence reader to also skip empty lines.
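The thread doesn't show the actual patch, so the following is only a minimal, std-only sketch of the idea of a reader that skips empty lines. The names `Record` and `read_records` are hypothetical and are not noodles' API, and for simplicity it collects every record into a `Vec`, whereas noodles' `Records` is a streaming iterator. Blank lines are ignored both between records and inside a sequence, while a non-empty line that appears before any definition line is still rejected with `ErrorKind::InvalidData`.

```rust
use std::io::{self, BufRead};

/// Simplified FASTA record (hypothetical type, not noodles' `Record`).
#[derive(Debug, PartialEq)]
struct Record {
    name: String,
    sequence: Vec<u8>,
}

/// Hypothetical parser: reads all records from `reader`, skipping blank lines.
fn read_records<R: BufRead>(reader: R) -> io::Result<Vec<Record>> {
    let mut records: Vec<Record> = Vec::new();
    for line in reader.lines() {
        let line = line?;
        // `lines()` already strips "\n" / "\r\n"; also tolerate trailing spaces.
        let line = line.trim_end();
        if line.is_empty() {
            continue; // the behavior discussed above: skip empty lines
        }
        if let Some(definition) = line.strip_prefix('>') {
            // Treat the first whitespace-delimited token as the record name.
            let name = definition.split_whitespace().next().unwrap_or("").to_string();
            records.push(Record { name, sequence: Vec::new() });
        } else if let Some(record) = records.last_mut() {
            // A sequence line: append it to the current record.
            record.sequence.extend_from_slice(line.as_bytes());
        } else {
            // Sequence data with no preceding definition is still an error.
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "expected a definition line starting with '>'",
            ));
        }
    }
    Ok(records)
}

fn main() -> io::Result<()> {
    // Illustrative input with a blank line between two records.
    let input: &[u8] = b">MT\nACGT\nGGCC\n\n>GL000207.1\nNNNN\n";
    let records = read_records(input)?;
    assert_eq!(records.len(), 2);
    assert_eq!(
        records[0],
        Record { name: "MT".to_string(), sequence: b"ACGTGGCC".to_vec() }
    );
    assert_eq!(records[1].name, "GL000207.1");
    println!("parsed {} records", records.len());
    Ok(())
}
```

Skipping blank lines in the reader itself, instead of pre-filtering the input, keeps callers from having to special-case this error at all.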

zaeleus commented 1 year ago

This has now been changed in noodles 0.46.0 / noodles-fasta 0.27.0.

jch-13 commented 1 year ago

Thank you!