smithlabcode / falco

A C++ drop-in replacement of FastQC to assess the quality of sequence read data
https://falco.readthedocs.io
GNU General Public License v3.0
96 stars 10 forks source link

improve: process fastq wth different systems line endings https://git… #29

Closed Shelestova-Anastasia closed 2 years ago

Shelestova-Anastasia commented 2 years ago

fix for https://github.com/smithlabcode/falco/issues/24 - determine fastq line_endings on file load

guilhermesena1 commented 2 years ago

Why is field_separator no longer const? From what I can see the change decides the separator at the constructor, so it should be possible to keep it const and populate it when the StreamReader/GzStreamReader etc is constructed.

Keeping const variables const is important for the program speed. We will compare each character with the field separator for every FASTQ entry. If it is not const the memory access to it will be slower. I think the proper syntax would be to replace the superclass constructor

   StreamReader(_config, _buffer_size, '\n', '\n') {

with

   StreamReader(_config, _buffer_size, get_field_separator(filename), get_line_separator(filename)) {

where get_field_separator would decide which character separates field entries (it would be \n for fastq but \t for SAM, but the function would allow other edge cases to be tested)

Shelestova-Anastasia commented 2 years ago

@guilhermesena1 Thank you for review! I'm not from c++ world, so I make mistakes.

Now I fixed constructors. I hope I understood your instructions correctly.