stjude-rust-labs / fq

Command line utility for manipulating Illumina-generated FASTQ files.
MIT License

fq lint output #39

Open darked89 opened 2 months ago

darked89 commented 2 months ago

Hi,

I have run fq lint on a batch of files, piping the output into a single text file. Two issues:

  1. If a given FASTQ file has an issue (i.e., it fails validation), no line for it appears in the output on stdout. One can redirect stderr to the same or a different file, but after processing a few hundred FASTQ files I would rather parse the output I already have. Which brings me to the next point:
  2. The fq lint output is fine to read but a bit tricky to parse. It contains some formatting characters but no file name/path in each row. Getting `file.foo 12345678`, or rather `file_path number_of_reads`, is not possible with simple grepping.

Hope it helps

Darek

zaeleus commented 2 months ago

fq lint does not provide an output. Its usage is meant to be either a success or a failure, signaled by the process's exit code. stdout only contains simple log messages of the command's execution.
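Since pass/fail is carried by the exit code, a batch script can check each file without parsing any logs. A minimal Python sketch, assuming `fq` is on `PATH` and that it follows the usual convention of exiting with 0 on success and nonzero on failure; `lint_passed` is a hypothetical helper, not part of fq.

```python
import subprocess
from typing import Sequence


def lint_passed(path: str, cmd: Sequence[str] = ("fq", "lint")) -> bool:
    """Run fq lint on one FASTQ file and report success via its exit code.

    Assumption: `fq` is on PATH; `cmd` can be overridden for other setups.
    """
    result = subprocess.run(
        [*cmd, path],
        # Log output is discarded; only the exit status matters here.
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Assumed convention: 0 means the file validated, anything else is a failure.
    return result.returncode == 0
```

For example, `[p for p in paths if not lint_passed(p)]` would list the files that failed validation.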

Are you able to provide a more concrete example of what you're trying to achieve?

darked89 commented 2 months ago

Thank you for a really fast response.

I have two main goals:

  1. check that a given FASTQ file is correct
  2. since fq does output the number of reads when a FASTQ file validates, I want to get it:
```text
2024-07-29T20:29:08.280922Z  INFO fq::commands::lint: fq-lint start
2024-07-29T20:29:08.371633Z  INFO fq::commands::lint: validating single end read
2024-07-29T20:29:08.371649Z  INFO fq::validators: disabled validators: []
2024-07-29T20:29:08.371659Z  INFO fq::validators: enabled single read validators: ["[S003] NameValidator", "[S004] CompleteValidator", "[S002] AlphabetValidator", "[S001] PlusLineValidator", "[S005] ConsistentSeqQualValidator", "[S006] QualityStringValidator"]
2024-07-29T20:29:08.371667Z  INFO fq::validators: enabled paired read validators: []
2024-07-29T20:29:08.371671Z  INFO fq::commands::lint: starting validation
2024-07-29T20:39:57.031928Z  INFO fq::commands::lint: read 48843609 records
2024-07-29T20:39:57.031963Z  INFO fq::commands::lint: fq-lint end
```
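With the current output, the record count can be scraped from the `read N records` log line shown above. A minimal Python sketch, assuming the log wording stays as shown (it is not a stable interface) and that the "formatting chars" mentioned earlier are ANSI color escapes, which are stripped first; `record_count` is a hypothetical helper.

```python
import re
from typing import Optional

# ANSI SGR escape sequences (colors, dim, bold) that terminals render as styling.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")
# Matches the count line, e.g. "... fq::commands::lint: read 48843609 records".
RECORD_COUNT = re.compile(r"\bread (\d+) records\b")


def record_count(log_text: str) -> Optional[int]:
    """Return the record count logged by fq lint, or None if no count was logged."""
    plain = ANSI_ESCAPE.sub("", log_text)
    match = RECORD_COUNT.search(plain)
    return int(match.group(1)) if match else None
```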

Instead of fiddling with the existing output lines, it would be best to have a final line:

RESULT file_path_or_name validation_passed 48843609

And for a failed file (if possible):

RESULT failed_file_path_or_name failed 0_or_num_of_reads_before_fail

In both cases, the columns would be separated by tabs (easiest to read in), with no special characters beautifying the RESULT line. That way, ingesting the useful data would be trivial.
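The proposed format is simple to both emit and ingest. A minimal Python sketch of a wrapper-side version; `LintResult`, `result_line`, and `parse_result_lines` are hypothetical names, and the RESULT line is the format proposed in this comment, not something fq currently emits.

```python
from typing import List, NamedTuple


class LintResult(NamedTuple):
    path: str
    status: str   # "validation_passed" or "failed", as proposed above
    records: int  # total records, or records read before the failure


def result_line(res: LintResult) -> str:
    """Format the proposed tab-separated RESULT line, with no styling characters."""
    return "\t".join(["RESULT", res.path, res.status, str(res.records)])


def parse_result_lines(text: str) -> List[LintResult]:
    """Collect every RESULT line from combined output, ignoring log lines."""
    results = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) == 4 and fields[0] == "RESULT":
            _, path, status, records = fields
            results.append(LintResult(path, status, int(records)))
    return results
```

Splitting on tabs rather than arbitrary whitespace keeps paths that contain spaces intact; a path containing a tab would still break the format.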

Many thanks for your help