mritchielab / restrander

MIT License

help understanding log file #2

Closed by sparthib 11 months ago

sparthib commented 11 months ago

Hi, I have a quick question about the log output. The end of my output looks something like this.

Up to record 26700000...
Up to record 26800000...
Up to record 26900000...
Up to record 27000000...
Finished restranding!

Does this mean that restrander has processed up to 27M reads of the input file? And would that be the first 27M reads in the input file? (i.e. if the process gets killed abruptly, does the log mean that the first 27M reads of the input file have been processed?)

Or does the output mean that up to 27000000 reads have been added to the output file? i.e. if the input fastq file was bigger (say ~34M reads overall) and had been fully processed, could we interpret it as 27M of the ~34M reads having been written to the output?

Thanks! Sowmya

jakob-schuster commented 11 months ago

The "Up to record x..." message means that restrander has processed x reads (and written them to the output). Since it processes the file linearly, those are the first x reads in the input. The "Finished restranding!" message means that the program reached the end of the input file; it won't appear if the process gets killed, only if there are no more reads.
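
If it helps to check this from a saved log, here is a minimal Python sketch. It assumes the log lines look exactly like the excerpt posted above; `summarize_log` is a hypothetical helper name and not part of restrander. The colour codes are stripped first, since they can show up as `[0m[32m` in copied output.

```python
import re

# ANSI colour sequences such as ESC[0m or ESC[32m. The ESC byte is optional
# here because some copied logs lose it and keep only "[0m[32m".
ANSI_RE = re.compile(r"\x1b?\[[0-9;]*m")
# Progress lines, assumed to have the form "Up to record 27000000..."
RECORD_RE = re.compile(r"Up to record (\d+)\.\.\.")


def summarize_log(text):
    """Return the highest reported record count and whether the run finished."""
    clean = ANSI_RE.sub("", text)
    counts = [int(n) for n in RECORD_RE.findall(clean)]
    return {
        # Highest progress message seen, or None if there were none.
        "last_reported": max(counts) if counts else None,
        # True only if the end-of-input message was printed.
        "finished": "Finished restranding!" in clean,
    }
```

In the excerpt the progress messages are 100,000 records apart, so `last_reported` is best read as a lower bound on the number of reads processed rather than an exact total.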

If your input file had ~34M reads, I'm not sure how restrander could decide to finish before reaching the end of the input, so something is likely going wrong. Can you check the health of the input file? Maybe some weird read is breaking restrander. How long is the longest read in your input? (Reads longer than 500K bases can cause issues.)
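
One rough way to check both things at once (read count and longest read) is a small Python sketch like the one below. It assumes standard four-line FASTQ records with unwrapped sequences, and `fastq_stats` is a hypothetical helper name, not something restrander provides. It opens `.gz` files with `gzip`, so the count reflects the decompressed content.

```python
import gzip


def fastq_stats(path):
    """Return (number_of_reads, longest_read_length) for a FASTQ file.

    Assumes each record is exactly four lines: header, sequence, '+', quality.
    """
    # Pick a gzip-aware opener for compressed input, plain open otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    n_reads = 0
    longest = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                n_reads += 1
                longest = max(longest, len(line.rstrip("\r\n")))
    return n_reads, longest
```

Calling `fastq_stats("reads.fastq.gz")` (with a placeholder file name) gives the two numbers to compare against the log and against the 500K-base guideline.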

sparthib commented 11 months ago

Thanks for the clarification. The error was on my end: I forgot to decompress my .fastq.gz input file before counting the lines in it, so my read count was wrong. The file really does have only 27M reads.