hzi-bifo / RiboDetector

Accurate and rapid RiboRNA sequences Detector based on deep learning
GNU General Public License v3.0
96 stars, 16 forks

Ability to produce a log file that can be easily processed by MultiQC #33

Closed: Rohit-Satyam closed this issue 9 months ago.

Rohit-Satyam commented 1 year ago

This is an insanely awesome tool for rRNA detection, far, far better than SortMeRNA. I see that the tool prints a log on screen stating how many of the total input reads were rRNA. If ribodetector could produce a log file by default, then we could include those numbers in the MultiQC report as well, and that would be awesome. Currently, I am using wc -l file.fastq | awk '{print $1/4}' as a workaround to get the number of rRNA reads.
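The workaround above relies on FASTQ storing each read as exactly four lines (header, sequence, "+", quality). A minimal Python sketch of the same count, assuming unwrapped four-line records and optionally gzip-compressed input; the function name count_fastq_reads is hypothetical and not part of RiboDetector:

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count FASTQ records as (number of lines) / 4.

    Assumes the standard unwrapped 4-line layout, the same assumption
    the `wc -l ... | awk '{print $1/4}'` one-liner makes.
    """
    path = Path(path)
    # Open gzipped files transparently; plain text otherwise.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

For example, count_fastq_reads("sample_rrna.fq.gz") would return the number of reads in a hypothetical rRNA output file. Unlike the one-liner, it also handles gzipped files and raises an error when the line count is not a multiple of four.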

dawnmy commented 1 year ago

We're thrilled to hear that you find it useful and effective for rRNA detection. It is a nice suggestion; we'll take it into consideration and add this feature in an upcoming release. Until we add the log file functionality, you can redirect stdout into a log file with > {sample_id}.log.

dawnmy commented 10 months ago

https://github.com/hzi-bifo/RiboDetector/blob/0c20b2cb8a6d16abd0f5d44a56b5a37d39600d9a/ribodetector/detect.py#L42

https://github.com/hzi-bifo/RiboDetector/blob/0c20b2cb8a6d16abd0f5d44a56b5a37d39600d9a/ribodetector/detect_cpu.py#L40C29-L40C29

The log file is set in the code lines above.

dawnmy commented 10 months ago

Report the total number of reads and the numbers of predicted rRNA and non-rRNA reads, even with chunked input data.
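Reporting totals with chunked input means keeping running counts across all chunks rather than resetting them per chunk. A rough sketch of that bookkeeping, assuming each chunk yields one boolean prediction per read (True for rRNA); the ReadCounts class and its summary wording are hypothetical and do not reproduce RiboDetector's actual log format:

```python
from dataclasses import dataclass


@dataclass
class ReadCounts:
    """Running totals across chunks (hypothetical, not RiboDetector's code)."""
    rrna: int = 0
    nonrrna: int = 0

    @property
    def total(self) -> int:
        return self.rrna + self.nonrrna

    def add_chunk(self, predictions) -> None:
        """Fold one chunk of per-read predictions (True = rRNA) into the totals."""
        for is_rrna in predictions:
            if is_rrna:
                self.rrna += 1
            else:
                self.nonrrna += 1

    def summary(self) -> str:
        return (f"Processed {self.total} reads: "
                f"{self.rrna} rRNA, {self.nonrrna} non-rRNA")


if __name__ == "__main__":
    counts = ReadCounts()
    for chunk in ([True, False, False], [True]):
        counts.add_chunk(chunk)
    print(counts.summary())  # Processed 4 reads: 2 rRNA, 2 non-rRNA
```

Because the totals live outside the per-chunk loop, the final report is the same whether the input is processed in one pass or in many chunks.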

claczny commented 9 months ago

Unfortunately, the suggestion to redirect stdout does not work for me. It still creates a file ribodetector.log in the directory from which I launch ribodetector_cpu.

I have the following content in my conda yaml file:

channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - ribodetector=0.2.8

Any suggestions on how to resolve this? Have I missed a command-line option? It would be important to be able to have per-sample log files, especially when using ribodetector with a workflow system like Snakemake or Nextflow, which facilitate processing of many samples.

dawnmy commented 9 months ago

@claczny You are right: in the new version (0.2.8), the log file is created automatically in the working directory, whereas the previous version, 0.2.7, only prints the log to stdout. But you should be able to redirect the stdout of both versions to a file you specify. In the next version, I will add an option for a per-sample log file.

claczny commented 9 months ago

Thank you for the quick reply. I don't seem to understand how to

redirect the stdout of both versions to a file you specify

Could you please let me know how to do this with version 0.2.8? It doesn't work if I use > {sample_id}.log

dawnmy commented 9 months ago

Have you tried <cmd> &> {sample_id}.log?

claczny commented 9 months ago

No, but I will try. So it's not just stdout but also stderr, I see.

dawnmy commented 9 months ago

Yes, I assume so; you could try it.
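Why > alone was not enough: > redirects only stdout, while bash's &> redirects both stdout and stderr. One plausible cause, assuming version 0.2.8 logged through Python's logging module with a default console handler (the thread does not confirm this), is that logging.StreamHandler writes to stderr unless it is given another stream. The sketch below runs a small hypothetical child program and captures its two streams separately:

```python
import subprocess
import sys

# Hypothetical child program: prints one line and logs one line using
# logging's default setup. basicConfig() attaches a StreamHandler, and a
# StreamHandler writes to sys.stderr by default.
CHILD = (
    "import logging\n"
    "logging.basicConfig(level=logging.INFO)\n"
    "print('regular output')\n"
    "logging.info('log message')\n"
)


def run_child():
    """Run the child and return its (stdout, stderr) as separate strings."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout, result.stderr


if __name__ == "__main__":
    out, err = run_child()
    print("stdout:", out.strip())  # what `> file` alone would capture
    print("stderr:", err.strip())  # what `&>` captures in addition
```

Running it shows the printed line on stdout and the log line on stderr. In shells without &>, the portable equivalent is <cmd> > {sample_id}.log 2>&1.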

claczny commented 9 months ago

Redirecting the output to per-sample logs worked, but it also created a ribodetector.log in the working directory. I ran two samples simultaneously, and that shared log partially mixes their output: the INFO Writing output rRNA sequences into file: line lists one {sample.id} as output, while the INFO Writing output non-rRNA sequences into file line lists the other {sample.id}. Also, both per-sample logs contain INFO Log file: ribodetector.log, even though the actual filenames of those logs are different.

I'd suggest documenting this behavior; fixing it in a future release would also be beneficial.

Thanks a lot for your support!

dawnmy commented 9 months ago

@claczny Thank you for bringing the unexpected logging behavior to our attention. I plan to address and rectify this issue in the upcoming version. In that release, the log file will be generated with the same name as the output fastq file (for example, <sample_id>.log) and will be located in the output directory.
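Deriving the log name from the output file, as planned above, gives each concurrent run its own log and so avoids the interleaving seen with a shared ribodetector.log. A sketch of that naming rule, assuming the sample ID is the output FASTQ name minus a .fq or .fastq suffix (optionally followed by .gz); log_path_for is a hypothetical helper, not RiboDetector's code:

```python
from pathlib import Path

# Longer suffixes first so ".fastq.gz" is not mistaken for plain ".gz".
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def log_path_for(output_fastq):
    """Return <output dir>/<sample_id>.log for an output FASTQ path."""
    path = Path(output_fastq)
    name = path.name
    for suffix in FASTQ_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    # Keep the log next to the output file, not in the working directory.
    return path.parent / f"{name}.log"
```

For example, log_path_for("results/sample1.fq.gz") gives results/sample1.log, so two samples written to different files can never share a log.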

claczny commented 9 months ago

You're welcome. May I ask why you are creating a log-file by default at all? Why not have this as an option for the user and output to stdout/stderr by default? Mandating a specific output (file) may break certain workflows, I reckon.

dawnmy commented 9 months ago

Your suggestion is more sensible. In earlier versions, RD only printed logs to stdout and did not save them to a file. However, based on user requests, in version 0.2.8, I implemented the feature to save logs into a file in the working directory. That said, I agree that it would be more flexible to allow users the choice to decide if they need a log file and to specify a path for it. This is something I will consider in the next release.

dawnmy commented 9 months ago

Upon double-checking, I've realized that users can already specify the log file name using the --log <logfile> option. If this option isn't set, a default log file named ribodetector.log is created in the working dir. However, based on the feedback, I plan to modify this so that no log file is generated unless the --log option is explicitly set.

claczny commented 9 months ago

Interesting. I seem to have missed this option in the documentation; I would have used it then 🙂 (Email reply to Z.-L. Deng's comment above, sent 20 Dec 2023, 17:36 +0100.)

dawnmy commented 9 months ago

I haven't yet updated the documentation for version 0.2.8 😂, but you can get the help text via ribodetector --help or ribodetector_cpu --help.

dawnmy commented 9 months ago

I just released version 0.2.9, which should solve this logging issue.

dawnmy commented 9 months ago

@claczny @Rohit-Satyam Please try the latest version, 0.3.0. When --log <logfile> is set, log messages are saved to the specified file; otherwise they are only printed to stdout. The log messages now contain more useful information, including the total number of reads processed and the numbers of predicted rRNA and non-rRNA reads.
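The 0.3.0 behavior described above (a log file only when --log is set, stdout otherwise) maps naturally onto Python's argparse and logging modules. This is a minimal sketch under stated assumptions, not RiboDetector's source: the logger name "sketch" is hypothetical, and it assumes messages still go to stdout when --log is given, which the comment does not specify.

```python
import argparse
import logging
import sys


def build_parser():
    """Parser with an optional --log flag (hypothetical re-creation)."""
    parser = argparse.ArgumentParser(description="Optional log file sketch")
    parser.add_argument("--log", metavar="LOGFILE", default=None,
                        help="also write log messages to LOGFILE")
    return parser


def setup_logger(log_path=None):
    """Always log to stdout; add a file handler only if log_path is given."""
    logger = logging.getLogger("sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages out of the root logger
    for handler in list(logger.handlers):  # make repeated calls idempotent
        handler.close()
        logger.removeHandler(handler)

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    console = logging.StreamHandler(sys.stdout)
    console.setFormatter(fmt)
    logger.addHandler(console)

    if log_path is not None:
        file_handler = logging.FileHandler(log_path)
        file_handler.setFormatter(fmt)
        logger.addHandler(file_handler)
    return logger


if __name__ == "__main__":
    args = build_parser().parse_args()
    log = setup_logger(args.log)
    log.info("Log file: %s", args.log if args.log else "none (stdout only)")
```

Attaching the file handler only when a path is supplied means no file appears in the working directory by default, so workflow managers such as Snakemake or Nextflow can pass an explicit per-sample path instead.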