miRTop / mirtop

command line tool to annotate miRNAs with a standard miRNA/isomiR naming
https://mirtop.readthedocs.org
MIT License

MirTop output changes depending on sample quality #74

Open apeltzer opened 1 year ago

apeltzer commented 1 year ago

Expected behavior and actual behavior.

MirTop seems to emit certain summary metrics in its output file(s) only for samples of sufficient quality.

This frequently breaks downstream tools such as MultiQC, because MirTop simply skips reporting these stats in the output file while MultiQC depends on them being present. An example: https://github.com/ewels/MultiQC/issues/1778

Other examples: https://github.com/ewels/MultiQC/pull/1723 and https://github.com/ewels/MultiQC/pull/1716, where we found different IDs missing from the JSON.

It would be great to always report all values, to be on the safe side and stop breaking downstream tools. Can't we just default to reporting 0 in cases where there is no data?
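The "default to 0" idea could be sketched roughly as below. This is a minimal illustration, not mirtop's actual code: `EXPECTED_KEYS` is a hypothetical list standing in for whatever set of metric names MultiQC expects in the stats JSON.

```python
# Hedged sketch: pad a per-sample stats dict so every expected metric key
# is present, defaulting to 0 when no data was produced for it.
# EXPECTED_KEYS is illustrative; the real set lives in mirtop / MultiQC.

EXPECTED_KEYS = [
    "isomiR_sum", "isomiR_count", "isomiR_mean",
    "ref_miRNA_sum", "ref_miRNA_count", "ref_miRNA_mean",
]

def pad_stats(sample_stats):
    """Return a copy of sample_stats with all expected keys present,
    filling in 0 for any metric mirtop did not report."""
    padded = dict(sample_stats)
    for key in EXPECTED_KEYS:
        padded.setdefault(key, 0)
    return padded
```

With this, a low-quality sample that only produced `{"isomiR_sum": 1234}` would still serialize with all six keys, so a strict parser downstream never hits a `KeyError`.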

Steps to reproduce the problem.

a.) Take a sample of (debatable) quality and run MirTop.
b.) Run MultiQC on the output to generate a report --> MultiQC fails.

Specifications like the version of the project, operating system, or hardware.

We see this in the nf-core/smrnaseq workflow. The data is sometimes just bad quality; across several hundred samples, only a small fraction (<<5%) fail because of this.

lpantano commented 1 year ago

It is a good idea, and I have it in mind; I just need 30 minutes to address this. I need to take all the categories from here: https://github.com/ewels/MultiQC/blob/master/multiqc/modules/mirtop/mirtop.py#L61 (+ _sum, _count, _mean) and add them here: https://github.com/miRTop/mirtop/blob/dev/mirtop/gff/stats.py#L106.
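The plan above (take MultiQC's category names, cross them with the three suffixes, and make sure every sample reports each resulting key) might look something like this. The category names here are placeholders, not copied from `mirtop.py`:

```python
# Hedged sketch of the proposed fix: build the full expected key set as
# the cross product of category names and stat suffixes, then complete
# each sample's stats dict with 0 for any key that is absent.
# CATEGORIES is illustrative only.

CATEGORIES = ["iso_3p", "iso_5p", "iso_add3p", "iso_snv", "ref_miRNA"]
SUFFIXES = ["_sum", "_count", "_mean"]

def expected_keys():
    """Every category/suffix combination MultiQC might look for."""
    return [cat + suf for cat in CATEGORIES for suf in SUFFIXES]

def complete_stats(stats):
    """Return a stats dict containing every expected key,
    keeping reported values and defaulting missing ones to 0."""
    return {key: stats.get(key, 0) for key in expected_keys()}
```

Emitting the completed dict instead of the raw one would make the JSON schema identical across samples regardless of quality.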

apeltzer commented 1 year ago

Yes, looks like it 👍🏻 Happy to test a pre-release version, if you have one, to make sure it works nicely :)