roblanf / minion_qc

Quality control for MinION sequencing data
MIT License
211 stars, 42 forks

Adding MinIONQC as module to MultiQC #40

Status: Open. ManavalanG opened this issue 5 years ago

ManavalanG commented 5 years ago

Hi, I am working on adding support for MinIONQC to MultiQC, which is a fantastic results aggregator. So far, I have added results from summary.yaml; see this HTML file for its current state: multiqc_report.html.zip. Your feedback is welcome.

But what would be truly useful is including MinIONQC's plots as well, as this would make it easier to compare them across samples.

This would require MinIONQC to output the raw data behind those plots. It would be great if you could enable writing the relevant plot data to files (CSV, JSON, etc.).

PS: If you would like to try my dev version of MultiQC, follow these steps:

  1. git clone/download the repo
  2. cd MultiQC
  3. python setup.py develop
  4. multiqc <dir_with_minionqc_results>
  5. open multiqc_report.html

roblanf commented 5 years ago

Sorry for being so slow - teaching!

I'll have a think about how best to do this. In principle I can just grab the data out of the relevant ggplot objects. The issue is that these files will often be ridiculously large. That got me thinking that I should downsample them, and if I'm going to do that, I should rewrite a fair amount of the code to speed everything up. But I'm working on it!
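A minimal sketch of the downsampling idea, written in Python rather than R (MinIONQC itself is written in R, and this is not its code): reservoir sampling (Algorithm R) keeps a uniform random subset of at most `k` reads while streaming through the data once, so memory stays bounded no matter how large the run is. The function name `reservoir_sample` and the synthetic read lengths are illustrative assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return a uniform random sample of up to k items from a stream (Algorithm R).

    Only k items are held in memory, so this works on per-read tables
    that are too large to load at once.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same subset
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1) by overwriting a random slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Hypothetical per-read lengths standing in for a real sequencing summary.
    lengths = (1000 + (i * 37) % 5000 for i in range(200_000))
    sample = reservoir_sample(lengths, k=10_000)
    print(len(sample))  # 10000
```

A random subset preserves the shape of distributions such as read length or quality, but totals like yield or N50 should still be computed on the full data before downsampling.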

roblanf commented 5 years ago

And I should also say: thanks for doing this! The current iteration looks great, and I absolutely agree that MultiQC is awesome, so I'm very keen to push forward with this integration.

ManavalanG commented 5 years ago

I'm glad that you find this useful, and thanks for being open to the idea. I made some improvements to the module and submitted a pull request to MultiQC. Here is how it looks now: multiqc_report.html.zip. Any suggestions for improvements or edits to the report's help text are welcome!

Last week I briefly looked into how to downsample the data while retaining the information the plots need, but I didn't get far, partly because R is not my strong suit. If I find anything useful on this front, I will send it your way.
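One way to shrink the data while keeping what a read-length histogram needs is to bin instead of subsample: count reads and bases in log10-spaced length bins, so the output has a fixed, small number of rows regardless of how many reads there are. This is a hypothetical Python sketch (`log_length_histogram` and `bins_per_decade` are illustrative names), not something MinIONQC currently outputs.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def log_length_histogram(
    lengths: Iterable[int], bins_per_decade: int = 10
) -> List[Tuple[float, float, int, int]]:
    """Summarise read lengths into log10-spaced bins.

    Returns (lower_bound, upper_bound, n_reads, n_bases) for each non-empty bin,
    so both a read-count and a yield-weighted histogram can be redrawn later.
    """
    reads: Dict[int, int] = defaultdict(int)
    bases: Dict[int, int] = defaultdict(int)
    for length in lengths:
        if length <= 0:
            continue  # skip empty reads; log10 is undefined at 0
        b = math.floor(math.log10(length) * bins_per_decade)
        reads[b] += 1
        bases[b] += length
    return [
        (10 ** (b / bins_per_decade), 10 ** ((b + 1) / bins_per_decade), reads[b], bases[b])
        for b in sorted(reads)
    ]


if __name__ == "__main__":
    for lo, hi, n, total in log_length_histogram([500, 800, 1200, 15000, 16000]):
        print(f"{lo:.0f}-{hi:.0f} bp: {n} reads, {total} bases")
```

Keeping both read and base counts per bin lets a downstream tool draw either a read-count or a yield-weighted histogram from the same small table.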

ManavalanG commented 5 years ago

@roblanf To your credit :)

csawye01 commented 5 years ago

Has MultiQC now been updated so that it displays the plots output by MinIONQC as well as the stats from summary.yaml?

ManavalanG commented 4 years ago

@csawye01 Yes.

roblanf commented 4 years ago

@ManavalanG - the MultiQC docs don't make it sound like the plots are included. If it's still of interest, I can make sure to output the plot data.

ManavalanG commented 4 years ago

Hi @roblanf! In my original pull request, I had plots for read N50, median Q score, and base count, but MultiQC's author replaced them with just a read-length plot. This is probably why. The PR was reviewed and merged two months after I submitted it, and by then I didn't have time to look into the changes.
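For reference, read N50 is the largest length L such that reads of length ≥ L together contain at least half of all bases. A minimal Python sketch of that standard definition (assumed here; MinIONQC's own implementation may handle edge cases differently):

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50: the largest length L such that reads of length >= L
    together contain at least half of all bases."""
    if not lengths:
        raise ValueError("N50 of an empty collection is undefined")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until half is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns for non-empty input")
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8: the total is 54 bases, and 10 + 9 + 8 = 27 is the first running sum to reach half of that.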

I found this paper in the wild whose Figure 3 shows how the tables look in MultiQC (it doesn't show the plot, though).

It's been a while since I worked with nanoplot data, but if you are interested in improving MinIONQC's integration into MultiQC, I am open to collaborating.

ManavalanG commented 4 years ago

I just realized that you were talking about the major plots produced by MinIONQC (length_by_hour, etc.), not something simple like base count, for which a table might be sufficient.

So yes, if MinIONQC outputs simplified data as originally requested in this issue, I think we could add those plots to MultiQC within a few weeks.

humbleflowers commented 4 years ago

Hello @ManavalanG @roblanf, I am new to both MultiQC and MinIONQC, and I really appreciate the dashboard that MultiQC provides and the insights into the data that MinIONQC provides. I was wondering why the time-based plots (length_by_hour, read_per_hour, yield_per_hour, etc.) are left out of the MinIONQC module in MultiQC. A full integration of MinIONQC into MultiQC would really help. Looking forward to the update. Thanks.
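The time-based plots boil down to per-hour aggregates, which are small enough to export as CSV. A sketch in Python, assuming a tab-separated sequencing summary with the columns `start_time` (seconds since the run started), `sequence_length_template` and `mean_qscore_template`, as found in Albacore/Guppy `sequencing_summary.txt` files; check the header of your own file. The function names and output columns are hypothetical and do not reflect MinIONQC's code or any existing output format.

```python
import csv
import io
import statistics
from collections import defaultdict
from typing import Dict, List, TextIO

FIELDS = ["hour", "reads", "bases", "median_length", "median_q"]


def per_hour_summary(summary: TextIO) -> List[Dict[str, float]]:
    """Aggregate a tab-separated sequencing summary into one row per run hour.

    Assumed columns: 'start_time' (seconds since run start),
    'sequence_length_template' and 'mean_qscore_template'.
    """
    lengths: Dict[int, List[int]] = defaultdict(list)
    quals: Dict[int, List[float]] = defaultdict(list)
    for row in csv.DictReader(summary, delimiter="\t"):
        hour = int(float(row["start_time"]) // 3600)  # 0 = first hour of the run
        lengths[hour].append(int(row["sequence_length_template"]))
        quals[hour].append(float(row["mean_qscore_template"]))
    return [
        {
            "hour": hour,
            "reads": len(lengths[hour]),
            "bases": sum(lengths[hour]),
            "median_length": statistics.median(lengths[hour]),
            "median_q": statistics.median(quals[hour]),
        }
        for hour in sorted(lengths)
    ]


def write_csv(rows: List[Dict[str, float]], out: TextIO) -> None:
    """Write the per-hour rows as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Tiny made-up summary: two reads in hour 0, one read in hour 1.
    demo = io.StringIO(
        "start_time\tsequence_length_template\tmean_qscore_template\n"
        "10.0\t1200\t9.5\n"
        "1800.0\t800\t10.5\n"
        "4000.0\t5000\t11.0\n"
    )
    out = io.StringIO()
    write_csv(per_hour_summary(demo), out)
    print(out.getvalue())
```

A cumulative yield-over-time curve is then just the running sum of the `bases` column.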

roblanf commented 4 years ago

Reopening this issue in the hope that I can get to it soon. Fair warning, though: as you can see, it has already taken me a long time, and right now I (like a lot of people) have my focus predominantly elsewhere.

humbleflowers commented 4 years ago

@roblanf, is there a way I can get the raw data (.tsv or .csv) that MinIONQC uses to plot all of its graphs? Thanks.