rhpvorderman / sequali

Fast sequencing data quality metrics
GNU Affero General Public License v3.0

MultiQC integration improvement suggestion #121

Closed rhpvorderman closed 4 months ago

rhpvorderman commented 6 months ago

@Redmar-van-den-Berg Wouldn't it be useful to merge the sequali reports in MultiQC, with buttons for R1 and R2? Right now you get results per sample for most tools such as STAR, but per sample_R1 and sample_R2 for the QC.

Maybe this is even something you can solve in sequali, by allowing multiple FASTQ files as input and writing the results together into one per-sample JSON file rrvandenberg

fastp does this as well, and in MultiQC you can then switch between read1 and read2.

Redmar-van-den-Berg commented 6 months ago

You can add custom configuration values for your module to MultiQC. It might be nice to add an option like merge_readpairs, and have it set to false by default. That way, power users can enable this feature while the normal behaviour remains unchanged.
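A minimal sketch of how such an opt-in switch could behave, in plain Python rather than against the MultiQC API. The function name `group_reports`, the read labels, and the `_R1`/`_R2` (or `_1`/`_2`) suffix convention are assumptions made for illustration; only the option name merge_readpairs and its false default come from the suggestion above.

```python
import re
from collections import defaultdict

# Assumed naming convention: report names end in _R1/_R2 or _1/_2.
_PAIR_SUFFIX = re.compile(r"^(?P<sample>.+)_R?(?P<read>[12])$")


def group_reports(report_names, merge_readpairs=False):
    """Map a display name to {read label: report name}.

    With merge_readpairs=False (the default) every report keeps its own
    entry, so the existing behaviour is unchanged.
    """
    if not merge_readpairs:
        return {name: {"single": name} for name in report_names}

    grouped = defaultdict(dict)
    for name in report_names:
        match = _PAIR_SUFFIX.match(name)
        if match:
            # sample_R1 and sample_R2 end up under the same "sample" key.
            grouped[match["sample"]]["R" + match["read"]] = name
        else:
            grouped[name]["single"] = name
    return dict(grouped)
```

Note that a suffix rule like this would also catch a single-end sample that happens to be named `run_2`, which is one reason to keep the feature off unless the user asks for it.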

rhpvorderman commented 6 months ago

Autodetection of read pairs seems to be a sensible option. The other option is to make sequali read-pair aware. That can be done in a simple fashion, where each pair gets its own file, or in a complicated fashion, where read-pair aware info is added.

Personally I think spending a lot of time on short-read sequencing data is not warranted. The technology is showing its age at the moment. With DRAGEN, building tools to run in pipelines becomes less and less relevant for Illumina. Also the tools that are already there are very entrenched, with few people willing to change for something new.

Long-read data however is rapidly becoming the new standard.

On the other hand, good Illumina support is necessary. I think your suggestion to do the pairing in MultiQC using functionality there can be quite interesting. That requires a major effort too, though, so I will put that on the backlog for a while as I have some other tasks that I need to attend to.

Redmar-van-den-Berg commented 6 months ago

This is also something that is being worked on in MultiQC itself: https://github.com/MultiQC/MultiQC/issues/542

rhpvorderman commented 5 months ago

I have been thinking about it some more. What I can do is simply create 2 JSON files. In the meta section I can indicate that it is part of a paired system and whether it is forward or reverse. I can also give the filename of the other half of the pair.

Then in MultiQC I can have

Which can be toggled on and off.

Report modules that were generated from combined data will simply be duplicated in both JSON files that are produced. That way I don't need to have a separate structure for single-end and paired-end. Which is extra nice as long-read data does not have paired reads.
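To make the idea concrete, here is a rough sketch of what the two files could contain. All key names (`paired`, `read`, `filename`, `mate_filename`) and the function `paired_reports` are hypothetical and chosen for illustration; they are not the actual sequali JSON schema.

```python
import json


def paired_reports(forward_name, reverse_name, forward_stats, reverse_stats, combined):
    """Return two JSON strings, one per read of the pair.

    Modules computed from the combined data are copied into both reports,
    so single-end and paired-end files share the same structure.
    """
    def build(read, own_name, mate_name, stats):
        report = {
            "meta": {
                "paired": True,              # hypothetical key
                "read": read,                # "forward" or "reverse"
                "filename": own_name,
                "mate_filename": mate_name,  # points at the other half
            }
        }
        report.update(stats)     # modules specific to this read
        report.update(combined)  # modules duplicated in both files
        return json.dumps(report, indent=2)

    return (
        build("forward", forward_name, reverse_name, forward_stats),
        build("reverse", reverse_name, forward_name, reverse_stats),
    )
```

A reader of either file can then find its mate through `mate_filename`, and a single-end report would only differ in its meta section.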

What do you think about it?

Redmar-van-den-Berg commented 5 months ago

I think you have to be careful here, since MultiQC has a lot of options to

  1. Guess the sample name from the filename
  2. Rename samples automatically
  3. Manually override the sample name based on e.g. the path of the log file.

Especially point 3 is very useful when building pipelines, since you don't have to follow the output folder structure and naming convention that MultiQC happens to expect for your tool. However, this does mean that you have to be careful about the assumptions you make about how MultiQC will parse the output files, and derive the sample name from them.

How about defining the sequali JSON output to always be a list? For single-end data, you have a list of size 1. For paired-end data, you have a list of size 2.

That way, you don't have to guess which of the JSON outputs belong together. Alternatively, you could do something similar but also add a sample name to the JSON, so MultiQC doesn't have to guess. Although that requires more input from the user.
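As a sketch of what the list approach would mean on the parsing side (the function name `load_reads` and the `R1`/`R2` labels are assumptions, not existing sequali or MultiQC code):

```python
import json


def load_reads(json_text):
    """Return the per-read reports from a list-style report file.

    A list of size 1 is treated as single-end, a list of size 2 as
    paired-end; anything else is rejected.
    """
    reports = json.loads(json_text)
    if not isinstance(reports, list) or len(reports) not in (1, 2):
        raise ValueError("expected a list of 1 (single-end) or 2 (paired-end) reports")
    # zip stops at the shorter input, so single-end data only gets "R1".
    return dict(zip(("R1", "R2"), reports))
```

Because both reads arrive in one file, the parser never has to match filenames or sample names to find the pair.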

rhpvorderman commented 5 months ago

I did it. Sequali is now paired-end aware. The code has been co-developed with a MultiQC PR that is currently open. Version 0.8.0 has full paired-end read support.

rhpvorderman commented 4 months ago

Available in MultiQC version 1.22