lmrodriguezr / nonpareil

Estimate metagenomic coverage and sequence diversity
http://enve-omics.ce.gatech.edu/nonpareil/

Output model results in JSON #63

Closed fgvieira closed 5 days ago

fgvieira commented 7 months ago

Would it be possible to output the model results in JSON? In case it helps, I made two tentative functions:

  library("jsonlite")

  # Collect the attributes of a single curve object into one named list
  export_curve <- function(object){
    # Extract variables (selected by attribute position)
    n <- names(attributes(object))[c(1:12,21:29)]
    x <- sapply(n, function(v) attr(object,v))
    names(x) <- n
    # Extract vectors (selected by attribute position)
    n <- names(attributes(object))[13:20]
    y <- lapply(n, function(v) attr(object,v))
    names(y) <- n

    append(x, y)
  }

  # Export every curve in a set as pretty-printed JSON, keyed by curve label
  export_set <- function(object){
    y <- lapply(object$np.curves, export_curve)
    names(y) <- sapply(object$np.curves, function(n) n$label)
    jsonlite::prettify(jsonlite::toJSON(y, auto_unbox=TRUE))
  }
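
The functions above pick attributes by position, which depends on the order in which they are stored. As a hedged alternative sketch (not part of the package or of the script discussed in this thread), attributes could instead be split by length. `mock_curve`, its attribute names, and `export_by_length` are hypothetical stand-ins used only for illustration:

```r
library(jsonlite)

# Hypothetical stand-in for a curve object: a bare list carrying attributes.
# The attribute names and values are made up for this example.
mock_curve <- structure(
  list(),
  label = "sample_A",
  kappa = 0.42,
  x.obs = c(1, 10, 100),
  y.red = c(0.1, 0.3, 0.6)
)

# Split attributes into scalars and vectors by length instead of by position.
export_by_length <- function(object) {
  a <- attributes(object)
  is_scalar <- vapply(a, function(v) is.atomic(v) && length(v) == 1, logical(1))
  # Scalars first, then everything else, all in one named list
  c(a[is_scalar], a[!is_scalar])
}

out <- export_by_length(mock_curve)
cat(jsonlite::toJSON(out, auto_unbox = TRUE, pretty = TRUE))
```

One trade-off of this approach: a vector attribute that happens to hold a single value would be treated as a scalar, so a downstream parser would need to tolerate that.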

jfy133 commented 3 weeks ago

@lmrodriguezr this would be my last 'request' in the recent run of new features! It would then help get nonpareil into a lot of metagenomics pipelines (as MultiQC is very popular), as in the linked PR on the MultiQC repo.

Typically MultiQC needs a 'log' file or any plain text file with a distinguishing bit of text (e.g. the tool name on the first line), and then the Python-parsable information for plotting.

The JSON from the function @fgvieira shows above is then parsed by @fgvieira's MultiQC module to make the HTML report in the zip below (single sample in this case; side note: I couldn't get the multi-sample test data to be picked up by MultiQC, @fgvieira, any ideas?).

If possible, the data could also be represented in a simpler structure such as CSV (rather than the more complicated JSON). However, it would have to be one file per sample, or one file for multiple samples (the data can't be split across multiple files).
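
For the single-file, multi-sample CSV layout mentioned above, one possible shape is a long table with a sample column. The sketch below is only an illustration under stated assumptions: the sample names, the column names (`x.adj`, `y.cov`), and the values are placeholders, not output from Nonpareil.

```r
# Hypothetical per-sample curve points (values are made up for illustration).
curves <- list(
  sample_A = data.frame(x.adj = c(1e3, 1e4), y.cov = c(0.10, 0.35)),
  sample_B = data.frame(x.adj = c(1e3, 1e4), y.cov = c(0.20, 0.50))
)

# Stack all samples into one long table, tagging each row with its sample name.
long <- do.call(rbind, lapply(names(curves), function(s) {
  cbind(sample = s, curves[[s]])
}))
rownames(long) <- NULL

# Write everything to a single CSV file (a temporary file in this sketch).
out_file <- tempfile(fileext = ".csv")
write.csv(long, out_file, row.names = FALSE)
```

Keeping one row per point and one column per variable means a parser only needs to group rows by `sample` to recover each curve.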

image

multiqc_report.zip

lmrodriguezr commented 1 week ago

Hello, I'm now including a simple script that includes the JSON code (thanks, @fgvieira!) and also generates some summary tables (in CSV and TSV). It's all pretty simple, and it's missing a lot of the graphical options, but it should solve many "simple" use cases.

Please let me know if this addresses the requests. I decided not to include the JSON code directly in the R package because I haven't had good experiences expanding library requirements with CRAN, and I don't want to unnecessarily complicate the package installation process. However, I can't think of any cases in which the JSON file would be needed from within R (after creating it in R), so this should do. Please let me know if you think it'd be better to have it in the package :)

I'm leaving this open for now, but feel free to close it if you think this is fully addressed now.

jfy133 commented 1 week ago

Hi @lmrodriguezr !

Thank you very much! For my use case (and I suspect @fgvieira's) I believe this will work, assuming we can include the R script itself alongside the main binary/R package in the conda recipe.

I think this will still be possible by just copying the script into the bin/ directory of the conda environment, so that it stays a separate script that can be called within the conda environment 👍.

I can't test this myself today, so unless @fgvieira confirms that the script works as expected for MultiQC, I'll test it, maybe on Thursday. If all is good, maybe you can do a new release, @lmrodriguezr, and then I can update the conda recipe to include the additional libraries.

jfy133 commented 6 days ago

Just tested, and UX-wise, it works well! Nice and simple!

I also tried testing the JSON output with @fgvieira 's MultiQC module, and it didn't work because the output JSON from

./NonpareilCurves.R --json test.json *.npo

had the following fields missing, which the MultiQC module was expecting:

I'm not sure if that's a disconnect between @fgvieira's original script, the new script, and the MultiQC module, but I wanted to report it.

Would it be possible to add those fields to the script, @lmrodriguezr? (From a quick check, those bits of @fgvieira's code don't seem to be in the new script, AFAICS.)
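
One way to spot this kind of mismatch is to compare the keys in the exported JSON against the keys a parser expects. The following is a minimal sketch under stated assumptions: the JSON text, the sample name, and the field names in `expected` are all hypothetical placeholders and are not the fields that were actually missing.

```r
library(jsonlite)

# Hypothetical JSON for one sample; field names are placeholders.
json_txt <- '{"sample_A": {"label": "sample_A", "kappa": 0.42}}'

# Hypothetical list of fields that a downstream parser expects per sample.
expected <- c("label", "kappa", "x.adj")

# Parse the JSON and report, for each sample, which expected fields are absent.
parsed <- jsonlite::fromJSON(json_txt)
missing_fields <- lapply(parsed, function(s) setdiff(expected, names(s)))
print(missing_fields)
```

With a real export, `json_txt` could be replaced by the path to the JSON file, since `fromJSON` also accepts a file path.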

lmrodriguezr commented 6 days ago

Thank you, @jfy133! I found the script in MultiQC; I'm updating it now.

jfy133 commented 6 days ago

That now works!

As proof, here are the test data (in zip) and commands I used:

./NonpareilCurves.R --json test.json *.npo --tsv test.tsv
multiqc .

The resulting MultiQC report (also in the zip) looks like this :tada:

image

nonpareil-curves.zip

If you're happy with this, could you do a release, @lmrodriguezr? Then I'll update the bioconda recipe :D

lmrodriguezr commented 5 days ago

This is great! I just created the new release, and I'm closing the issue. Thanks!