petrelharp / context

Context-dependent mutation rate inference machinery.
Rscript-callable script that parses output and gives summaries #23

Closed matsen closed 9 years ago

matsen commented 10 years ago

This is @petrelharp's idea for sure, and it's a good one. I thought that we could discuss what would happen here.

Not to be all hipster-coder, but I think it would be fun to have it spit out an R-markdown document and/or its html counterpart with the inferences and MCMC diagnostics...

petrelharp commented 10 years ago

GREAT hipster idea.

matsen commented 10 years ago

OK, I'm going to take this on.

matsen commented 10 years ago

I'm going to be using https://github.com/rstudio/rmarkdown. Complain if you aren't happy depending on pandoc and/or a non-CRAN repo.

petrelharp commented 10 years ago

awesome.

matsen commented 10 years ago

OK, a first draft of this is now pushed. Actually depends on the CRAN pander package now. Try it out with result-to-html.sh.

Here is an example output (http://f.cl.ly/items/3x2S1a1D3i051o2R0Z3D/sim-tasep-123456-genmatrix-4-complete-54321.html). It doesn't work on MCMC output yet.

Would love hearing what you think would be most useful to have in it. Should it hunt around for residuals TSVs too? Would it be worth having the inference code write a file that records all of the files associated with a run?

petrelharp commented 10 years ago

Hey, cool.

I think the useful things to see would be:

  1. fitted parameter values
  2. window sizes and amount of data used to fit
  3. result of fitting operation (i.e. message, hopefully "Converged.")
  4. residuals: for sufficiently short T-mers, you can look at all of them (2-2? 3-1?)
  5. the top and bottom residuals for longer T-mers (9-5? 9-9?)
  6. if present, MCMC traces
  7. ... and posterior marginal distributions.

Other stuff, e.g. the projmatrix, is useful for debugging but not for parsing the output (esp for big objects!).
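
Items 4 and 5 amount to ranking residuals and then showing either all of them or only the extremes. A minimal sketch of that ranking, written in Python for brevity rather than the project's R. The pattern labels, counts, and function names are made up for illustration, and the residual is taken as plain observed minus expected, which may differ from what the inference code actually computes:

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def residuals(observed: Dict[str, float], expected: Dict[str, float]) -> Dict[str, float]:
    """Raw residual (observed minus expected) for every pattern in `expected`.

    Patterns missing from `observed` are treated as having a count of zero.
    """
    return {pattern: observed.get(pattern, 0.0) - exp
            for pattern, exp in expected.items()}


def extreme_residuals(resid: Dict[str, float], n: int = 5) -> Tuple[Ranked, Ranked]:
    """Return (top, bottom): the n largest and the n smallest residuals.

    `top` is ordered from largest down, `bottom` from smallest up.
    """
    ranked = sorted(resid.items(), key=lambda kv: kv[1])
    n = max(0, min(n, len(ranked)))      # clip n to the available patterns
    bottom = ranked[:n]
    top = ranked[len(ranked) - n:][::-1]
    return top, bottom


# Hypothetical counts, for illustration only.
observed = {"AA->AT": 12, "AC->AG": 30, "CG->TG": 55, "GT->GA": 8}
expected = {"AA->AT": 10.0, "AC->AG": 33.5, "CG->TG": 40.0, "GT->GA": 8.5}

resid = residuals(observed, expected)
top, bottom = extreme_residuals(resid, n=1)
print("top:", top)
print("bottom:", bottom)
```

For short T-mers, setting `n` to the number of patterns gives the full ranked table; for long ones a small `n` keeps the report readable.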

-p

matsen commented 10 years ago

So you are a fan of searching around for things rather than outputting some file that directs downstream analysis to relevant files? Or inference could just be isolated in a directory.

petrelharp commented 10 years ago

I'm not really a fan of just searching around. I was thinking the residuals could be computed on the fly, since it's not too intensive. If so, I don't think we need more than one file? Otherwise, the directory option sounds pretty good to me.

-p

matsen commented 10 years ago

+1 for one file.

matsen commented 10 years ago

In discussion, we decided that report generation should depend on both the model .RData and the counts.

Here is what I have so far for a custom report, generated by:

stoat context/json-cpg ‹master*› » ../templated-Rmd.sh ../generic.Rmd test-cpg-123456-genmatrix-3-cpg-54321.RData                       

with the latest push.

matsen commented 10 years ago

OK, with https://github.com/petrelharp/context/commit/7875fc2c2f2f13c9aae1deb13f3312d9626b9e2d we are up to looking like this: http://cl.ly/code/390Z2C411Y40/test-cpg-123456-genmatrix-3-cpg-54321.html

I'd do the MCMC bits too, but it doesn't seem like the code's really ready, with hardcoded settings like this (https://github.com/petrelharp/context/blob/master/pairplots-mcmc.R#L21). If you extract that functionality out of the scripts I'd be happy to plug it into the .Rmd.

petrelharp commented 10 years ago

wow, looking nice.

back to this soon. monday?

matsen commented 10 years ago

No rush at all. I'm going to de-assign myself from this though.

Frederick "Erick" Matsen, Assistant Member Fred Hutchinson Cancer Research Center http://matsen.fhcrc.org/

matsen commented 9 years ago

As far as I'm concerned, @petrelharp finished this off with https://github.com/petrelharp/context/commit/9c6a4eacf9cefa4deeeae64d6fb3093e3e0ba709.