mjhajharia / transforms


Feature/fix corr cholesky logdet #29

Closed adamhaber closed 1 year ago

adamhaber commented 2 years ago

This is a follow-up on #27, implementing two main changes:

  1. For the Stan transform, add the logdet correction due to the elementwise tanh transform (see comment in the original PR).
  2. For the TFP version, fix an indexing bug in the logdet computation.
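For reference, the tanh correction mentioned in point 1 can be sketched in a few lines. This is an illustrative NumPy version (function names are mine, not from the PR): the Jacobian of an elementwise tanh is diagonal, so its log absolute determinant is the sum of log(1 - tanh(z)^2) over the unconstrained entries.

```python
import numpy as np

def tanh_logdet(z):
    # d/dz tanh(z) = 1 - tanh(z)^2, so the log abs determinant of the
    # elementwise map is the sum of log(1 - tanh(z)^2).
    return np.sum(np.log1p(-np.tanh(z) ** 2))

def tanh_logdet_stable(z):
    # Numerically stable equivalent, using
    # log(1 - tanh(z)^2) = 2 * (log 2 - z - log(1 + exp(-2z))),
    # which avoids catastrophic cancellation for large |z|.
    return np.sum(2.0 * (np.log(2.0) - z - np.logaddexp(0.0, -2.0 * z)))
```

The stable form is the one typically used in practice, since `1 - tanh(z)**2` underflows to 0 for moderately large `|z|`.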

I've also verified that we get the same results as in TFP. Interestingly, they fill the lower triangular part of the matrix in a different order, so the same unconstrained vector won't be mapped to the same Cholesky factor; but other than that everything is the same.

@bob-carpenter if this seems fine I can start running the evaluations and update the relevant part in the paper.

adamhaber commented 2 years ago

@mjhajharia regarding evaluating the different transforms - should I run things locally, or do you prefer that we run everything on the cluster?

bob-carpenter commented 2 years ago

Feel free to run wherever. We should talk about whether we want the final product to be reproducible. I was hoping to target Computo, but this is turning into a real serious paper, so maybe something like Bayesian Analysis would be better?

mjhajharia commented 2 years ago

Locally is fine for experimenting. Once we decide to add something to the paper, it would be nice to run everything 100 times on the cluster and use the normalized data for the plots. If you've uploaded a Stan file to the repository, I'll just put it on the cluster; I only have to plug the filename into the Slurm script.
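For concreteness, a Slurm array job is one way the 100 replicates could be shaped. Everything below (job name, script names, paths) is hypothetical, just to illustrate the structure; the real filenames come from the repo:

```shell
#!/bin/bash
#SBATCH --job-name=transform-eval      # hypothetical job name
#SBATCH --array=1-100                  # 100 independent replicates
#SBATCH --time=01:00:00
#SBATCH --output=logs/%A_%a.out        # one log file per array task

# MODEL and run_eval.py are placeholders for the repo's actual files.
MODEL=corr_cholesky.stan
python run_eval.py --model "$MODEL" --seed "$SLURM_ARRAY_TASK_ID"
```

Using the array task ID as the seed keeps the 100 runs independent but individually repeatable.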

sethaxen commented 2 years ago

We should talk about whether we want the final product to be reproducible.

What would reproducibility mean in this context? My understanding was that by specifying a seed, the computation can be made reproducible on the same machine, but minute differences between compiler settings and architectures can end reproducibility when moving to another machine. But I may be wrong.
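To make the distinction concrete, here is a NumPy sketch of what fixing a seed buys (illustrative only; the same logic applies to Stan/TFP seeds):

```python
import numpy as np

# A fixed seed makes draws bitwise-repeatable on the same machine and build.
rng1 = np.random.default_rng(12345)
rng2 = np.random.default_rng(12345)
draws1 = rng1.normal(size=5)
draws2 = rng2.normal(size=5)
assert np.array_equal(draws1, draws2)  # identical within one machine

# Across different compilers, BLAS builds, or architectures, downstream
# floating-point results can still drift in the last bits, so cross-machine
# agreement is "effectively the same answer", not bitwise identity.
```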

I was hoping to target Computo, but this is turning into a real serious paper, so maybe something like Bayesian Analysis would be better?

I don't know the journal landscape in this field well enough to comment.

Locally is fine for experimenting. Once we decide to add something to the paper, it would be nice to run everything 100 times on the cluster and use the normalized data for the plots.

Agreed, if we include also ESS/second (in addition to ESS/nsteps) as suggested in #21, we'll want those results for a given constraint to all come from the same machine.

bob-carpenter commented 2 years ago

Computo is a brand-new journal aimed at reproducible research. What they mean by this is that everything is runnable from a notebook or markdown, not that it reproduces bitwise across platforms (Java learned the hard way that that doesn't work with performant arithmetic).

The term "reproducible" refers to the kind of reproducibility we're talking about here: you can run our code on your machine and get effectively the same answer. The term "replicable" usually means reproducing the results with different data, and we're not dealing with data.

adamhaber commented 2 years ago

Fixed the merge conflicts; I think this is ready to merge if it looks good to you.

By the way, I'm not sure what caused the conflicts. @mjhajharia maybe it has something to do with the force-push? I've never used it before, so I'm not sure what side effects it can have...

bob-carpenter commented 2 years ago

We shouldn't force-push anything to this repo! You can pull, switch branches on your side, and make a normal commit if you want the same behavior.
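The history-preserving alternative can be demonstrated in a throwaway repo (this is a self-contained sketch, safe to run anywhere; branch and identity settings are only there to make it scripted):

```shell
set -e
tmp=$(mktemp -d)

# Set up an "upstream" repo and a clone, then let the two diverge.
git init -q "$tmp/upstream"
git -C "$tmp/upstream" -c user.email=a@b -c user.name=ci commit -q --allow-empty -m "initial"
git clone -q "$tmp/upstream" "$tmp/local"
git -C "$tmp/upstream" -c user.email=a@b -c user.name=ci commit -q --allow-empty -m "remote work"
git -C "$tmp/local" -c user.email=a@b -c user.name=ci commit -q --allow-empty -m "local work"

# An ordinary pull with a merge reconciles the branches without rewriting
# history, so a plain push suffices afterwards; no --force needed.
git -C "$tmp/local" -c user.email=a@b -c user.name=ci pull -q --no-rebase --no-edit
git -C "$tmp/local" log --oneline
```

After the pull, the local branch contains both sides' commits plus a merge commit, and pushing it back requires no history rewriting.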

Overall, I'd rather not use git commands that rewrite history unless someone has inadvertently exposed passwords or other security issues.