Closed WardBrian closed 1 year ago
Merging #681 (7945536) into develop (107a347) will decrease coverage by 0.60%. The diff coverage is n/a.
@@ Coverage Diff @@
## develop #681 +/- ##
===========================================
- Coverage 80.60% 80.01% -0.60%
===========================================
Files 72 72
Lines 11292 10951 -341
===========================================
- Hits 9102 8762 -340
+ Misses 2190 2189 -1
see 40 files with indirect coverage changes
@mitzimorris - I think this is ready to review
It is probably hard to review without simultaneously "reviewing" stanio
The breaking change is also the different return type of stan_variable for optimize and variational
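For downstream code worried about this change, here is a minimal sketch of handling both the old and new return types. It does not call CmdStanPy; `old_style`, `new_style`, and `draws` are hypothetical stand-ins for values a caller might receive, and the helper name `as_scalar` is an assumption made for illustration only.

```python
import numpy as np


def as_scalar(value):
    """Return a Python float from either a float or a single-element array."""
    # np.asarray accepts both plain floats and arrays; .item() extracts the
    # single element regardless of whether the shape is () or (1,).
    return np.asarray(value).item()


# Hypothetical stand-ins, not actual CmdStanPy output.
old_style = 1.5            # a scalar returned as a plain Python float
new_style = np.array(1.5)  # the same scalar returned as a NumPy array

print(as_scalar(old_style), as_scalar(new_style))

# If a method now returns draws (one row per draw) instead of a single
# summary, a caller can still reduce over the draw axis. Note that the
# sample mean of the draws is an estimate and need not equal a mean
# reported by the approximation itself.
draws = np.array([[1.0, 10.0], [3.0, 30.0]])
print(draws.mean(axis=0))
```

Code written this way would not need to branch on which CmdStanPy version produced the value.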
I took a look at the FBProphet code, and it doesn't use stan_variable - although it runs optimization, it's pulling things back as a np array.
I'm willing to go with a breaking change.
the other place to check is ArviZ. checking now
I don't think arviz touches anything other than CmdStanMCMC
regarding moving stanio under stan-dev - should we create repo stanio with subdirs according to language - python, r, c++?
I don't know enough about the R ecosystem to know if they need this - I think rvars/posterior cover a lot of the existing use-cases, but of course tuples change things.
For C++, I think it would be very difficult to write something like this, since the return type of the extracting functions depends on the data.
I think it's OK to make this a repo under stan-dev just for the python io used by CmdStanPy and BridgeStan and future Python packages.
Would it make sense to add a dtype argument somewhere in the data loading, e.g., as an extra argument to CmdStanModel.{sample,optimize,...} to specify the dtype returned by stan_variable? This could be useful for reducing the memory footprint of larger models. For example, to evaluate WAIC for the election88 dataset of CBS News polls, the log_lik variable has num_draws_sampling * chains * 11565 elements. Using cmdstanpy defaults that amounts to ~ 740 MB or ~ 1.5 GB if we include the logits for the binary outcome. Not enormous, but my laptop wasn't too happy when I compared several models.
This may not be the right thread, but thought I'd mention it in case it affects the data munging.
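To make the memory argument concrete, here is a rough sketch of how the footprint of a single array scales with its dtype. The function name `array_footprint_bytes` and the draw and chain counts passed to it are assumptions chosen for illustration; it counts one array only, so any extra copies held in memory would multiply the total.

```python
import numpy as np


def array_footprint_bytes(num_draws, chains, num_elements, dtype=np.float64):
    """Bytes needed for one (num_draws * chains, num_elements) array."""
    return num_draws * chains * num_elements * np.dtype(dtype).itemsize


# Hypothetical settings: 1000 draws per chain, 4 chains, 11565 observations.
for dtype in (np.float64, np.float32):
    size_mb = array_footprint_bytes(1000, 4, 11565, dtype) / 1e6
    print(f"{np.dtype(dtype).name}: {size_mb:.0f} MB")

# Casting an existing array is one way to halve the footprint after loading,
# at the cost of precision and a temporary second copy during the cast.
small = np.zeros((8, 11565), dtype=np.float64)
print(small.nbytes, small.astype(np.float32).nbytes)
```

A dtype argument at load time would avoid the temporary copy that a post-hoc cast requires.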
@tillahoffmann I think that's worth its own issue. It is sort of orthogonal to this issue, since the transformations we apply here are agnostic to what the dtype of the draws is. If we changed the dtype we read from the files, it should propagate nicely through these transformations.
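As a small illustration of that point, reshaping in NumPy preserves the input dtype. The sketch below is not stanio's actual code; `reshape_draws` is a hypothetical helper that unflattens each draw into its dimensions using column-major order, which is the order in which CmdStan writes array elements to CSV.

```python
import numpy as np


def reshape_draws(flat, dims):
    """Reshape (n_draws, prod(dims)) into (n_draws, *dims), column-major."""
    # order="F" makes the first index of each variable vary fastest,
    # matching column-major flattening; the dtype is left unchanged.
    return flat.reshape((flat.shape[0], *dims), order="F")


# One draw of a 2x3 variable stored as float32.
flat = np.arange(6, dtype=np.float32).reshape(1, 6)
shaped = reshape_draws(flat, (2, 3))
print(shaped.shape, shaped.dtype)
```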
Submission Checklist
Summary
IO changes for 2.33 (tuple support).
InferenceMetadata changed (not technically part of our public API, but it was mentioned in some places in our docs).
stan_variable function now returns the variational draws, not the mean approximation (same as #652).
stan_variable, never just a Python float
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Simons Foundation
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: