stan-dev / stanc3

The Stan transpiler (from Stan to C++ and beyond).
BSD 3-Clause "New" or "Revised" License
140 stars 44 forks source link

[Fix] return NaN values for `write_array` #1165

Closed SteveBronder closed 2 years ago

SteveBronder commented 2 years ago

This is an alt to #1163 that I think is a little simpler. Here instead of having to check write_array_impl we can just take the vector we are outputting to and pre-fill it with NaN values. For the example model in https://github.com/stan-dev/stan/issues/3114 this gives back NaN values for the non-parameter values from the model like the below.

lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta,theta2,thetaimp
-7.19964,0.924465,0.840426,2,3,0,9.04866,0.145327,nan,nan
-7.45567,0.859583,0.840426,1,3,0,9.08732,0.414011,nan,nan

This also has what I think is a fix for write_array where for size 0 output vectors we initialize them with the correct number of cells needed to write the resulting values. When write_array receives a non-zero sized vector output vector we assume it at least has capacity for the values we want to write and then write those output values to the tail of the output vector

Submission Checklist

Release notes

Bug fix for output values in write_array that should be NaN

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)

nhuurre commented 2 years ago

When write_array receives a non-zero sized vector output vector we assume it at least has capacity for the values we want to write and then write those output values to the tail of the output vector

Is that the intended behaviour? The doc comment in model_base.hpp says

The output parameter sequence will be resized if necessary to match the number of constrained scalar parameters.

SteveBronder commented 2 years ago

Oh yeah no what no its not the intended behavior. I thought for some reason there was code somewhere that wanted this but I think I'm wrong. Going to clean that part up so it just does the resize

SteveBronder commented 2 years ago

resized if necessary

Actualy re-reading those docs I think that just change the if to if (vars.size() < num_to_write) and do the resize if that comes out to true. Then we get the behavior that we want + can do this little nice thing where if we already have some stuff in that output vector we will just write to the end. The main reason that would be nice is for when we want to write our row of stuff to the output csv it would be nice to just write the sampler values and the model values all to one vector where right now we have to write them to separate vectors and then merge them to one vector later when we write them to the output csv.

WardBrian commented 2 years ago

I just wanted to write in that I think we should have a C++-focused developer like @bob-carpenter review this. The only OCaml changes are replacing a deprecated function call, which is good in my book.

bob-carpenter commented 2 years ago

OK. I should be able to get to this today.

bob-carpenter commented 2 years ago

To review this, I'm going to need help from @WardBrian or @SteveBronder understanding what all the OCaml is doing. I can't quite parse out the C++ being generated.

WardBrian commented 2 years ago

The reviewable changes are in src/stan_math_backend/Stan_math_code_gen.ml. In particular, the code generated for this method is pretty much the same regardless of what the model looks like, so it is stored as a string of C++ with a few template strings (%a in the string) which are filled in with integer sizes, I believe. The review is essentially of that C++ code, which can also be seen repeated in many test outputs

SteveBronder commented 2 years ago

@bob-carpenter I can come down to simons on Monday to review this irl if that's easier

bob-carpenter commented 2 years ago

@SteveBronder We need to talk about this. I am having a hard time figuring out your intent with this code. Can you clearly explain what the intent is of the new write_array method? It looks like this is changing behavior in some way, so I'd like a crisp definition of the intended behavior before reviewing. I'm happy to do this side by side if you're going to be around, otherwise we can set up a zoom call.

bob-carpenter commented 2 years ago

This PR needs an issue with a clear spec on what's being implemented for write_array before I review it.

I do not want to implement #3114 as specified. I would much rather implement NaN for all generated quantities to signal there is a problem.

SteveBronder commented 2 years ago

Bob I wrote an issue below, but yes as I was looking at this I realized it is a bit of a feature creep for this fix and it should probably have its own separate PR

https://github.com/stan-dev/stanc3/issues/1171

SteveBronder commented 2 years ago

@bob-carpenter @WardBrian just bumping this! I removed the extra stuff and now this just sets the default values of the array we are writing to to NA

WardBrian commented 2 years ago

@SteveBronder would you mind fixing the merge on this so CI can run?

SteveBronder commented 2 years ago

@WardBrian done!

rok-cesnovar commented 2 years ago

I think this PR has caused stan develop tests to fail: https://jenkins.flatironinstitute.org/blue/organizations/jenkins/Stan%2FStan/detail/develop/21/pipeline/

I am looking if there is any easy fix else I will revert so that we don't cause all the tests to fail in Math & Stan.

WardBrian commented 2 years ago

It seems like we have a test which is literally count commas in a contrived output? I'm not sure why that would have changed with this, but it seems like a bad test: https://github.com/stan-dev/stan/blob/a213ac25bbf489dbb8168d9a833b065d8db57f96/src/test/unit/services/util/gq_writer_test.cpp#L43

rok-cesnovar commented 2 years ago

That test checks that there are 3 values in the resulting CSV lines.

my current theory is that this PR changed the behaviour for standalon generrated quantities in that it also outputs the values of the parameters (inputs in the case of standalone GQ). Not sure yet though.

WardBrian commented 2 years ago

my current theory is that this PR changed the behaviour for standalon generrated quantities in that it also outputs the values of the parameters (inputs in the case of standalone GQ). Not sure yet though.

I don't think that's happening - I ran this cmdstanpy notebook with the latest builds and the output is the same so far as I can see