metrumresearchgroup / bbr

R interface for model and project management
https://metrumresearchgroup.github.io/bbr/

Replace all BBI_NULL_NUM with NA #520

Closed seth127 closed 2 years ago

seth127 commented 2 years ago

bbi passes back -999999999 (defined here) as a "default" in many cases. Presumably this was based on some NONMEM convention, but in most (all?) cases it seems to correspond to what would more intuitively be NA in R.

The proposal here is to have model_summary() (and any other relevant functions that parse bbi output) replace BBI_NULL_NUM and BBI_NULL_STR with NA_real_ and NA_character_, respectively, at parse time.
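A minimal sketch of what that parse-time coercion might look like. The helper and field names here are illustrative, not bbr's actual internals, and the string form of the sentinel is an assumption for illustration:

```r
# Sentinel values defined by bbi (numeric one per the definition linked
# above; the string sentinel value is assumed here for illustration).
BBI_NULL_NUM <- -999999999
BBI_NULL_STR <- "-999999999"

# Hypothetical helper: walk a parsed summary list and swap sentinels for NA.
replace_bbi_null <- function(x) {
  if (is.list(x)) {
    lapply(x, replace_bbi_null)
  } else if (is.numeric(x)) {
    replace(x, x == BBI_NULL_NUM, NA_real_)
  } else if (is.character(x)) {
    replace(x, x == BBI_NULL_STR, NA_character_)
  } else {
    x
  }
}

smry <- list(ofv = -999999999, run_start = "-999999999", problem = "PK model")
replace_bbi_null(smry)
#> ofv and run_start become NA; problem is untouched
```

Running this over the full parsed list right after `fromJSON()` would keep the change in one place rather than scattered across accessor functions.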

Investigations we should do before making this change:

seth127 commented 2 years ago

You can see three examples in our model_summary() test reference object: here, here, and here.

Probably unrelated: the first two of those are run_start and mod_file and it's not clear to me why bbi can't parse these things in this case. That may be worth looking into, but again I think that's a separate issue.

kyleam commented 2 years ago

Related to https://github.com/metrumresearchgroup/bbr/issues/518

seth127 commented 2 years ago

@callistosp @timwaterhouse @kylebaron @janelleh @sonokokawa @curtisKJ any thoughts on this? Have you noticed the -999999999 coming through before? If so, is that desirable in any cases you can think of, or would you rather have NA?

callistosp commented 2 years ago

I'll agree with the point in the first comment that I just implicitly read it as an NA. Some people may have written some code that checks for this value (e.g. checking if covariance step failed), but that should be easy for them to fix on their end. I think it would be desirable to replace with NA when we bring it into R.
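For code that currently tests the sentinel directly, the migration should be a one-line change. This is just a sketch; the field name is illustrative, not a real bbr accessor:

```r
# Stand-in for a parsed summary object with a failed covariance step
smry <- list(condition_number = -999999999)

# Before: user code checking bbi's sentinel value directly
cov_failed_old <- smry$condition_number == -999999999

# After the proposed change the same field would arrive as NA,
# and the idiomatic check becomes:
smry$condition_number <- NA_real_
cov_failed_new <- is.na(smry$condition_number)
```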

janelleh commented 2 years ago

@seth127 just FYI, as far as I know values like -99999 are used in place of NA for nmtran datasets because NONMEM reads blanks, periods and 0's all the same - as a 0. So typically at the end of a data assembly I'll make sure there are no NA's in columns that NONMEM is using, because they may be misused as 0 -- i.e. if someone had a missing weight, and their weight record was left blank, it would incorrectly be modeled as a 0 rather than missing. We can easily model around or impute missing values in the control streams using a flag like -99999.
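In base R, that end-of-assembly step might look like the following (column names and the flag value are just examples):

```r
dat <- data.frame(ID = c(1, 2, 3), WT = c(70.5, NA, 81.2))

# Replace NAs with a numeric flag so NONMEM doesn't read a blank/period
# as 0; the control stream can then skip or impute flagged records
# (e.g. with something like IGNORE=(WT.EQ.-99999) in $DATA).
dat$WT[is.na(dat$WT)] <- -99999
dat
```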

But apart from the dataset, are you saying that bbr shows -999999999 for any NA in the model_summary() output, or any other function output? I haven't actually noticed that yet.

kylebaron commented 2 years ago

Agree with @callistosp - I read it as NA. I could go either way on it but wonder if there is some value in consistency in outputs? Just wondering ... no strong opinion here.

kylebaron commented 2 years ago

@callistosp showed an example offline where IMO it would be better to have NA.

janelleh commented 2 years ago

Re: our slack convo, yes I always read it as NA, but that's because of experience and familiarity with NM output... I think it would be more interpretable if it were printed as NA. We just have to make sure there isn't some crazy case where -9e9 is actually the calculated value and gets replaced with NA accidentally, though I don't even know if it's possible to land exactly on that value.

seth127 commented 2 years ago

Ok, thanks for the input folks. We'll think on this and poke around in the bbi code a bit but it sounds like, unless we find some weird cases that we hadn't considered, everyone is on board with this change.

seth127 commented 2 years ago

@dpastoor do you have any thoughts on this? Specifically I'm wondering:

Thanks.

dpastoor commented 2 years ago

the key change back in 2019 was to disambiguate a 0 that NONMEM actually set from a default "zero-y"/"falsey" value in go, so really we could have used any number. I had just told David to pick a number that couldn't also be a potential actual value (e.g. -1 or something), so I think that one was just picked (which should also answer bullet 3).

I can't think of any reason not to do a pass resetting those to NAs in bbr, other than the extremely low overhead of the coercion. Though I would consider whether there are other places the json blob could come back as an NA on its own - we don't want to be in the situation of having an NA and asking whether it came from our coercion of -999999999 or not. I can't think of a place other than maybe a placeholder value, but I think we did a pretty good purge of those, so I hope none are around anymore.

personally, I approve of the ergonomics of this potential switch 👍

seth127 commented 2 years ago

Thanks for the input @dpastoor. Very helpful context. In regards to:

I would try to consider if there are other places that the json blob could come back as an NA otherwise somewhere...

We stumbled on something like that recently and @kyleam fixed it [here](https://github.com/metrumresearchgroup/bbi/pull/268). Essentially, when NONMEM returned an OFV of NaN, bbi nonmem summary --json was crashing because it didn't know how to encode it. I think that PR ^ hopefully handled any other cases where bbi would be trying to pass something like NA through into the --json output. It instead passes the -999999999 value (which is what brought this issue to the forefront in the first place...).

Thanks again and I think we're likely to move forward with this at some point, maybe soon.