implement `is.na` and `is.nan` as vectorized operations mimicing R behavior

nimble-dev / nimble

The base NIMBLE package for R

http://R-nimble.org

BSD 3-Clause "New" or "Revised" License

156 stars 23 forks source link

implement `is.na` and `is.nan` as vectorized operations mimicing R behavior #1394

Closed paciorek closed 8 months ago

paciorek commented 8 months ago

WIP - still needs to look at testing for the newly vectorized behavior.

This implements is.na{,n} as vectorized operations that actually mimic their R counterparts instead of amounting to any(is.na(x)). This also does various cleanups of related code, including the any_na{,n} DSL keywords and the C++ code standing behind them.

This fixes longstanding issue #987. Not sure how we ended up with is.na{,n} acting as reduction operators but git blame suggests it was probably my fault.

A couple minor open questions:

I think we want is.na{,n} to return bool. R_IsNA and R_isnancpp return int, so I created nim_IsNA and nim_isnancpp as wrappers to cast the output.
The various C++ functions standing behind the DSL keywords are in nimDists.cpp for historical reasons. Would like input from @perrydv on a better location for them.

In general a look-over from @perrydv will be useful.

paciorek commented 8 months ago

Ok, I've incorporated a bunch of changes to my initial PR. The main ones are that is.na uses ISNAN to detect either NA or NaN, while is.nan uses R_IsNaN to detect only NaN, in keeping with R's is.na{,n}.

I also cleaned up NA/NaN detection in dists.cpp.

I enhanced testing for is.na{,n} and any_na{,n}.

I think this is fine to merge once check testing and determine whether to move function definitions from nimDists.{cpp,h}.

paciorek commented 8 months ago

So the changes to how NA/NaN are handled in distributions in dists.cpp caused the categorical sampler to break a bit. The issue is that in current nimble if there are NA/NaN values in the prob vector for rcat, we generate a 1. I've changed it so that in general our r functions now return NaN (in some cases NA for technical reasons not worth going into) when a parameter contains NA or NaN.

So I needed to add an is.na check to the categorical sampler. Things should be fine now but, @danielturek, just bringing this to your attention.

perrydv commented 8 months ago

@paciorek

I think if(!is.na(newValue) & newValue != currentValue) should use && instead of &.

I vaguely remember there finicky issues of all behavior cases so I hope there aren't surprises lurking. But what you've done does make sense so LGTM.

paciorek commented 8 months ago

I don't follow the comment about &&. I thought that in DSL code we always use the single version & (and |).

paciorek commented 8 months ago

Oh, also can you weigh in on:

The various C++ functions standing behind the DSL keywords are in nimDists.cpp for historical reasons. Would like input from @perrydv on a better location for them.

perrydv commented 8 months ago

@paciorek I will look at the & issue and nimDists organization question. Another idea for testing would be to make sure the vectorized versions work in vectorized arithmetic, like Avec + is.nan(Bvec).