ukri-excalibur / excalibur-tests

Performance benchmarks and regression tests for the ExCALIBUR project
https://ukri-excalibur.github.io/excalibur-tests/
Apache License 2.0

NaN filtering #253

Open kaanolgu opened 9 months ago

kaanolgu commented 9 months ago

Can NaN values be filtered out with our existing filtering capabilities, @pineapple-cat? If not, we'll need to add that.

pineapple-cat commented 9 months ago

I haven't tested whether a filter like ["col_name", "!=", null] would work as expected, but what you're asking for isn't intentionally supported at the moment. I've made a note of this before, though, and it would be quite simple to add a flag to the config to signify that any rows with NaN values should be dropped. We could place it like this:

y_axis:
  value: "y_axis_col"
  units:
    column: "unit_col"
  drop_nan: True

And then add something like this to the post-processing code:

if config["y_axis"].get("drop_nan"):
  df.dropna(subset=[config["y_axis"]["value"]], inplace=True)
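
A minimal sketch of the proposed flag, assuming the post-processing code holds the plot config as a plain dict and the results as a pandas DataFrame. The sample data is made up for illustration. It also assumes a filter like ["col_name", "!=", null] would reduce to an element-wise comparison against NaN, which may not match how the filters are actually implemented:

```python
import numpy as np
import pandas as pd

# Hypothetical config and data, mirroring the layout proposed above.
config = {
    "y_axis": {
        "value": "y_axis_col",
        "units": {"column": "unit_col"},
        "drop_nan": True,
    }
}
df = pd.DataFrame({
    "y_axis_col": [1.5, np.nan, 3.0],
    "unit_col": ["s", "s", "s"],
})

# NaN never compares equal to anything, itself included, so a "!="
# comparison is True for every row and filters nothing out.
kept_by_comparison = df[df["y_axis_col"] != np.nan]

# dropna removes the rows whose y-axis value is missing.
if config["y_axis"].get("drop_nan"):
    df = df.dropna(subset=[config["y_axis"]["value"]])
```

Under that assumption the comparison keeps all three rows, while the drop_nan branch leaves only the two rows with real values. That suggests a dedicated flag is the safer route.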

We could also use this to drop data that has been scaled with itself if we add this line right after scaling in transform_axis:

df[axis["value"]] = df[axis["value"]].replace(to_replace=1, value=np.nan)
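
A rough sketch of how the two steps could combine, assuming the axis config is a dict and the column has already been scaled so that the baseline row reads exactly 1. The data and the flat axis layout here are hypothetical, not taken from transform_axis:

```python
import numpy as np
import pandas as pd

# Hypothetical axis config and already-scaled data: the first row is the
# baseline divided by itself, so it is exactly 1.0.
axis = {"value": "y_axis_col", "drop_nan": True}
df = pd.DataFrame({"y_axis_col": [1.0, 0.8, 1.25]})

# Mark self-scaled values as NaN. The result is assigned back to the
# column rather than relying on an in-place replace on a column selection.
df[axis["value"]] = df[axis["value"]].replace(1, np.nan)

# The drop_nan flag then removes the marked rows.
if axis.get("drop_nan"):
    df = df.dropna(subset=[axis["value"]])
```

One caveat with this approach: any value that legitimately equals 1 after scaling is dropped as well, not just the baseline row.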

ilectra commented 8 months ago

The method to drop them looks fine, but I think conceptually it makes more sense as a filter than as a property of the y-values. Maybe we could have some pre-set filters, like the flag you suggested, for ease of use? What do you think, @kaanolgu, as our guinea pig user? :) Also, a flag to drop the self-scaled value could be useful, but I wouldn't like to entangle it with the drop_nan flag.

ilectra commented 7 months ago

I take the above back. Let's treat it as a property of the y-values, set in the config file, for now. If we get more cases for "pre-set filters" in the future, we can move it to the filters then.