intel / p3-analysis-library

A library simplifying the collection and interpretation of P3 data.
https://intel.github.io/p3-analysis-library/
MIT License

Skip null values during projection #62

Closed: Pennycook closed this 3 weeks ago

Pennycook commented 1 month ago

Related issues

None.

Proposed changes

Pennycook commented 1 month ago

I'm fine with the implementation, but as a general question, is this behavior desirable?

That is, when I'm parsing a set of experiments, would you prefer to show nothing at all or, for consistency's sake, show something like a-_-c instead?

Good question. In my own experience, I've only ever found the presence of "None" and "NaN" values to be a nuisance. Working around this without library support typically requires you to do the projection separately for several different subsets of the data and then concatenate the results. It's possible, but difficult to generalize.
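The workaround described above can be sketched with plain pandas. This is a minimal illustration, not p3 code: it assumes that a "projection" boils down to joining several columns into one "-"-separated label per row (inferred from the a-_-c example), and the helper name project_by_null_pattern is hypothetical. Rows that share the same pattern of null and non-null columns are projected together, and the pieces are concatenated afterwards.

```python
import pandas as pd


def project_by_null_pattern(df, columns, sep="-"):
    """Hypothetical workaround: build one label per row from `columns`, skipping nulls.

    Rows that share the same pattern of null/non-null columns are labelled
    together using only their non-null columns, and the per-pattern results
    are concatenated back into a single Series in the original row order.
    """
    mask = df[columns].notna()

    pieces = []
    # Each distinct row of `mask` is one null pattern, e.g. (True, False, True).
    for pattern in mask.drop_duplicates().itertuples(index=False):
        keep = list(pattern)
        present = [col for col, flag in zip(columns, keep) if flag]
        # Select every row whose null pattern matches this one.
        rows = (mask == pd.Series(keep, index=columns)).all(axis=1)
        subset = df.loc[rows, present].astype(str)
        if present:
            labels = subset.apply(sep.join, axis=1)
        else:
            # Every projected column is null for these rows.
            labels = pd.Series("", index=subset.index)
        pieces.append(labels)

    # Concatenate the per-pattern results and restore the original row order.
    return pd.concat(pieces).reindex(df.index)


df = pd.DataFrame({
    "platform": ["a", "a", "x"],
    "application": ["b", None, "y"],
    "problem": ["c", "c", None],
})
labels = project_by_null_pattern(df, ["platform", "application", "problem"])
print(labels.tolist())  # ['a-b-c', 'a-c', 'x-y']
```

Even in this toy form, the bookkeeping (finding patterns, selecting rows, reassembling in order) shows why the approach is awkward to generalize outside the library.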

One option here would be to add a configuration option like skipna=True to the projection interface. But I'm not sure how intuitive such an option would be, or what default value people would expect it to have. Replacing every null value with _ might also be surprising.
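To make the trade-off concrete, here is a minimal sketch of what such a switch could look like on a simplified, hypothetical label-building helper (make_labels). It is not the signature of p3.data.projection; the skipna and na_rep parameters, the "-" separator, and the "_" placeholder are assumptions chosen to mirror the a-c versus a-_-c discussion.

```python
import pandas as pd


def make_labels(df, columns, sep="-", skipna=True, na_rep="_"):
    """Hypothetical helper: join `columns` row by row into one label.

    skipna=True drops null entries from each label (a-c);
    skipna=False replaces them with `na_rep` instead (a-_-c).
    """
    def join_row(row):
        if skipna:
            parts = [str(value) for value in row if pd.notna(value)]
        else:
            parts = [na_rep if pd.isna(value) else str(value) for value in row]
        return sep.join(parts)

    return df[columns].apply(join_row, axis=1)


df = pd.DataFrame({"platform": ["a"], "application": [None], "problem": ["c"]})
cols = ["platform", "application", "problem"]
print(make_labels(df, cols).tolist())                # ['a-c']
print(make_labels(df, cols, skipna=False).tolist())  # ['a-_-c']
```

The open question from the discussion is visible in the signature itself: whichever value skipna defaults to, some users will be surprised.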

Another option (which I think I prefer) would be to document that we skip null values, and to encourage developers (via an example) to use functionality like pandas' DataFrame.fillna if they don't like our default behavior. I think that a user could get the behavior you suggested by writing:

proj = p3.data.projection(df.fillna(value="_"), platform=[...], application=[...], problem=[...])

...which is still pretty readable, and doesn't require any new functionality on our part. It also enables users to get really clever if they want to (because fillna lets you do complex things like provide different fill values for different columns).
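As an illustration of the per-column case, the sketch below passes a dict to DataFrame.fillna so that each column gets its own fill value. The column names, the fill values, and the plain "-".join standing in for the projection are illustrative assumptions, not p3 behavior.

```python
import pandas as pd

df = pd.DataFrame({
    "platform": ["a", "x"],
    "application": [None, "y"],
    "problem": ["c", None],
})

# A dict maps each column name to its own fill value; columns not listed
# are left untouched.
filled = df.fillna(value={"application": "_", "problem": "default"})

# Stand-in for the projection: join the (now null-free) columns with "-".
labels = filled[["platform", "application", "problem"]].apply("-".join, axis=1)
print(labels.tolist())  # ['a-_-c', 'x-y-default']
```

Because the nulls are filled before projecting, a projection that skips nulls has nothing left to skip, so the user's choice of fill values fully determines the labels.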