Pennycook closed 3 weeks ago
I'm fine with the implementation, but as a general question, is this behavior desirable?
That is, when I'm parsing a set of experiments, would you prefer to show nothing at all or, for consistency's sake, show something like a-_-c instead?
Good question. In my own experience, I've only ever found the presence of "None" and "NaN" values to be a nuisance. Working around this without library support typically requires you to do the projection separately for a bunch of different things and then concatenate the results -- it's possible, but difficult to generalize.
One option here would be to add a configuration option like skipna=True to the projection interface. But I'm not sure how intuitive such behavior would be, or what people would expect the default value of the option to be... Replacing everything with _ might also be surprising.
Another option (which I think I prefer) would be to document that we skip null values, and to encourage developers (via an example) to use functionality like pandas' DataFrame.fillna if they don't like our default behavior. I think that a user could get the behavior you suggested by writing:
proj = p3.data.projection(df.fillna(value="_"), platform=[...], application=[...], problem=[...])
...which is still pretty readable, and doesn't require any new functionality on our part. It also enables users to get really clever if they want to (because fillna lets you do complex things like provide different fill values for different columns).
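As a rough sketch of the per-column case, here is what that could look like using only pandas. The data is made up for illustration, and the column names are just assumed to match the ones passed to the projection call above; the projection step itself is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names mirror the projection call above,
# but the values are invented for illustration.
df = pd.DataFrame(
    {
        "platform": ["a", "a", None],
        "application": [None, "b", "b"],
        "problem": ["c", np.nan, "c"],
    }
)

# One fill value applied to every column.
uniform = df.fillna(value="_")

# A different fill value per column, keyed by column name.
per_column = df.fillna(
    value={"platform": "unknown", "application": "_", "problem": "n/a"}
)
```

Either filled frame could then be passed to the projection in place of df, so the choice of placeholder stays entirely on the user's side.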
Related issues
None.
Proposed changes