Associating non_feature data with feature_key

phurwicz / hover

:speedboat: Label data at scale. Fun and precision included.

https://phurwicz.github.io/hover

MIT License


Associating non_feature data with feature_key #53

Closed robinsonkwame closed 1 year ago

robinsonkwame commented 2 years ago

There is often metadata associated with the feature data; for example, text comes from certain documents. After labeling the raw data, it's often useful to merge the labels with the metadata for other data science tasks. For example, some sets of documents or locations might not contain a label that you would otherwise expect them to, or you might want to aggregate counts by document or location.

Is there a way for SupervisableTextDataset to include non_feature data? non_feature data could store this kind of metadata. The row order within each subset differs from the raw data frame, so you can't just match indices.

phurwicz commented 2 years ago

Yes, the dataset can have any extra columns as long as their names don't conflict with the columns that hover uses.

The dataset itself uses feature_key, label_key, and "SUBSET".
Some specific functionalities use "pred_label" / "pred_score".
The plotting utility uses "__COLOR__", "__ALPHA__", etc.

Most of the time you simply won't hit a conflict: just load your full CSV into a DataFrame and pass it to SupervisableTextDataset.from_pandas().
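As a rough illustration of why keeping the metadata in the same rows helps: once each row carries its own metadata, aggregating labels by document needs no index matching, whatever order the rows end up in. The sketch below uses plain Python records rather than hover itself; the column names `text`, `label`, and `doc_id`, the helper `label_counts_by`, and the sample rows are all made up for the example.

```python
from collections import Counter

# Hypothetical labeled rows, in an arbitrary order (not the original CSV order).
# "doc_id" is an extra metadata column that travelled with each row.
rows = [
    {"text": "great battery", "label": "positive", "doc_id": "review_2"},
    {"text": "screen cracked", "label": "negative", "doc_id": "review_1"},
    {"text": "fast shipping", "label": "positive", "doc_id": "review_1"},
    {"text": "too expensive", "label": "negative", "doc_id": "review_1"},
]


def label_counts_by(rows, meta_key, label_key="label"):
    """Count labels per metadata value, independent of row order."""
    counts = {}
    for row in rows:
        counts.setdefault(row[meta_key], Counter())[row[label_key]] += 1
    return counts


counts = label_counts_by(rows, "doc_id")
```

Because the grouping key lives in each row, shuffling `rows` gives the same counts.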

That said, we should make it more obvious which columns will conflict and suggest that the user rename them.
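Until something like that exists, a check along these lines could be run on your own column names before loading. This is only a sketch, not part of hover: `find_conflicts` and `suggest_renames` are hypothetical helpers, the defaults `"text"` and `"label"` stand in for whatever feature_key and label_key your dataset uses, and the reserved set contains only the names listed in this thread, which ends with "etc." and is therefore not exhaustive.

```python
# Reserved names taken from the list earlier in this thread.
# That list ends with "etc.", so this set is NOT exhaustive.
KNOWN_RESERVED = {"SUBSET", "pred_label", "pred_score", "__COLOR__", "__ALPHA__"}


def find_conflicts(extra_columns, feature_key="text", label_key="label"):
    """Return the extra (metadata) column names that clash with reserved ones.

    feature_key and label_key default to hypothetical values; pass the
    names your dataset actually uses.
    """
    reserved = KNOWN_RESERVED | {feature_key, label_key}
    return sorted(set(extra_columns) & reserved)


def suggest_renames(conflicts, prefix="meta_"):
    """Propose a prefixed replacement name for each conflicting column."""
    return {name: prefix + name for name in conflicts}


# Example: two of these four metadata columns collide with reserved names.
conflicts = find_conflicts(["doc_id", "location", "SUBSET", "label"])
renames = suggest_renames(conflicts)
```

The resulting `renames` mapping can be applied to the metadata columns before the data is handed to the dataset.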

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.