snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Using labeling function for producing labels of numerical data. #1505

Closed jennakwon06 closed 4 years ago

jennakwon06 commented 4 years ago

I was wondering if Snorkel could be used for labeling numerical data (specifically, what I have is a time series).

For example, I have a NumPy array: [100, 100, 20, 20, 20, 90, 100, 100, 100, 100], representing 10 values emitted at 10 different times.

My labeling function would be: "If the value is < 30 for 3 or more consecutive times, then label those times as 0"

The output I am looking for (or anything I can translate into the below) would be:

[0, 1] : 1
[2, 3, 4] : 0

Thanks!

Best, Jenna

paroma commented 4 years ago

You can use Snorkel over numerical data - we have an example in this image-based tutorial that uses features of the bounding boxes of objects in the images. For time series data specifically, we published a paper in NeurIPS '19 that describes a new generative model that can handle multi-resolution LFs.

For the use case in the description, you can write labeling functions that operate over a particular datapoint and its neighboring datapoints to assign a label. Your datapoints could have fields that refer to the previous and next values in the data, which you can access from the LF:

x.val = 3
x.prev_val = 2  # or None
x.next_val = 4  # or None
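
As a rough illustration of the idea above applied to the rule in the question, here is a minimal sketch in plain Python. It is not from the Snorkel codebase: the helper `add_context`, the field `low_run_len`, and the constants are hypothetical names chosen for this example. It assumes each datapoint is pre-processed to carry its neighbors plus the length of the below-threshold run it belongs to, since `prev_val` and `next_val` alone cannot tell whether a point at the edge of a run is part of 3 or more consecutive low values. `ABSTAIN = -1` follows Snorkel's convention for "no vote".

```python
from types import SimpleNamespace

# Label values. ABSTAIN = -1 mirrors Snorkel's convention; LOW = 0 matches the question.
ABSTAIN = -1
LOW = 0

THRESHOLD = 30  # "value is < 30" (from the question)
MIN_RUN = 3     # "3 or more consecutive times" (from the question)


def add_context(values, threshold=THRESHOLD):
    """Wrap each value in an object carrying neighbor fields.

    Hypothetical helper: besides prev_val / next_val, it stores
    low_run_len, the length of the consecutive run of below-threshold
    values that the point belongs to (0 if the point is not below it).
    """
    n = len(values)
    run_len = [0] * n
    i = 0
    while i < n:
        if values[i] < threshold:
            j = i
            while j < n and values[j] < threshold:
                j += 1
            # Every point in values[i:j] belongs to a run of length j - i.
            for k in range(i, j):
                run_len[k] = j - i
            i = j
        else:
            i += 1
    return [
        SimpleNamespace(
            val=values[i],
            prev_val=values[i - 1] if i > 0 else None,
            next_val=values[i + 1] if i < n - 1 else None,
            low_run_len=run_len[i],
        )
        for i in range(n)
    ]


def lf_sustained_low(x):
    """Vote LOW if x sits in a long enough low run, otherwise abstain."""
    return LOW if x.low_run_len >= MIN_RUN else ABSTAIN


values = [100, 100, 20, 20, 20, 90, 100, 100, 100, 100]
points = add_context(values)
labels = [lf_sustained_low(x) for x in points]
print(labels)  # [-1, -1, 0, 0, 0, -1, -1, -1, -1, -1]
```

In an actual Snorkel pipeline, a function like `lf_sustained_low` would presumably be decorated with `@labeling_function()` from `snorkel.labeling` and run through an applier over a DataFrame with the same columns. Abstaining on the remaining points, rather than voting 1, is a design choice for this sketch; returning 1 instead would reproduce the output format asked for in the question.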

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

ajratner commented 4 years ago

Closing for now; feel free to re-open!