snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Q&A: Query regarding "data" in Image tutorial #1002

Closed: velu1122 closed this issue 5 years ago

velu1122 commented 6 years ago

Hi, this query is about the data folder for the image tutorial ("Person Riding Bike"). There, "train_ground.npy" (loader.train_ground) holds the ground-truth data on whether a person is riding a bike or not. However, I couldn't find the images from which "train_ground.npy" was created. I compared it against "image_data.json", but I couldn't relate the JSON file (approx. 108,077 images) to the train_ground file (903 images). [In train_ground, the first value is 'true', but the first image in the JSON file doesn't show a person riding a bike.] Could you please explain which 903 images "train_object_names", "train_object_x", etc. were created from?

I am trying to run the Snorkel image tutorial with my own images and LFs. Could you please provide some information on this? Thanks.

paroma commented 5 years ago

We used labels and bounding box attributes from the Visual Genome dataset. The raw images we used are listed in the data/image_data.json file as links to the images in Visual Genome.

We have also had success using existing object detectors like this to generate bounding boxes and category labels, and then writing labeling functions over those outputs.
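For anyone trying this with their own images, here is a rough sketch of what a labeling function over detector output could look like. It is not the tutorial's code: the detection format (a list of dicts with a "label" and an (x, y, w, h) "box"), the category names "person" and "bicycle", the label values, and the helper names are all assumptions made for illustration, so adapt them to whatever your detector and Snorkel version actually use.

```python
from typing import Dict, List, Tuple

# Assumed label convention (hypothetical); match it to your Snorkel version.
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

Box = Tuple[float, float, float, float]  # (x, y, width, height)
Detection = Dict[str, object]            # {"label": str, "box": Box}


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two (x, y, w, h) boxes intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def lf_person_overlaps_bike(detections: List[Detection]) -> int:
    """Heuristic LF: vote positive if any person box overlaps any bicycle box."""
    people = [d["box"] for d in detections if d["label"] == "person"]
    bikes = [d["box"] for d in detections if d["label"] == "bicycle"]
    if not people or not bikes:
        # Without both objects, nobody can be riding a bike.
        return NEGATIVE
    if any(boxes_overlap(p, b) for p in people for b in bikes):
        return POSITIVE
    # Both objects are present but far apart: not enough evidence either way.
    return ABSTAIN


# Example: detector output for one image (made-up coordinates).
image = [
    {"label": "person", "box": (40, 10, 30, 80)},
    {"label": "bicycle", "box": (35, 60, 50, 40)},
]
print(lf_person_overlaps_bike(image))
```

Overlap alone is a noisy signal (a person standing next to a parked bike also triggers it), which is fine for weak supervision: you would typically write several such heuristics, for example comparing relative vertical position or box sizes, and let the label model combine them.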