Closed by varun-tandon 7 years ago
Fixed by deleting snorkel.db
Thanks Varun! A little more detail in case anyone else encounters this. A lot of the label loading code is nondestructive (since labeled data is so valuable!), so it won't replace existing labels in the database. The issue was that '-1' labels were already stored from a previous run.
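To illustrate the behavior described above, here is a minimal sketch of a nondestructive loader, not Snorkel's actual code. The `stable_label` table, its columns, and the `load_label` helper are assumptions made up for this example; it uses an in-memory SQLite database in place of snorkel.db.

```python
import sqlite3

# In-memory database standing in for snorkel.db (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stable_label ("
    " context_stable_ids TEXT,"
    " annotator_name TEXT,"
    " value INTEGER,"
    " PRIMARY KEY (context_stable_ids, annotator_name))"
)

def load_label(conn, stable_ids, annotator, value):
    """Hypothetical nondestructive loader: insert only if no row exists yet."""
    row = conn.execute(
        "SELECT value FROM stable_label"
        " WHERE context_stable_ids = ? AND annotator_name = ?",
        (stable_ids, annotator),
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO stable_label VALUES (?, ?, ?)",
            (stable_ids, annotator, value),
        )
    # An existing row is left untouched, even if the new value differs.

key = "28262798::span:613:617~~28262798::span:632:643"
load_label(conn, key, "gold", -1)  # label stored by an earlier run
load_label(conn, key, "gold", 1)   # corrected label is silently skipped

stored = conn.execute(
    "SELECT value FROM stable_label WHERE context_stable_ids = ?", (key,)
).fetchone()[0]
print(stored)  # -1
```

Under this assumption, rerunning the loader with corrected annotations changes nothing until the stale rows are removed, which is why deleting snorkel.db and reloading resolved the problem.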
Hi Snorkel Team,
I am one of the interns at the Canary Center working with Gautam and Dr. Mallick on MarkerVille, and I am having trouble loading gold labels. We discussed debugging methods with @stephenbach at the last OH, and we are seeing a strange issue: our gold labels contain the StableLabel for a particular extracted candidate, yet Snorkel labels this candidate as -1 and does not appear to match the gold label to the extracted candidate.
We are loading external annotations (already formatted as StableLabels in a TSV file) in a manner similar to the intro tutorial, using the same util.py file, like so:
from util import load_external_labels
%time load_external_labels(session, BiomarkerCondition, annotator_name='gold')
and we then load the gold labels like so:
from snorkel.annotations import load_gold_labels
L_gold_dev = load_gold_labels(session, annotator_name='gold', split=1)
print L_gold_dev
This prints a numpy array in which every element is -1.
Viewing an individual candidate like so:
print L_gold_dev.get_candidate(session,x)[0].get_stable_id()
print L_gold_dev.get_candidate(session,x)[1].get_stable_id()
provides the following output:
28262798::span:613:617
28262798::span:632:643
And we have confirmed that these StableLabels exist in our external annotations and are labelled as positive 1.
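A check like the one described could be sketched as follows. This is only an illustration: the column names (`biomarker`, `condition`, `label`) and the second data row are made up, and the real annotation file may be laid out differently.

```python
import csv
import io

# Hypothetical stand-in for the external annotations TSV.
# Column names and the second row are assumptions for this example.
tsv_text = (
    "biomarker\tcondition\tlabel\n"
    "28262798::span:613:617\t28262798::span:632:643\t1\n"
    "28262798::span:700:705\t28262798::span:632:643\t-1\n"
)

def read_gold(handle):
    """Map each (biomarker, condition) stable-ID pair to its integer label."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row["biomarker"], row["condition"]): int(row["label"])
        for row in reader
    }

gold = read_gold(io.StringIO(tsv_text))

# Stable IDs printed for the candidate above.
candidate_ids = ("28262798::span:613:617", "28262798::span:632:643")
found = gold.get(candidate_ids)  # None would mean the pair is absent
print(found)  # 1
```

If the pair is found with label 1 in the file while the loaded matrix still shows -1, the mismatch is arising after the file is read, for example in what is already stored in the database.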
We would greatly appreciate any help you could provide us regarding why Snorkel does not seem to match the external gold label with the extracted candidate and/or any further debugging steps we should take. Please let me know if any more information is needed!
Thanks!