Closed msank00 closed 5 years ago
The split attribute is used to create the training and test splits: split zero is used for training and split one for testing. For extracting multiple relations, you probably need to look into Snorkel MeTaL, the multi-task version of Snorkel.
Glad you're enjoying the project, @msank00! We keep a collection of links to relevant papers on the landing page for the Snorkel project: snorkel.stanford.edu. There you can find examples of extensions to Snorkel, follow-on projects, a new formulation of the generative model, and support for multi-task learning/supervision in the Snorkel MeTaL project (https://github.com/HazyResearch/metal), among others. As for your other questions:
Q1: @vumaasha is correct: the split numbers are used to designate different dataset splits (e.g., 0=train, 1=dev, 2=test).
Q2: Yes, the database behind the SQLAlchemy layer can support multiple candidate_subclass types. We don't have any examples of doing that here in the main Snorkel repo. If you're interested in just having multiple pipelines, then you'll want to create a separate LabelModel for each individual task (and its corresponding LFs). If you believe the tasks have the potential to share information in a productive way, then, as @vumaasha suggested, take a look at the Snorkel MeTaL extension, where we include support for labeling functions that weakly supervise multiple related tasks and for multi-task end models.
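To make the split convention concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Snorkel itself. The table name, column names, and the helper candidates_in_split are hypothetical stand-ins for the candidate table that Snorkel manages through SQLAlchemy; the only point is that a filter such as Spouse.split == 2 selects the rows whose split column equals 2, which is the test set under the 0=train, 1=dev, 2=test convention.

```python
import sqlite3

# Split convention described in the thread: 0 = train, 1 = dev, 2 = test.
TRAIN, DEV, TEST = 0, 1, 2

# Hypothetical stand-in for a candidate table; not Snorkel's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spouse ("
    "id INTEGER PRIMARY KEY, person1 TEXT, person2 TEXT, split INTEGER)"
)
conn.executemany(
    "INSERT INTO spouse (person1, person2, split) VALUES (?, ?, ?)",
    [
        ("Alice", "Bob", TRAIN),
        ("Carol", "Dan", TRAIN),
        ("Erin", "Frank", DEV),
        ("Grace", "Heidi", TEST),
    ],
)


def candidates_in_split(connection, split):
    """Return the (person1, person2) pairs stored under the given split."""
    cursor = connection.execute(
        "SELECT person1, person2 FROM spouse WHERE split = ? ORDER BY id",
        (split,),
    )
    return cursor.fetchall()


# Rough analogue of filtering on Spouse.split == 2: only test candidates.
test_candidates = candidates_in_split(conn, TEST)
print(test_candidates)
```

Keeping the split as a plain integer column means the same candidate type can serve training, development, and evaluation without separate tables.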
Thanks for your reply, @bhancock8. I looked into Snorkel MeTaL, but I couldn't find any example related to text annotation there.
Could you please let me know if there is any Snorkel MeTaL example or tutorial available for text datasets? I have also raised an issue about this in the Snorkel MeTaL repo.
Thanks in advance.
An NER tutorial is generally on our roadmap—feel free to propose a general dataset or push a PR if you'd like to take a stab at this!
Yep, just to echo @vincentschen, we don't currently have an entity tagging tutorial (here or in MeTaL), but there wouldn't be anything fundamentally different about that setting. The "candidates" that you're classifying would now be individual tokens or spans, and your labeling functions would weakly label those.
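As a rough illustration of that setting, the sketch below treats each token as a candidate and keeps a separate set of labeling functions per entity type, in the spirit of one pipeline per task. It is plain Python, not the Snorkel or MeTaL API: the labeling functions, the regular expressions, the task names, and the simple majority vote (used here in place of a learned LabelModel) are all assumptions made for the example.

```python
import re
from collections import Counter

ABSTAIN = -1  # a labeling function returns ABSTAIN when it has no opinion
NEGATIVE, POSITIVE = 0, 1


def candidate_tokens(text):
    """Treat every run of word characters and hyphens as one candidate."""
    return [match.group() for match in re.finditer(r"[\w-]+", text)]


# Hypothetical labeling functions for a "date" task.
def lf_date_iso(token):
    return POSITIVE if re.fullmatch(r"\d{4}-\d{2}-\d{2}", token) else ABSTAIN


def lf_date_no_digits(token):
    return NEGATIVE if not any(ch.isdigit() for ch in token) else ABSTAIN


# Hypothetical labeling functions for a "reference_number" task.
def lf_ref_prefix(token):
    return POSITIVE if re.fullmatch(r"REF-\d+", token) else ABSTAIN


def lf_ref_lowercase(token):
    return NEGATIVE if token.islower() else ABSTAIN


# One independent set of labeling functions per task (entity type).
TASKS = {
    "date": [lf_date_iso, lf_date_no_digits],
    "reference_number": [lf_ref_prefix, lf_ref_lowercase],
}


def majority_vote(votes):
    """Combine votes, ignoring abstentions; ties and empty votes abstain."""
    counts = Counter(v for v in votes if v != ABSTAIN).most_common(2)
    if not counts:
        return ABSTAIN
    if len(counts) == 2 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


def extract(text):
    """Return, per task, the tokens whose combined weak label is POSITIVE."""
    tokens = candidate_tokens(text)
    return {
        task: [t for t in tokens
               if majority_vote([lf(t) for lf in lfs]) == POSITIVE]
        for task, lfs in TASKS.items()
    }


print(extract("Invoice REF-1234 was issued on 2019-03-01 by Acme."))
```

In a real pipeline the majority vote would be replaced by a model that estimates labeling function accuracies, and the weak labels would train an end model rather than being used directly.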
@msank00 @vumaasha — looping back around here, wanted to plug our new community forum, where there's an ongoing discussion about a potential NER tutorial! https://spectrum.chat/snorkel/tutorials/snorkel-tutorial-for-ner~34d56436-41e6-4216-a05b-59cf391a7fb3
Feel free to leave any thoughts/suggestions!
Hi Team,
Really appreciate this great project. I would like to know what kind of techniques / label functions I should be looking at to potentially annotate a text dataset using Snorkel. The final end extraction model should be able to annotate some specific information like date, reference number, person name, organization name, etc. In the Intro notebook, only one relation, Spouse, has been extracted. But can we do multiple relation/entity extraction from the same document?
Any relevant research papers or pointers to articles would be really helpful. If anyone in the community has already tried this and would like to shed some light on the overall approach, that would be very helpful.
Presently I am running the example notebooks to understand the workflow. I have the following questions related to the examples:
Q1. What is the purpose of the following piece of code: Spouse.split == 2? I found this in the 2nd Intro Notebook. What does split do?
Q2. Can multiple candidate_subclass types be used for the same document?