snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0
5.82k stars 857 forks source link

Add preliminary spacy v3 support. #1702

Closed yinxiangshi closed 2 years ago

yinxiangshi commented 2 years ago

Description of proposed changes

Just add the new parameter of wapper to support spacy v3.

Related issue(s)

1701

Fixes # (issue) Preliminary spacy v3 support.

Need help on these? Just ask! Hallo, I just reviewed the wrapper of spacy v3, and add the parameter exclude, write the docs for changed parameters exclude, disable. I am not sure what other functions snorkel used from spacy, please help me to confirm other needed changes. Thanks.

rjurney commented 2 years ago

I'm back, I have not reviewed this PR but I DID survive my latest round of vaccines! Taking a look now...

yinxiangshi commented 2 years ago

@rjurney I haven't taken the boost dose vaccine, how do u feel Nah?

rjurney commented 2 years ago

In principal this looks like it should do it. I will run tox -e complex - no idea how long that takes to finish.

Oh man, I had three Pfizers and was like only mildly woozy on the first or second Pfizer. This was a fourth Moderna booster plus Pneumonia 23 and it was like the peak of the flu with 102 fever for three days. Intense. I hear a lot of people get that with Moderna on Twitter :)

yinxiangshi commented 2 years ago

@rjurney In, fact, I should add a config parameter in the wrapper(see docs of the newest preprocessor of Spacy: https://spacy.io/usage/processing-pipelines#disabling) However, I didn't find the usage of config parameter, plz help me to check out, thx!

3 days fever, take care of urself man, hope u stay healthy!

rjurney commented 2 years ago

@yinxiangshi ok, it passes tox -e complex :) So you can check that box up top of this page :D

-- Docs: https://docs.pytest.org/en/latest/warnings.html
=========================================================== 9 passed, 242 deselected, 35 warnings in 205.79s (0:03:25) ===========================================================
____________________________________________________________________________________ summary _____________________________________________________________________________________
  complex: commands succeeded
  congratulations :)
yinxiangshi commented 2 years ago

@rjurney Well, thanks for your test. Do u have any idea about config parameter?

rjurney commented 2 years ago

@yinxiangshi I get what you mean now. The API shows how to use config files, I had assumed they were used be default. We can use Language.from_config(config):

from thinc.api import Config
from spacy.language import Language

config = Config().from_disk("./config.cfg")
nlp = Language.from_config(config)

and then build this into the API if a config argument is present.

yinxiangshi commented 2 years ago

@rjurney Thanks for your help with this issue, I will try to add it after my final week. Have a good night tho.

rjurney commented 2 years ago

@yinxiangshi I can take a crack at it!

yinxiangshi commented 2 years ago

@rjurney Ohh, I am looking forward to your work! Thanks!

rjurney commented 2 years ago

@yinxiangshi I see this code in snorkel.labeling.lf.nlp already exists, maybe we should use it? Reading all the code now...

class SpacyPreprocessorConfig(NamedTuple):
    """Tuple of SpacyPreprocessor and the parameters used to construct it."""

    nlp: SpacyPreprocessor
    parameters: SpacyPreprocessorParameters
yinxiangshi commented 2 years ago

@rjurney I had this idea before, my opinion is that we could use it, however, we should carefully consider which parameters will be accepted by config.

rjurney commented 2 years ago

@bhancock8 I know your resources aren't limitless, but it would be amazing to get this into a release.

bhancock8 commented 2 years ago

Hey @rjurney — it looked to me like this was still in development? Ping when you've resolved your discussion about the config, as well as confirmed that the new exclude parameter works as expected (I don't believe any existing or new tests cover that functionality?). And then just as a nit: I see a handful of places where "pipeline" is misspelled. Once that test plan is complete, let us know this is ready for review and we'll assign a reviewer.

rjurney commented 2 years ago

Hey @rjurney — it looked to me like this was still in development? Ping when you've resolved your discussion about the config, as well as confirmed that the new exclude parameter works as expected (I don't believe any existing or new tests cover that functionality?). And then just as a nit: I see a handful of places where "pipeline" is misspelled. Once that test plan is complete, let us know this is ready for review and we'll assign a reviewer.

@bhancock8 Is there still a chance this can go in the release of Snorkel?

rjurney commented 2 years ago

@bhancock8 sorry, I meant if I finish it like this week.

bhancock8 commented 2 years ago

Yep @rjurney — we'll be happy to take a look once you let us know that the PR is ready for review.

github-actions[bot] commented 2 years ago

This pull request is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.