snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Spacy v3 support #1701

Closed by yinxiangshi 2 years ago

yinxiangshi commented 2 years ago

Just want to re-ask about #1621. I have noticed that snorkel still doesn't support spaCy v3. I can help with this, but I think the information there probably needs updating. So, which files in snorkel currently use spaCy?

cmglaze commented 2 years ago

Hi @yinxiangshi, the same set of files use spacy as before: https://github.com/snorkel-team/snorkel/issues/1621#issuecomment-779496657

yinxiangshi commented 2 years ago

Oh, thank you.

rjurney commented 2 years ago

@henryre is there any way to stop certain tickets from getting cleaned up every so often? A valid feature definition is a form of open source product management, and the cleanup is obscuring even the ones I’m sure you want.

yinxiangshi commented 2 years ago

@rjurney Hello, I've added preliminary spaCy v3 support in a PR. Could you please help review the changes? I am not sure whether the wrapper still needs changes. Thanks.

rjurney commented 2 years ago

@yinxiangshi yes, it so happens I just left my job so I have time and need something to do :) Let me just set up my computer and I’ll get on it!

yinxiangshi commented 2 years ago

@rjurney From my perspective, we just need to add new parameters for _nlp in the preprocessor. I didn't see any other changes that are needed. Please help me! Thanks!

rjurney commented 2 years ago

@yinxiangshi Got my new computer set up, looking now!

yinxiangshi commented 2 years ago

@rjurney This is what I have been looking forward to :) Congrats on your new life & new computer!

rjurney commented 2 years ago

@yinxiangshi the second booster wiped me out like moments after I posted that... here goes again!

rjurney commented 2 years ago

@yinxiangshi I guess we are using #1621 now?

rjurney commented 2 years ago

Sorry for the dupe comment, but:

@yinxiangshi I got tox -e complex to run. I am looking over the relevant files to see if there is anything we missed. I didn't quite get your comments about config. I am not sure how that changes things, unless we want to add spaCy config support to snorkel. I suppose that is reasonable, let me look!