Closed: mrbeann closed this issue 5 years ago
That's a really good idea!
Here's the first contribution:
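def address():
    import re

    from snorkel.labeling import labeling_function

    # ADDRESS and ABSTAIN are label constants assumed to be defined elsewhere.
    # Pattern: 1-4 digit house number, a street name, then a common street suffix.
    exp = r"\d{1,4} [\w\s]{1,20}(?:street|st|avenue|ave|road|rd|highway|hwy|square|sq|trail|trl|drive|dr|court|ct|parkway|pkwy|circle|cir|boulevard|blvd)\W?(?=\s|$)"
    regex = re.compile(exp, re.IGNORECASE)

    @labeling_function()
    def _address(x):
        if regex.search(x["query"].lower()):
            return ADDRESS
        else:
            return ABSTAIN

    return _address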
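To get a feel for what this pattern catches, here is a minimal sketch that runs the same regex using only the standard library, with no Snorkel required. The helper name `looks_like_address` and the sample queries are made up for illustration; they are not part of the original contribution.

```python
import re

# Same pattern as the LF above, written as a raw string.
ADDRESS_PATTERN = re.compile(
    r"\d{1,4} [\w\s]{1,20}"
    r"(?:street|st|avenue|ave|road|rd|highway|hwy|square|sq|trail|trl"
    r"|drive|dr|court|ct|parkway|pkwy|circle|cir|boulevard|blvd)"
    r"\W?(?=\s|$)",
    re.IGNORECASE,
)


def looks_like_address(text: str) -> bool:
    """Hypothetical helper: True if the address pattern matches anywhere in text."""
    return ADDRESS_PATTERN.search(text) is not None


if __name__ == "__main__":
    # Illustrative queries only, not data from the thread.
    samples = [
        "123 Main Street",
        "1600 Pennsylvania Ave",
        "pizza near me",
        "2 best friends",
    ]
    for query in samples:
        print(f"{query!r}: {looks_like_address(query)}")
```

Note that the suffix alternation is not anchored to the start of a word, so a query like "2 best friends" also matches (the "st" in "best" is read as a street suffix). That kind of noise may be acceptable for a labeling function, but it is worth keeping in mind when comparing against a dedicated address parser.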
I think if you'd like to contribute cross-domain utilities, maybe make a PR and add it under snorkel/contrib? Check out the README here.
Edit: @eggie5 great addition! I wonder how it compares with a custom-built library for address parsing like usaddress. Have you tried that out? Would be curious to see if it helps :)
Thanks for raising this! An LF Zoo is something we’ve been excited about for a while as well. It's on our roadmap, but we want to make sure it's done with proper organization, testing, etc. so that it remains clean, maintainable, and general. We would most likely host it in another repo, leaving the snorkel repo focused on the core functionality. We’ll be sure to post to the Snorkel mailing list once we have anything to announce!
Yeah, this zoo is more difficult to build and needs careful design. Here are a few things I can come up with now; I hope they help.
I hope this can be built to make Snorkel more practical. Feel free to close this issue.
Hi @mrbeann @eggie5 @dataframing, I wanted to loop back around and share some scaffolding for the snorkel-zoo that you all alluded to earlier: https://github.com/snorkel-team/snorkel-zoo
Feel free to open a PR with contributions in this repo. Excited to see what LFs you've had in mind!
The LF is a fantastic idea, and it can be shared across different users, so I think a Function Zoo (similar to Model Zoo) will be very useful.