tmerrittsmith closed this issue 5 years ago
@tmerrittsmith yes!! This would be awesome! We definitely want to keep the Spark one, but we could have the pandas version as a separate notebook? Thanks!!
Cool. Yeah, the Spark one is definitely worth keeping. It's like the Ferrari version, where mine is the second-hand hatchback...
I'll tidy up what I've got, but I can provide two possibilities:
1) A straight conversion of the crowdsourcing tutorial (exactly the same data and results), just using pandas where Spark was used. To be honest, I didn't use pandas that much apart from joining the CSVs at the beginning.
2) Something more similar to the intro tutorial, but for tabular data. We wrote some labelling functions for a UCI repository dataset, and then used Snorkel's generative model to resolve the labels.
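For anyone reading along, the CSV join mentioned in option 1 could look roughly like the sketch below. This is not the notebook's actual code: the inline CSV contents and the column names (`worker_id`, `item_id`, `label`, `text`) are hypothetical stand-ins, and the real tutorial files will differ.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the tutorial's CSV files; the real file names
# and columns are assumptions made only for this illustration.
answers_csv = io.StringIO(
    "worker_id,item_id,label\n"
    "w1,t1,positive\n"
    "w2,t1,negative\n"
    "w1,t2,positive\n"
)
items_csv = io.StringIO(
    "item_id,text\n"
    "t1,first example item\n"
    "t2,second example item\n"
)

answers = pd.read_csv(answers_csv)
items = pd.read_csv(items_csv)

# Left join: keep every crowd answer and attach the item text to it.
joined = answers.merge(items, on="item_id", how="left")

# Reshape to one row per item and one column per worker. Cells are NaN
# where a worker did not label an item.
label_table = joined.pivot(index="item_id", columns="worker_id", values="label")
print(label_table)
```

With small files like these, the whole join runs in memory, which is why pandas can stand in for Spark here.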
What's the best way to get them to you, once they're ready?
Hah :) Both of these sound awesome. Whatever you can send, we'll look over! The best way would be a PR (a separate one for each). Thanks!!
@ajratner is this still ongoing, or is there an existing PR for it? Thanks!!
Sorry, haven't got round to making the PR - I'll do that today.
@tmerrittsmith awesome thanks!!
I've submitted a pull request to include a notebook where the Spark dependency is removed. The other one (tabular data) isn't quite how I want it yet. @hanzlfs, did you specifically want to look at that, or just see the version where Spark is removed?
@tmerrittsmith thanks! I think your PR is enough to give us useful insights.
@paroma assigned to you since you're looking at the PRs!
thank you for submitting this PR! (merged with #1048)
@tmerrittsmith Could you give an update on the tabular data PR, if you have made any progress? Thanks!
I've edited the crowdsourcing tutorial to use pandas instead of Spark. It was useful for me, so it may be useful to others. Do you want it?