Closed beingkk closed 1 month ago
Thank you @RFOxbury, I'll add your suggestions in the documentation. I'm slowly writing Wiki entries for the repo, so that's probably a good place (or maybe for Crunchbase we'd do a private google doc, because it's proprietary dataset)
I'll go ahead and merge, thanks!
Fixes #44
Modifies (rewrites) discovery_utils.getters.crunchbase
Adds CrunchbaseGetter class for easy and convenient fetching of the different Crunchbase data tables.
Instructions to the reviewers
Hey Rosie @RFOxbury, would appreciate if you can take a look at this PR and see that it makes sense! It's largely lifted from innovation_sweet_spots repo, so should be OK, but good to have another pair of eyes.
Hey Shabeer @shabrf and Solomon @sqr00t , no need to review this PR for you - just wanted to ask your opinion on how you would write tests for such a getter class. For example, I guess we could mock up an S3 bucket using LocalStack to check the logic... and then do the actual test of fetching the data from the real S3 bucket and running some data integrity tests? Just curious if you have seen something
Another question where I'd like your opinion - if I would like to add some utility functions that are not simply getters but say help query the Crunchbase data (eg, "find me all companies belonging to Crunchbase category X"). Do you think it makes sense to still add such a utility function in getters, or should we have a different category of utils for that? (eg, "analysis utils")
Ohh, and I had to downgrade Python version to 3.11 because pflake8 was giving an error - I guess some incompatibility with Python 3.12, not sure why it suddenly appeared!