microsoft / ghcrawler

Crawl GitHub APIs and store the discovered orgs, repos, commits, ...
MIT License
373 stars 90 forks source link

Adding support for Google Cloud Storage as a storage provider #131

Closed gauntface closed 6 years ago

gauntface commented 6 years ago

After talking with @jeffmcaffer a little last week this adds support for Google Cloud Storage to GHCrawler (Both as a storage provider and a delta provider).

I've tried to mimic the azure storage model as much as possible, couple of things I'm not sure of:

  1. I couldn't get the delta options to get passed from initializeSubsystemOptions() in the crawlerFactory. The CRAWLER_DELTA_PROVIDER value was being picked up by the getDefaultOptions() method, but would get filtered out by initializeSubsystemOptions(). Commenting out the early return fixes this but I'm sure I'm missing the purpose of that if statement.
  2. I've used regular promises instead of the Q library promises, is this ok?
  3. The approach taken in the delta store is to write the contents to unique files in google cloud storage as I don't believe this is an efficient way to append to a file (which I think there is in the azure storage module).

Happy to change any of the approaches, code style etc, just let me know where the issues are :)

msftclas commented 6 years ago

CLA assistant check
All CLA requirements met.

gauntface commented 6 years ago

@jeffmcaffer thanks for the review comments.

I've switched everything to Q where I was originally creating native promises. I haven't gone to the extent of wrapping the promises returned by the Google Cloud Storage package, let me know if you would prefer it though.

Added some tests for the new delta store - I didn't see a coverage reporting anywhere, I've just been running npm run test to check everything is WAI.

Let me know if you'd like any other changes / tests / etc.

Regarding testing - I am running this on a container to make sure it's crawling and working as I'd expect so far, but obviously doesn't rule out stupid mistakes on my part, so feel free to question any of this PR.

gauntface commented 6 years ago

Gentle nudge @jeffmcaffer :)