microsoft / ghcrawler

Crawl GitHub APIs and store the discovered orgs, repos, commits, ...
MIT License

MongoDB - index commit _metadata.url and _metadata.links.self.href #135

Closed: grooverdan closed this 6 years ago

grooverdan commented 6 years ago

Frequent lookups on these fields are slow and CPU-intensive without an index, particularly on large repositories.

Creating the index in the access methods may seem odd. However, according to Stack Overflow and the MongoDB documentation, it is a no-op if the index already exists. It also helps with migration.
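A minimal sketch of the idempotent index creation described above. It assumes a collection object whose `createIndex` behaves like MongoDB's, where requesting an index that already exists does nothing. The names `IndexableCollection`, `METADATA_INDEXES`, `ensureMetadataIndexes` and `FakeCollection` are hypothetical and are not ghcrawler's actual code. The in-memory fake only mimics the no-op behaviour for illustration.

```typescript
// Hypothetical sketch, not ghcrawler's implementation.
// Assumption: createIndex() is idempotent, as in MongoDB, so asking for
// an index that already exists does not create a second one.

type IndexKeys = Record<string, 1 | -1>;

interface IndexableCollection {
  // Resolves to the index name, like the MongoDB Node.js driver.
  createIndex(keys: IndexKeys): Promise<string>;
}

// Single-field ascending indexes on the two lookup paths named in the PR.
const METADATA_INDEXES: IndexKeys[] = [
  { '_metadata.url': 1 },
  { '_metadata.links.self.href': 1 },
];

// Request every index. All createIndex calls are issued immediately, and
// because existing indexes are left alone this is safe to run on every
// access, which also covers databases created before the indexes existed.
function ensureMetadataIndexes(
  collection: IndexableCollection,
): Promise<string[]> {
  return Promise.all(
    METADATA_INDEXES.map((keys) => collection.createIndex(keys)),
  );
}

// In-memory stand-in (hypothetical) that only records which indexes exist.
class FakeCollection implements IndexableCollection {
  readonly indexes = new Set<string>();
  createCalls = 0;

  createIndex(keys: IndexKeys): Promise<string> {
    this.createCalls++;
    // Mimic MongoDB's default naming: "<field>_<direction>" joined by "_".
    const name = Object.entries(keys)
      .map(([field, dir]) => `${field}_${dir}`)
      .join('_');
    this.indexes.add(name); // a Set ignores repeats, like the real no-op
    return Promise.resolve(name);
  }
}
```

Calling `ensureMetadataIndexes` twice on a `FakeCollection` issues four `createIndex` requests but leaves only two indexes (`_metadata.url_1` and `_metadata.links.self.href_1`). That repeat-safe property is what makes calling it from the access methods reasonable.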

closes #132 closes #133

grooverdan commented 6 years ago

thanks @jeffmcaffer

jeffmcaffer commented 6 years ago

No, thanks to you, @grooverdan. Sorry it took so long. As you will have seen, the other PRs need input from people with other expertise.

I'm interested to hear what you are using the crawler for. If you're willing, perhaps you can ping me at First.Last@microsoft.com.