Augur DB: adding additional blade and monitoring tools

cdolfi commented 2 years ago

To help with data collection and data access, an additional blade is going to be integrated into the system. Along with that, we are going to create monitoring tools so we can understand the collection better. Others can add context to this as well.

sgoggins commented 2 years ago

The tl;dr is that this approach will:

Make more compute available to both Augur and Postgresql individually
Stop Augur and Postgresql from competing for the same resources
Isolating Postgresql from the server Augur runs on will reduce competition between Explorer users selecting data and Augur inserting data
Somewhat incidentally, though not insignificantly, we will store repositories on a local drive instead of an NFS mount, which will accelerate the pace of commit counting (and reduce process i/o wait times, which can slow down collection of other data as well).

sgoggins commented 2 years ago

@jasonbrooks : I should also mention that we are closing in on an alpha release of a new version of Augur that incorporates Josh B's generous feedback. As of right now, the CPU utilization on that new version is a fraction of the old version's. Instead of pinning several CPUs, we are rarely exceeding 10% utilization on any one process.

sgoggins commented 2 years ago

As soon as we hit alpha, I’ll create a parallel instance of your data on my server. The punch list we are working through includes:

Scale validation of our celery/Redis job queuing (in process and looking good)
A function to check for moved repositories (Until 2 weeks or so ago, GitHub simply redirected all the API calls to the new location. Now it doesn’t. This is a fairly simple, but necessary function. We may simply skip these repos on the alpha path, but it will be a quick fix in the next release)
2-5 additional cycles of collection at scale, which is really how we identify GitHub API changes and “weird” data anomalies [like the GitHub user NaN]
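For illustration, the moved-repository check in the punch list could look roughly like the sketch below. It is not Augur's actual code. It assumes the HTTP client is configured not to follow redirects automatically, and it relies on GitHub's documented use of 301, 302 and 307 responses with a `Location` header. The function name `moved_location` is hypothetical.

```python
from typing import Mapping, Optional

# Redirect statuses the GitHub REST API documents for moved resources.
REDIRECT_STATUSES = {301, 302, 307}


def moved_location(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the new URL if the response signals a moved repository, else None.

    Hypothetical helper: the caller passes the status code and headers of a
    response obtained with automatic redirect-following turned off.
    """
    if status not in REDIRECT_STATUSES:
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("location")
```

A collector could call this on each response and either update the stored repository URL or skip the repository, which matches the "skip on the alpha path" option mentioned above.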

The speed increase is a byproduct of refactoring data collection, implementing celery/Redis instead of our job scheduler, and the use of SQLAlchemy’s “upsert” function when values like “opened”/“closed” are changed.
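To make the upsert idea concrete, here is a minimal sketch. It is an assumption-laden stand-in, not Augur's implementation: it uses Python's built-in `sqlite3` module rather than SQLAlchemy and Postgresql so that it runs on its own, and the `issues` table, its columns, and `upsert_issue` are hypothetical. The `INSERT ... ON CONFLICT ... DO UPDATE` statement has the same shape in Postgresql.

```python
import sqlite3

# In-memory database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issues (issue_id INTEGER PRIMARY KEY, state TEXT NOT NULL)"
)

# Insert a new row, or update the state if the issue_id already exists.
UPSERT = """
INSERT INTO issues (issue_id, state) VALUES (?, ?)
ON CONFLICT (issue_id) DO UPDATE SET state = excluded.state
"""


def upsert_issue(conn: sqlite3.Connection, issue_id: int, state: str) -> None:
    """Write one issue row in a single statement, whether or not it exists."""
    conn.execute(UPSERT, (issue_id, state))
    conn.commit()


upsert_issue(conn, 1, "opened")  # first collection cycle: row is inserted
upsert_issue(conn, 1, "closed")  # later cycle: same row is updated in place
rows = conn.execute("SELECT issue_id, state FROM issues").fetchall()
```

The point of the pattern is that a changed value such as “opened” becoming “closed” costs one statement instead of a lookup followed by a separate insert or update.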

jasonbrooks commented 2 years ago

An update on this: we haven't deployed the second blade yet because we're having some issues with the blade chassis that will require someone at the data center to put their hands on it. We've got a ticket in for that.

sgoggins commented 2 years ago

@jasonbrooks : How is this process going? We are ready to get this set up. It will be difficult to take Aspen to the next level without getting the infrastructure aligned. Regarding Augur, I can confirm that our parallelization is enabling linear scaling. More power, more better.

jasonbrooks commented 2 years ago

Hey Sean, we had some hardware issues that were slowing us down, but we provisioned another blade for use by Augur. I'll msg you on Slack about it.

sgoggins commented 1 year ago

@jasonbrooks : Looking to set up some time to talk this week.

JamesKunstle commented 1 year ago

Tracking deployment of Augur elsewhere. The blade has been deployed and we have access to it.

oss-aspen / Rappel

Augur DB: adding additional blade and monitoring tools #170