After extensive trials, querying repo by repo for each library still seems to be the best strategy. Each of these queries counts towards the rate limit, so we can sleep once we reach it. If we put more repos into a single query, we then have to inspect the individual results, and GitHub times us out when we try to page through them. So, for now, simply checking whether any results are returned for a repo/lib combination looks like the best approach.
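A minimal sketch of that strategy, assuming the GitHub code-search endpoint and a personal access token in a `GITHUB_TOKEN` environment variable; the function name `library_used_in_repo` is illustrative, not the actual script's API:

```python
import os
import time
import requests

SEARCH_URL = "https://api.github.com/search/code"
# Assumption: a personal access token is available in GITHUB_TOKEN.
HEADERS = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}

def library_used_in_repo(library: str, repo: str) -> bool:
    """Return True if the code search finds any hit for `library` in `repo`.

    Only total_count is checked; we never page through individual results,
    which is what triggered the timeouts with multi-repo queries.
    """
    params = {"q": f"{library} repo:{repo}"}
    resp = requests.get(SEARCH_URL, headers=HEADERS, params=params)
    # Back off when the search rate limit is exhausted, then retry once.
    if resp.status_code == 403 and resp.headers.get("X-RateLimit-Remaining") == "0":
        reset = int(resp.headers.get("X-RateLimit-Reset", time.time() + 60))
        time.sleep(max(reset - time.time(), 0) + 1)
        resp = requests.get(SEARCH_URL, headers=HEADERS, params=params)
    resp.raise_for_status()
    return resp.json().get("total_count", 0) > 0
```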
Just made a few cleanup changes.
Issues
Current problems include sleeping for a full hour whenever the rate limit is reached (not really needed... a minute should be enough, but we have to double check how to avoid hitting the hourly rate limit as well).
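A hedged sketch of a shorter sleep, assuming we read the reset timestamp from the /rate_limit endpoint (which does not count against the limit); this is a possible fix, not what the scripts currently do:

```python
import os
import time
import requests

HEADERS = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}  # assumed token location

def wait_for_rate_limit(resource: str = "search") -> None:
    """Sleep only until the given rate-limit window resets, instead of a fixed hour."""
    info = requests.get("https://api.github.com/rate_limit", headers=HEADERS).json()
    bucket = info["resources"][resource]
    if bucket["remaining"] == 0:
        # `reset` is a Unix timestamp; add a small buffer to be safe.
        time.sleep(max(bucket["reset"] - time.time(), 0) + 5)
```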
We always go back and retrieve issues from the beginning of the history, which is very costly. Changing this to first check the latest issue stored in the DB and only fetch from that point onward. Also, there is no need to store results in a pkl file first and then in the DB; we can store directly in the DB, since all other info (frequency, response time, etc.) will be calculated from all the data in the DB anyway, so we have to wait until all new issues are in and then go through every issue in the DB.
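A rough sketch of that change, assuming a SQLite table `issues(repo, number, created_at, updated_at, state)` and the list-issues endpoint's `since` parameter; the schema and column names are placeholders, not the project's actual layout:

```python
import os
import sqlite3
import requests

HEADERS = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}  # assumed token location

def fetch_new_issues(conn: sqlite3.Connection, repo: str) -> None:
    """Fetch only issues updated since the newest one in the DB and store them directly."""
    last_seen = conn.execute(
        "SELECT MAX(updated_at) FROM issues WHERE repo = ?", (repo,)
    ).fetchone()[0]  # ISO 8601 string, or None on the first run
    params = {"state": "all", "per_page": 100}
    if last_seen:
        params["since"] = last_seen  # only issues updated at/after this timestamp
    url = f"https://api.github.com/repos/{repo}/issues"
    while url:
        resp = requests.get(url, headers=HEADERS, params=params)
        resp.raise_for_status()
        for issue in resp.json():
            conn.execute(
                "INSERT OR REPLACE INTO issues (repo, number, created_at, updated_at, state) "
                "VALUES (?, ?, ?, ?, ?)",
                (repo, issue["number"], issue["created_at"],
                 issue["updated_at"], issue["state"]),
            )
        conn.commit()
        url = resp.links.get("next", {}).get("url")  # follow pagination
        params = {}  # the next URL already carries the query parameters
```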
All scripts have been tested on the server. Major updates:
make the scripts work as a module, because the relative script paths didn't work correctly on other computers (they required updating Python paths etc., which isn't sustainable)
fix the charts to actually show the values from the different metrics entries (e.g., statistics)
some cosmetic changes in charts
Issues and releases are now stored directly in the DB. On the next run of the scripts, issue retrieval starts from the date of the last stored issue, which speeds things up significantly. For releases, the query does not allow a since date, so the scripts still fetch all releases but skip the ones already in the DB.
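Roughly how the release handling could look under the same assumptions (SQLite, a placeholder `releases(repo, id, tag_name, published_at)` table): since the releases endpoint has no `since` parameter, everything is fetched and duplicates are skipped by id.

```python
import os
import sqlite3
import requests

HEADERS = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}  # assumed token location

def fetch_releases(conn: sqlite3.Connection, repo: str) -> None:
    """Fetch all releases (the API has no `since` filter) and insert only unseen ones."""
    known = {row[0] for row in conn.execute(
        "SELECT id FROM releases WHERE repo = ?", (repo,))}
    url = f"https://api.github.com/repos/{repo}/releases"
    params = {"per_page": 100}
    while url:
        resp = requests.get(url, headers=HEADERS, params=params)
        resp.raise_for_status()
        for rel in resp.json():
            if rel["id"] in known:
                continue  # already stored, skip
            conn.execute(
                "INSERT INTO releases (repo, id, tag_name, published_at) VALUES (?, ?, ?, ?)",
                (repo, rel["id"], rel["tag_name"], rel["published_at"]),
            )
        conn.commit()
        url = resp.links.get("next", {}).get("url")  # follow pagination
        params = {}
```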
Trying to speed up the current calculation of the metrics