Open slimsag opened 3 years ago
Heads up @joelkw @felixfbecker - the "team/code-insights" label was applied to this issue.
A few notes:
- `repo_id` is designed to be coherent in the primary postgres DB, even through repository renames
- `repo_id` is fetched from the primary DB

Given this, we can imagine a time series T such that it started with repo name `first` and eventually changed to `second`:
| -----(first)------ | =====(second)===== |
In this case, all of the underlying data would still be associated with the same `repo_id`, but the time series would map to multiple `repo_name` entries. This is acceptable because when we query by `repo_name` regex we match any insight series that contained the name, which would return the data that matched the original `repo_name`.

With this in mind, we may be able to deprecate the `original_repo_name` field entirely, without the need to perform updates on those entries.
Since we expect insights to be e.g. filtered down by repo name (or a regex over the repo name, like to find insights for a specific org), the DB stores three fields (dbschema):
- `repo_id`, which is the ID of the repo (regardless of any renames) as known by the main app DB.
- `repo_name` string, which is what the repo was named at the time the data point was recorded. We will use this to regexp search for data points with `repo_name` matching some regexp. The idea is that there would be a background worker which goes through the DB and asks the main app DB (via `RepoStore.GetByID()`) what the current name of the `repo_id` is and updates this field retroactively, thus it is possible to query based on the current name of the repo (generally speaking).
- `original_repo_name` string, which is exactly the same as `repo_name` except that it will not be retroactively updated. This is useful because you might wish to see that e.g. an insight's data changed substantially as part of a major renaming effort that went on. In this case, some data points would show the old repo name and some data points would show the new repo name (because `original_repo_name` is the name of the repo at the time the data point was recorded).

Everything described above is implemented, and all the fields described above are being recorded, but the background worker which updates `repo_name` to match the latest-known name for the repo is not.

We should implement that, or if we don't care about repo renames, ditch it and just have a single `repo_name` field.
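For discussion, a rough Go sketch of what one pass of the missing background worker could look like. Everything here is an assumption: `record`, `nameResolver`, and `syncRepoNames` are hypothetical names, the records live in memory rather than in the insights DB, and `nameResolver` merely stands in for a lookup such as `RepoStore.GetByID()`, whose real signature is not reproduced here.

```go
package main

import "fmt"

// nameResolver is a hypothetical stand-in for asking the main app DB
// for the current name of a repo ID.
type nameResolver func(repoID int) (string, error)

// record is a hypothetical, simplified view of the three stored fields.
type record struct {
	RepoID           int
	RepoName         string // retroactively updated to the current name
	OriginalRepoName string // never updated after recording
}

// syncRepoNames sets RepoName to the current name for each record and
// returns how many records changed. OriginalRepoName is left untouched.
func syncRepoNames(records []record, resolve nameResolver) (int, error) {
	cache := map[int]string{} // avoid resolving the same repo ID twice
	updated := 0
	for i := range records {
		id := records[i].RepoID
		name, ok := cache[id]
		if !ok {
			var err error
			name, err = resolve(id)
			if err != nil {
				return updated, fmt.Errorf("resolving repo %d: %w", id, err)
			}
			cache[id] = name
		}
		if records[i].RepoName != name {
			records[i].RepoName = name
			updated++
		}
	}
	return updated, nil
}

func main() {
	records := []record{
		{RepoID: 1, RepoName: "first", OriginalRepoName: "first"},
		{RepoID: 1, RepoName: "second", OriginalRepoName: "second"},
	}
	// Pretend the main app DB reports that repo 1 is now called "second".
	current := func(int) (string, error) { return "second", nil }
	n, err := syncRepoNames(records, current)
	fmt.Println(n, err, records[0].RepoName, records[0].OriginalRepoName)
}
```

A real worker would presumably batch these updates in SQL and run periodically; the sketch only illustrates the intended relationship between the two name fields.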