sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

insights: retroactively update repo names in TimescaleDB #19196

Status: Open (opened by slimsag 3 years ago)

slimsag commented 3 years ago

Since we expect insights to be filtered by repo name (for example with a regex over the repo name, to find insights for a specific org), the DB stores three fields (dbschema):

Everything described above is implemented, and all of these fields are being recorded, but the background worker that updates repo_name to match the latest-known name for the repo is not:

The idea is that a background worker would go through the DB, ask the main app DB (via RepoStore.GetByID()) for the current name of each repo_id, and update repo_name retroactively. That would make it possible to query by the repo's current name (generally speaking).

We should implement that or, if we don't care about repo renames, ditch it and keep just a single repo_name field.

github-actions[bot] commented 3 years ago

Heads up @joelkw @felixfbecker - the "team/code-insights" label was applied to this issue.

coury-clark commented 3 years ago

A few notes:

  1. repo_id is designed to stay stable in the primary Postgres DB, even across repository renames
  2. Insight data is stored and associated with the repo_id fetched from the primary DB

Given this, imagine a time series T that started with the repo name first and was later renamed to second:

| -----(first)------ | =====(second)===== |

In this case, all of the underlying data is still associated with the same repo_id, but the time series maps to multiple repo_name entries. This is acceptable because a repo_name regex query matches any insight series that contained the name, so it still returns the data that matched the original repo_name.

With this in mind, we may be able to deprecate the original_repo_name field entirely, without the need to perform updates on those entries.