simplesurance / baur

An incremental task runner for mono repositories.
GNU General Public License v2.0

Create "db export" and "db import" commands #302

Open kostyay opened 3 years ago

kostyay commented 3 years ago

We have an open source project that uses baur (https://github.com/stackpulse/steps), and our entire build system is based on it. Currently our CI is configured not to run builds for forks, because we don't want to give everyone access to the Postgres database that contains our baur data. Ideally I want to build forked branches too, without letting users connect to my Postgres database. My idea for solving this is as follows:

  1. When the master branch builds, one of the steps exports baur's database to a public S3 bucket.
  2. A forked build sets up Postgres inside a Docker container and imports the database dump into this temporary database. This way the build still enjoys the benefits of baur (only changes are built).
  3. The same applies to local builds: developers run a script that copies the baur database dump from the public S3 bucket and sets up Postgres in a local Docker container.
  4. All of the above I can already do with the Postgres tools and my CI configuration.
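
The workflow in the list above could be sketched roughly as follows. This is only an illustration that builds the command lines without running them; the bucket URL, dump file name, container name and password are hypothetical placeholders, and the tools (pg_dump, pg_restore, the AWS CLI, Docker) are assumed to be installed.

```python
from __future__ import annotations

# Hypothetical placeholders, not taken from the project.
DUMP_FILE = "baur.dump"
BUCKET_URL = "s3://example-public-bucket/baur.dump"


def export_commands(db_url: str) -> list[list[str]]:
    """Commands for the master build: dump the database, then upload the dump."""
    return [
        ["pg_dump", "--format=custom", f"--file={DUMP_FILE}", f"--dbname={db_url}"],
        ["aws", "s3", "cp", DUMP_FILE, BUCKET_URL],
    ]


def import_commands(local_db_url: str) -> list[list[str]]:
    """Commands for a forked or local build: start Postgres, fetch and restore the dump."""
    return [
        # Start a throwaway Postgres container; it needs a moment before it accepts connections.
        ["docker", "run", "--detach", "--name", "baur-db",
         "--env", "POSTGRES_PASSWORD=postgres",
         "--publish", "5432:5432", "postgres"],
        # A public bucket can be read without AWS credentials.
        ["aws", "s3", "cp", "--no-sign-request", BUCKET_URL, DUMP_FILE],
        ["pg_restore", "--no-owner", f"--dbname={local_db_url}", DUMP_FILE],
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) in order; baur would then be pointed at the local database URL.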

However, I'm interested in the smallest database footprint possible. I don't care about older build digests; I only want the file digests from the most recent build, without the historical data. This is where I would like to have a baur database dump feature. You could run "baur db export all" for a full dump (the same as pg_dump, for example) or "baur db export minimal" for only the latest build data and digests. This would be the minimal amount of data required to build only the modified applications. It could be in CSV, JSON or any other format, not necessarily a Postgres dump.
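
As a rough illustration of what a "minimal" export could contain, the sketch below keeps only the most recent record per application and serializes it as JSON. The record fields (app, finished_at, input_digest) are hypothetical and do not reflect baur's actual schema.

```python
import json
from datetime import datetime


def latest_runs(runs):
    """Return only the most recent record for each application."""
    latest = {}
    for run in runs:
        app = run["app"]
        if app not in latest or run["finished_at"] > latest[app]["finished_at"]:
            latest[app] = run
    # Sort by application name so the export is stable between runs.
    return sorted(latest.values(), key=lambda r: r["app"])


def export_minimal(runs):
    """Serialize the latest record per application as JSON."""
    # default=str converts the datetime values to strings.
    return json.dumps(latest_runs(runs), default=str, indent=2)


if __name__ == "__main__":
    example = [
        {"app": "api", "finished_at": datetime(2021, 1, 1), "input_digest": "sha384:aaa"},
        {"app": "api", "finished_at": datetime(2021, 3, 1), "input_digest": "sha384:bbb"},
        {"app": "web", "finished_at": datetime(2020, 6, 1), "input_digest": "sha384:ccc"},
    ]
    print(export_minimal(example))
```

Note that the old record for the hypothetical "web" app is kept because it is still that app's latest build, even though it predates the others.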

What do you think?

fho commented 3 years ago

Do you only need read access to the data from the baur database in CI? Or are new records also created in CI or by users and imported back into your main database?

kostyay commented 3 years ago

> Do you only need read access to the data from the baur database in CI? Or are new records also created in CI or by users and imported back into your main database?

I need read-only access to the data so forked builds can build efficiently without actually connecting to a real database.

fho commented 3 years ago

Would it help if there were a cleanup command to remove database records older than X?

kostyay commented 3 years ago

It won't be the same, since older records may still be relevant (for example, for an app that hasn't changed in a while). Why not drop all build-related data for an application when it is rebuilt? Everything prior to the current build is no longer relevant, so there is no reason to keep anything except the last build; everything else is stale data.

fho commented 3 years ago

True, I think we would need both options. In our use case we also use older records for the same purpose, not only the last one per app.

So a command like:

baur maint cleandb [--max-runs-per-app COUNT] [--max-age TIMESTAMP]

Would that help with your particular use case?

I currently do not see a need for the suggested import/export command apart from your use case, which is basically a workaround to avoid exposing psql. Everybody will need to clean up the db at some point, though :-)
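
One possible reading of the two proposed cleandb options is sketched below. It assumes that a record is deleted if it violates either limit that was given; the command does not exist yet, so this combination rule and the record fields (app, finished_at) are assumptions, not baur's behavior.

```python
from collections import defaultdict
from datetime import datetime


def records_to_delete(runs, max_runs_per_app=None, max_age=None):
    """Select records that exceed the per-app count or are older than max_age.

    Assumption: a limit that is None is not applied, and a record is
    deleted as soon as it violates any limit that was given.
    """
    by_app = defaultdict(list)
    for run in runs:
        by_app[run["app"]].append(run)

    doomed = []
    for app_runs in by_app.values():
        # Newest first, so the list index is the record's rank within its app.
        ordered = sorted(app_runs, key=lambda r: r["finished_at"], reverse=True)
        for rank, run in enumerate(ordered):
            too_many = max_runs_per_app is not None and rank >= max_runs_per_app
            too_old = max_age is not None and run["finished_at"] < max_age
            if too_many or too_old:
                doomed.append(run)
    return doomed
```

Under this reading, max_runs_per_app=1 without max_age leaves exactly the latest record per app, which matches the minimal data set requested above. max_age on its own would also remove the only record of an app that has not been rebuilt for a long time, which is the concern raised earlier in the thread.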

kostyay commented 3 years ago

Yeah, that could work.