[IMPROVEMENT] support sqlite

alimeerutech commented 3 years ago

adding support for sqlite helps with removing the dependency on postgres, this will enable deployment to keep the state along side the binary itself which makes it easier to setup and use.

fho commented 3 years ago

It's currently not possible to support multiple storage implementations. It would cost too much additional effort to maintain it and when developing new features.

If there is a better alternative for postgresql, that allows access to the data from multiple baur processes on different hosts, we can consider it.

fho commented 3 years ago

alimeerutech commented 3 years ago

My usecase is a simple build not accessed from multiple Baur processes, bundling the SQLite with the binary will remove the need for the whole Postgres dump and import export S3 dance, are you open to a PR?

fho commented 3 years ago

I agree that it would be a bit more simple then import/exporting a psql database. The difference seems minimal to me though: Instead of importing the postgres dump you will need to copy/donwload your shared sqlite database file from somewhere to your execution environment. Instead of exporting it you will need to copy/upload it some places that is reachable from all your environments. Starting a postgresql container in a docker container and importing/exporting the db seems not so much more effort or difficult to me. Or am I missing something?

If it would be possible to have a single storage implementation that can query psql and sqlite, I would be open for a PR. But afaik this is not possible because the SQL implementation of psql and sqlite differ too much.

I'm sorry but supporting 2 separate storage implementations in baur is currently not possible for us. It would cost too much additional effort compared to what we would gain.

alisade commented 3 years ago

https://github.com/dolthub/dolt with an s3 backend could be a possible alternative.

fho commented 3 years ago

@alisade first time that I hear about doIt. :-) How would that work when you run multiple instances of baur in parallel that are running and querying tasks? That would involve downloading & uploading the db file to s3 for every baur execution (sounds quite slow)? How would you handle data conflicts (merge conflicts) between different db files?

alisade commented 3 years ago

Would not support multiple instances, it's more like the sqlite implementation with better tooling. Speed would also be the same as running a server if done in dolt sql-server mode. I do not think there will be merge conflicts when used just for single deployment support.

Best regards,

On Tue, May 4, 2021 at 9:03 AM Fabian Holler @.***> wrote:

@alisade https://github.com/alisade first time that I hear about doIt. :-) How would that work when you run multiple instances of baur in parallel that are running and querying tasks? That would involve downloading & uploading the db file to s3 for every baur execution (sounds quite slow)? How would you handle data conflicts (merge conflicts) between different db files?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/simplesurance/baur/issues/308#issuecomment-832057362, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAT4Z4EVKICDJ54EQTYGUDTMALFNANCNFSM43I7BMFA .

fho commented 3 years ago

Ok, limiting database access to a single instance at a time is not an option. We would loose a major feature. baur must be able to read/write to data store in parallel from multiple running instances.

fho commented 3 years ago

It seems to me you could have a similar setup also with postgresql:

have a start shell-script that download a database dump from s3, starts a docker-psql container and imports the db dump
have a stop shell-script that dumps the baur database to a .sql file and uploads it to s3 or to your git branch

Starting/Stopping the db is probably a bit slower, depending on the db size the time difference might be negligible. (https://github.com/simplesurance/baur/issues/302#issuecomment-815591822 could also help with that)

Would a setup like that work for you? If not where to do you see the disadvantage compared to download/upload a sqlite db to s3 or using doIt?

alisade commented 3 years ago

The problem with running postgres for me is that I would have to run it either inside my Fargte container which means either going through the process of installation via a package manager and initing the db, or since my build is running in Fargate I will not be able to run postgres as a container inside the fargate task - triggered on-demand by Jenkins plugin - and as a result having to sidecar a postgres container to be able to use postgres. All of this is too much work just to maintain the last commit on directories in a repo. I thought either sqllite or dolt could possibly be a nicer developer experience with less moving parts to setup, esp if you do not want to run postgres server permanently just for this use case.

Best regards,

On Tue, May 4, 2021 at 9:28 AM Fabian Holler @.***> wrote:

It seems to me you could have a similar setup also with postgresql:

have a start shell-script that download a database dump from s3, starts a docker-psql container and imports the db dump

have a stop shell-script that dumps the baur database to a .sql file and uploads it to s3 or to your git branch

Starting/Stopping the db is probably a bit slower, depending on the db size the time difference might be negligible. (#302 (comment) https://github.com/simplesurance/baur/issues/302#issuecomment-815591822 could also help with that)

Would a setup like that work for you? If not where to do you see the disadvantage compared to download/upload a sqlite db to s3 or using doIt?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/simplesurance/baur/issues/308#issuecomment-832075788, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAT4ZZIDSUAICQL7ZFWNB3TMAOEVANCNFSM43I7BMFA .

fho commented 3 years ago

Thanks for the explanation, became now more clear to me why using psql is a bigger overhead for you.

I assume that you only use the baur functionality to run tasks that have not been run before for the same source files and always compare them only with the files from the last commit in the same branch?

Would mbt (https://github.com/mbtproject/mbt) maybe work better for your usecase?

alisade commented 3 years ago

Thanks I will take a look.

Best Regards, Ali

On May 5, 2021, at 01:24, Fabian Holler @.***> wrote:

Thanks for the explanation, became now more clear to me why using psql is a bigger overhead for you.

I assume that you only use the baur functionality to run tasks that have not been run before for the same source files and always compare them only with the files from the last commit in the same branch?

Would mbt (https://github.com/mbtproject/mbt) maybe work better for your usecase?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

gedw99 commented 3 months ago

Nats Jetstream looks useful in that it could replace e postal and also be reactive.

it can be your central store and pub sub across many mono repositories.

https://github.com/choria-io/asyncjobs is a global task system using nats Jetstream.

I have been using Jetstream to run many mono repos where the outputs of one build must caus other repositories and code with them to allow build.

Instead of s3 I use nats object store .

this gets my multi regions distributed HA too due to how nats super clusters work.

fho commented 3 months ago

Nats Jetstream looks useful in that it could replace e postal and also be reactive.

The goal of this issue was to remove dependencies to external services when running baur. Using NATS instead of Postgresql would only replace the dependency with another one.

gedw99 commented 3 months ago

Thanks @fho

Ah I see. PostgreSQL is still a dependency . Seems it is a dead requirement now ?

a http api to many storage providers is the end goal ?

I am just trying to work out how I can use bar without needing postresql. I use this with Scientists that have a mono repo for each scientist. I was looking for a way for them to run this easily with storage layer not needing docker or complex db etc.

I only suggested nats because I am usin it to wrap other storage providers . Because it’s reactive it makes it easy to build pipelines hat are generative ( not known at compile time ).

Nats can run embedded too . So it just started when bar starts . You can have global nats servers that output events bs k to embedded nats servers . So you end up with a mesh of reactive streams …

fho commented 3 months ago

Ah I see. PostgreSQL is still a dependency . Seems it is a dead requirement now ?

There is no plan to replace PostgreSQL with something else.

a http api to many storage providers is the end goal ?

Having to communicate with an HTTP-API instead of a PostgreSQL server would not help with the problem of this GitHub issue. It would only replace one dependency with another. Instead of a PostgreSQL server you would have to run a server providing this HTTP-API in your CI environment.

I am just trying to work out how I can use bar without needing postresql.

The only way would be that you implement support for the storage that you would like to use.

gedw99 commented 3 months ago

Thanks for the comments .

you have a fair bit of sql that needs to stay in sync with both db types.

I might instead just add nats Jetstream in a fork or something. It can be the db and the mesh and security .

Need to think about I some more though !

simplesurance / baur

[IMPROVEMENT] support sqlite #308