tursodatabase / turso-cli

Command line interface to Turso.
https://turso.tech
MIT License
225 stars 37 forks source link

Add `db import <file>` command #32

Open penberg opened 1 year ago

penberg commented 1 year ago

Let's add a db import <file> command for bulk uploading SQLite and perhaps CSV files, similar to what these guys are doing:

https://bit.io

https://github.com/bitdotioinc/pgsqlite

jmwoliver commented 1 year ago

I looked into this one a little bit. I was thinking it would be something like:

  1. turso db import new_database.db
  2. It would internally do db create new_database
  3. Then there are a couple approaches to get all the data in the new database with sqlite:
    • Attach and create all the tables:
      ATTACH DATABASE 'new_database.db' AS tmp_database;
      # get list of tables with tmp_database .tables and .schema
      CREATE TABLE new_database.TABLE(fieldname1 text, fieldname2 text);
      INSERT INTO new_database.TABLE SELECT * FROM tmp_database.TABLE;
    • Or just dump all the tables into the new database:
      ATTACH DATABASE 'database_name.db' AS tmp_database;
      .dump # from tmp_database 
      .read # into new_database

It doesn't seem like ATTACH and .dump are supported on the libsql side though:

→  ATTACH DATABASE 'new_database.db' AS tmp_database;
Error: Failed to execute SQL statement: ATTACH DATABASE 'new_database.db' AS tmp_database;
unsupported statement

Would support for those need to be added for this new import command to work, or did you have a different approach in mind? I didn't look much into https://github.com/bitdotioinc/pgsqlite since I figured going SQLite -> SQLite should be simpler than SQLite -> Postgres, but maybe you had something in mind from that.

penberg commented 1 year ago

Thanks for having a look @jmwoliver!

@psarna @MarinPostma What are your thoughts on making attach work from sqld point of view?

MarinPostma commented 1 year ago

Sqld already has a mechanism to load dumps from sqlite, but it cannot create dumps. In sqlite, dumps are implemented as part of the slite3 binary. Let's port the dump logic to sqld, this way, SQL can generate a dump from the database file and then import it.

handling ATTACH is more complicated, since it is sqld that handles replication, we'd need to somehow pass that information from libsql to sqld

jmwoliver commented 1 year ago

@MarinPostma Thanks for the response! If you happened to get the time to write some semi-detailed issues about these in the sqld repo to help me know where to start, I'd love to see if I could figure it out. I'd try to port the dump logic to sqld in my free time, but I am not very familiar with what that will entail. No worries if you are too busy to get to that right now though! 😄

I was playing around with sqld, but was running into some issues. I've cloned and built it locally, and am trying to get it up and running. I am pointing it to a simple test SQLite file, but get an error saying it is not a directory:

$ ./target/debug/sqld -d test.db -p 127.0.0.1:5432 --http-listen-addr=127.0.0.1:8000
           _     _ 
 ___  __ _| | __| |
/ __|/ _` | |/ _` |
\__ \ (_| | | (_| |
|___/\__, |_|\__,_|
        |_|        

Welcome to sqld!

version: 0.7.0
commit SHA: 9a5c900d5dfc0ec2ef490fb8c560e731b6891d71
build date: 2023-04-10

This software is in BETA version.
If you encounter any bug, please open an issue at https://github.com/libsql/sqld/issues

config:
        - mode: standalone
        - database path: test.db
        - listening for HTTP requests on: 127.0.0.1:8000
        - listening for PostgreSQL wire on: 127.0.0.1:5432
        - grpc_tls: no
Error: Not a directory (os error 20)

Looks like Stats::new(&config.db_path)? wants db_path to be a directory, but the README seems to indicate you can point to a DB file like I tried:

sqld -d foo.db -p 127.0.0.1:5432 --http-listen-addr=127.0.0.1:8000

I'm probably doing something wrong or not setting it up correctly though. If you get the chance, would you be able to point me to what I need to do to set up sqld locally?

glommer commented 1 year ago

@jmwoliver you don't run sqld pointing to a sqlite file, it has to be an empty directory. There are differences in the file to account for the wal replication.

To be explicit and clear: libsql, the core, can work with sqlite files, but sqld, the server, needs the wal hooks and so it cannot.

This has to be an explicit dump + load.

@MarinPostma we should prioritize getting this to work.

glommer commented 1 year ago

@jmwoliver dump the sqlite file to a file, for example:

sqlite3 test.db
SQLite version 3.37.0 2021-12-09 01:34:53
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE foo (var text);
sqlite> insert into foo (var) values ('glauber');
sqlite> insert into foo (var) values ('pekka');
sqlite> .output /tmp/sql.dump
sqlite> .dump
sqlite> ^D
|12:41:49|glaubercosta@Glaubers-MBP:[sqld]> cat /tmp/sql.dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE foo (var text);
INSERT INTO foo VALUES('glauber');
INSERT INTO foo VALUES('pekka');
COMMIT;

then start sqld (on an empty directory) pointing to that dump:

./target/debug/sqld --load-from-dump /tmp/sql.dump

I'll prioritize hooking this up at the Turso platform layer, then the CLI part should be easy.

glommer commented 1 year ago

hey @jmwoliver , I just pushed this: https://github.com/chiselstrike/turso-cli/pull/356

It uses a not-yet-merged API to seed a database.

The syntax we landed on was db create --from-file. We believe - but would love to hear from you - that the most common case for this is seeding a database from a SQLite file. For other cases where you want to add data to an existing database, likely the best will be to just execute a dump file on the shell. there is currently a blocker on the shell that prevents that, but we'll fix that too.

Thanks!

jmwoliver commented 1 year ago

@glommer Thanks for the steps on how to dump an existing database into sqld! I was able to get it dumped into a local sqld instance with the steps you provided.

And I agree, the use-case I was envisioning was being able to create a new database pre-populated with an existing SQLite file to get up and running quickly. I could see it being a neat demo/tutorial to be able to drop your existing database into turso and boom - everything is able to be distributed and replicated across the network. I'm curious though, is the 128MB size limit an arbitrary one you are imposing for now or is it a fly.io limitation?

glommer commented 1 year ago

@jmwoliver btw, this is not deployed yet because we are having some issues - hopefully just environment, so should be done soon - with fly.io.

But the limitation is a bit arbitrary indeed, and has to do with the time it would take to load the dumps, the amount of memory it would use, etc. Over time we'd be able to increase it.

jmwoliver commented 1 year ago

@glommer No worries! No need to rush fixing it for me, I've just been playing around with this because I thought it was cool, I'm not dependent on it in any way. 😃