simonw / db-to-sqlite

CLI tool for exporting tables or queries from any SQL database to a SQLite file
Apache License 2.0
368 stars 28 forks source link

Correctly duplicate empty tables #28

Closed sevakram closed 3 years ago

sevakram commented 3 years ago

I'm just trying out this library with a blank django project. I've a project on heroku and I want to replicate the heroku database locally in sqlite, but it seems that db-to-sqlite is skipping some tables.

Command run -

db-to-sqlite postgres://uid:pwd@host/dbname db.sqlite3 --all -p

Output -

1/10: django_migrations
2/10: django_content_type
3/10: auth_permission
4/10: auth_group
5/10: auth_group_permissions
6/10: auth_user_groups
7/10: auth_user_user_permissions
8/10: django_admin_log
9/10: auth_user
10/10: django_session

Adding 1 foreign key
  auth_permission.content_type_id => django_content_type.id

When tried to run django locally, faced an error that django could not find django_admin_log, so I tried to see if tables have been copied correctly or not-

$ sqlite3 db.sqlite3
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
sqlite> .tables
auth_permission      django_content_type  django_session
auth_user            django_migrations
sqlite>
rakbladsvalsen commented 3 years ago

Can also confirm this. The dump seems to be incomplete when using django. In fact, it seems to completely omit some Django auth tables, like auth_groups.

Otherwise it works flawlessly. Hope this gets fixed someday.

Edit: This happened to me when I was trying to convert a full PostgreSQL database. I haven't checked if this happens with MySQL/Mariadb, though. By doing further testing it seems that the tool is unable to convert some tables even with --table. The resulting sqlite file will be literally empty.

rakbladsvalsen commented 3 years ago

Just found something which is kinda a baid-aid for this problem. If you're using django, you can do the following:

  1. Migrate your app to an empty sqlite database (python manage.py migrate). This will create an empty database with your schema as defined in models.py
  2. Open the newly created database with sqlite3 db.sqlite3 and type the following:
    .out schema.sql
    .schema
    .quit

    This will extract the schema of your empty sqlite database.

  3. Use db-to-sqlite to migrate your postgre/mariadb database to sqlite. The resulting file will contain an incomplete dump. Let's assume that the resulting file is called dump.sqlite3
  4. Now run sqlite3 dump.sqlite3 < schema.sql. This will create the missing tables or whathever db-to-sqlite missed. You will see error messages, just ignore them. Or remove the conflicting lines.
  5. Literally eureka, now your dump.sqlite3 is complete.

I gave db-to-sqlite3 a quick read and it seems that there's no logic to handle empty tables. It could be a sqlalchemy bug too, but idk to be honest.

simonw commented 3 years ago

Yeah this is definitely due to tables being empty.

simonw commented 3 years ago

sqlite-utils has a db[table].create(...) method that can do thus. The trick will be introspecting the existing schema with SQLAlchemy and mapping the types to int/float/str.

The current code lazily avoids this by relying on sqlite-utils ability to create a schema automatically based on an incoming list of row dictionaries to be inserted.

simonw commented 3 years ago

https://github.com/simonw/db-to-sqlite/blob/d24b35ddaf297af7bd8504f47065f09c958cca9e/db_to_sqlite/cli.py#L101-L106

simonw commented 3 years ago

This SQLAlchemy method will give me details of the columns. Not sure how to make the types though: https://docs.sqlalchemy.org/en/13/core/reflection.html#sqlalchemy.engine.reflection.Inspector.get_columns

simonw commented 3 years ago

I think I need to provide a mapping for this list of subclases (or a subset thereof - maybe just the base Text one maps to str for example): https://docs.sqlalchemy.org/en/13/core/type_basics.html#generic-types

simonw commented 3 years ago

I can bring across NOT NULL too - I think sqlite-utils got more sophisticated about table creation since I first wrote db-to-sqlite.

simonw commented 3 years ago

https://docs.sqlalchemy.org/en/13/core/reflection.html#sqlalchemy.engine.reflection.Inspector.get_columns

Given a string table_name and an optional string schema, return column information as a list of dicts with these keys:

  • name - the column's name
  • type - the type of this column; an instance of TypeEngine
  • nullable - boolean flag if the column is NULL or NOT NULL
  • default - the column's server default value - this is returned as a string SQL expression.
  • autoincrement - indicates that the column is auto incremented - this is returned as a boolean or 'auto'
  • comment - (optional) the comment on the column. Only some dialects return this key
  • computed - (optional) when present it indicates that this column is computed by the database. Only some dialects return this key. Returned as a dict with the keys:
    • sqltext - the expression used to generate this column returned as a string SQL expression
    • persisted - (optional) boolean that indicates if the column is stored in the table New in version 1.3.16: - added support for computed reflection.
  • dialect_options - (optional) a dict with dialect specific options

For example:

(Pdb) inspector.get_columns(table) [{'name': 'id', 'type': INTEGER(), 'default': None, 'comment': None, 'nullable': False, 'autoincrement': True}, {'name': 'name', 'type': VARCHAR(length=32), 'default': None, 'comment': None, 'nullable': False}]

python_type looks useful:

(Pdb) inspector.get_columns(table)[0]['type'].python_type <class 'int'>

simonw commented 3 years ago

Got this error:

  File "/Users/simon/Dropbox/Development/db-to-sqlite/db_to_sqlite/cli.py", line 121, in cli
    for column in inspector.get_columns(table)
  File "/Users/simon/Dropbox/Development/db-to-sqlite/db_to_sqlite/cli.py", line 121, in <dictcomp>
    for column in inspector.get_columns(table)
  File "/Users/simon/.local/share/virtualenvs/db-to-sqlite-Nk_u3PUs/lib/python3.7/site-packages/sqlalchemy/sql/type_api.py", line 416, in python_type
    raise NotImplementedError()
NotImplementedError

It was because of {'name': 'ip', 'type': INET(), 'nullable': False, 'default': None, 'autoincrement': False, 'comment': None}

simonw commented 3 years ago

Tried this on my blog and got the following SQLite schema for an empty table:

CREATE TABLE [blog_comment] (
   [id] INTEGER,
   [object_id] INTEGER,
   [body] TEXT,
   [created] TEXT,
   [name] TEXT,
   [url] TEXT,
   [email] TEXT,
   [openid] TEXT,
   [ip] TEXT,
   [spam_status] TEXT,
   [visible_on_site] INTEGER,
   [spam_reason] TEXT,
   [content_type_id] INTEGER,
   FOREIGN KEY([content_type_id]) REFERENCES [django_content_type]([id])
);
CREATE INDEX [idx_blog_comment_content_type_id]
    ON [blog_comment] ([content_type_id]);