Closed CallumBDX closed 1 year ago
Hey!
Thanks for the bug report. Interesting one for sure.
Will see when I have time to look at it.
Feel free to submit a PR if you have the time. :)
Hi! Thanks for having a look whenever you get ther chance. I will try to have a look into it a bit further and see if I can find the underlying issue, which I think I traced as far as sqlite3 package itself so I don't think it's a bug in this repo. So might have to start looking into that package itself when I get some free time... but a quicker fix for sure would be something to catch it here so I will look to make a PR for that if I don't get anywhere with sqlite3. :)
Nice find. I am using MySQL 8+ so that certainly could be the problem. Have tried replacing this:
BLOB NOT NULL DEFAULT b'0'
with this:
b BLOB NOT NULL DEFAULT ('0')
and can confirm that that now works. The byte id did have to go at the beginning for it to work. Though also worked without the brackets:
b BLOB NOT NULL DEFAULT '0'
So maybe its more to do with the placement of the byte id?
Looks like BLOB
s inside SQLite need to be hexadecimal data
BLOB literals are string literals containing hexadecimal data and preceded by a single "x" or "X" character. Example: X'53514C697465'
https://www.sqlite.org/lang_expr.html#literal_valuesconstants
The hexadecimal representation of b'1'
is 31
, which translates to SQLite X'31'
.
So your example above would actually be
"do_correction" BLOB NOT NULL DEFAULT X'31'
Aye that works as well:
b'0'
-> x'30'
b'1'
-> x'31'
Could grab any column_type that matches *BLOB and parse the string containing the bytestring?
BLOB_PATTERN = re.compile(r"b'([0-9]+)'")
...
if column_type.upper()[-4:] == "BLOB":
column_default = bytes(cls.BLOB_PATTERN.match(column_default).group(1), 'utf-8').hex()
return "DEFAULT x'{}'".format(column_default)
A bit janky, and assumes encoding (though utf-8 is probably a fair assumption). There is probably a cleaner way to do that conversion from string to bytestring to hex but can't think of any off the top of my head.
Aye that works as well:
b'0'
->x'30'
b'1'
->x'31'
Could grab any column_type that matches *BLOB and parse the string containing the bytestring?
BLOB_PATTERN = re.compile(r"b'([0-9]+)'") ... if column_type.upper()[-4:] == "BLOB": column_default = bytes(cls.BLOB_PATTERN.match(column_default).group(1), 'utf-8').hex() return "DEFAULT x'{}'".format(column_default)
A bit janky, and assumes encoding (though utf-8 is probably a fair assumption). There is probably a cleaner way to do that conversion from string to bytestring to hex but can't think of any off the top of my head.
Hmm, why that regex? My solution is https://github.com/techouse/mysql-to-sqlite3/pull/36/files#diff-31f880d9d3e1134796e58e015c1d3e81846ccf18dd2a769d9ae2e07981fd0609R256
@classmethod
def _translate_default_from_mysql_to_sqlite(
cls, column_default=None, column_type=None
):
if isinstance(column_default, bytes):
if column_type == "BLOB":
return "DEFAULT x'{}'".format(column_default.hex())
## handle other stuff
Aye that was what I tried first as well, but the column_default is being passed through as a string containing a python representation of a bytestring, itself representing a string of integers.
i.e. " b'0' "
so isinstance only sees it as a string.
Only other way I got it to vaguely work was using exec
on column_default, but I hated that, so switched to regex.
Not sure if it helps, but the column type from MySQL is BIT and we use it in our SQLAlchemy model with:
from sqlalchemy.dialects.mysql import BIT
e.g.
do_correction = Column(BIT(1))
Looks like Python 2 uses bytes
as a synonym for str
😵💫
While I try to figure out how to address this in Python 2. You might want to check out the PR branch and test it against your DB.
@CallumBDX this should do now. Do you mind testing it?
@classmethod
def _translate_default_from_mysql_to_sqlite(
cls, column_default=None, column_type=None
):
if isinstance(column_default, six.binary_type):
if column_type in {
"BIT",
"BINARY",
"BLOB",
"LONGBLOB",
"MEDIUMBLOB",
"TINYBLOB",
"VARBINARY",
}:
if six.PY2:
try:
return "DEFAULT x'{}'".format(column_default.encode("hex"))
except AttributeError:
pass
else:
return "DEFAULT x'{}'".format(column_default.hex())
## other stuff
Hi @techouse, sorry I wasn't able to test until today, but unfortunately that doesn't work for the column_default being passed through, as isinstance
with six.binary_type
doesn't recognise it as anything other than a string literal (same problem as comparing with bytes).
From what I can see, the default is being passed to the fn as a bytestring correctly, e.g. b"b'1'"
, but is then decoded by the first try block into a string literal ("b'1'"
):
try:
column_default = column_default.decode()
Without the intitial attempt at decoding, the .hex()
method would still produce an incorrect value as it would include the two apostophes and the b
char, e.g. DEFAULT x'62273127'
instead of DEFAULT x'31'
.
The problem might then be that it is a double bytestring?
EDIT: to put it in context of the pytests, it would need to pass the following param:
pytest.param(b"b'1'", "DEFAULT x'31'", id="b\"b'1'\"")
Interesting. But how can it be double encoded? Surely this is not normal behaviour, right?
That's what has confused me as well. It is pulling that double encoded value straight from the mysql cursor extract. And I can see in MySQL Workbench that the default values are saved as e.g. b'1', which then must be wrapped as a string literal when fetched rather than parsed into a bytestring. As it obviously works for MySQL to have it defined as such, then I'm surprised that the mysql python library doesn't do the work for that already.
With raw=False
when initiallising _mysql_cur_dict
, the attempted automatic conversion of the raw row data types to python types doesn't make any difference to the default value.
I would say that it may be a bug with mysql-connector then.
Possibly. I've had to pin the Connector to <8.0.30
because it became incompatible with MariaDB in 8.0.30 https://github.com/techouse/sqlite3-to-mysql/issues/46
Check if they made any fixes in the Connector since 8.0.29 https://dev.mysql.com/doc/relnotes/connector-python/en/news-8-0-30.html
Don't see anything in the changelogs from 8.0.30 onwards.
What version of the Connector did you use?
Check what they say here in the bug fixes made in 8.0.24 https://dev.mysql.com/doc/relnotes/connector-python/en/news-8-0-24.html
Binary columns were returned as strings instead of 'bytes' or 'bytearray'. (Bug #29622520, Bug #30349010, Bug #30416704, Bug #94944, Bug #96999, Bug #97177, Bug #97723)
Using 8.0.29. And I saw that but didn't think it was related. Looks like they reference an issue where selecting data would return a mixture of data types (e.g. seen below). They then fixed that to always return bytes:
cur.execute("SELECT * FROM tmp")
----->
(1, 'HELLO\x00\x00\x00\x00\x00') # string
(2, bytearray(b'\xffz\xb0\xdeP\x00\x00\x00\x00\x00')) # bytearray
In our case, the return is in bytes - technically.
Can you try and manually force update to the latest version of the Connector?
Tried with latest connector version (8.0.33) and the issue persists.
EDIT: also tried the pure python implementation with use_pure=True
as a kwarg in the _mysql.connect()
method, which then returns the default value as string/byte instead of byte/byte double-encoding, e.g. "b'1'"
, so no closer really.
This is really weird. Can you send me a sample dump of your MySQL DB so I can investigate a bit more locally?
Would the DDL script be enough for that do you think? I can look into making a dump though I haven't needed to do that previously.
Can be just the offending table DDL with some dummy/representative data.
So I just tried creating an example MySQL table
CREATE TABLE `example` (
`id` INT NOT NULL AUTO_INCREMENT,
`data` BLOB NOT NULL DEFAULT (1),
`bit` BIT(1) NOT NULL DEFAULT (1),
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
and converted that successfully using the tool to
CREATE TABLE "example" (
"id" INTEGER NOT NULL ,
"data" BLOB NOT NULL DEFAULT x'31' ,
"bit" BLOB NOT NULL DEFAULT x'31' ,
PRIMARY KEY ("id")
);
@CallumBDX can you try doing the same on your end?
For reference, here's my setup
software | version |
---|---|
mysql-to-sqlite3 | 1.4.15 |
Operating System | Darwin 22.4.0 |
Python | CPython 3.11.1 |
MySQL | mysql Ver 8.0.31 for macos12.6 on x86_64 (Homebrew) |
SQLite | 3.39.5 |
click | 8.1.3 |
mysql-connector-python | 8.0.29 |
python-slugify | 5.0.2 |
pytimeparse | 1.1.8 |
simplejson | 3.19.1 |
six | 1.16.0 |
tabulate | 0.9.0 |
tqdm | 4.65.0 |
Hi @techouse, sorry for the delay, the following is a tweaked DDL used for the table in question:
CREATE TABLE `device_type` (
`id` bigint NOT NULL AUTO_INCREMENT,
`version` bigint NOT NULL,
`c1` bit(1) NOT NULL,
`c2` double NOT NULL,
`c3` double NOT NULL,
`c4` double NOT NULL,
`c5` bit(1) NOT NULL,
`c6` double NOT NULL,
`c7` double NOT NULL,
`c8` varchar(255) DEFAULT NULL,
`c9` double NOT NULL,
`c10` varchar(255) DEFAULT NULL,
`c12` varchar(255) NOT NULL,
`c13` bit(1) NOT NULL,
`c14` varchar(255) DEFAULT NULL,
`c15` double NOT NULL,
`c16` bit(1) NOT NULL,
`c17` double NOT NULL,
`c18` bit(1) NOT NULL DEFAULT b'1',
`c19` bit(1) NOT NULL DEFAULT b'1',
`c20` bit(1) NOT NULL DEFAULT b'1',
`c21` bit(1) NOT NULL,
`c22` varchar(255) NOT NULL,
`c23` int NOT NULL,
`c24` int NOT NULL,
`c25` int NOT NULL,
`c26` bit(1) NOT NULL DEFAULT b'0',
`c27` double DEFAULT NULL,
`c28` double DEFAULT NULL,
`c29` double DEFAULT NULL,
`c30` double DEFAULT NULL,
`c31` bit(1) NOT NULL,
`c32` bit(1) NOT NULL DEFAULT b'0',
`c33` varchar(255) DEFAULT NULL,
`c34` bit(1) NOT NULL,
`c35` double NOT NULL,
`c36` double NOT NULL,
`c37` double NOT NULL,
`c38` double NOT NULL,
`c39` varchar(255) DEFAULT NULL,
`c40` bit(1) NOT NULL,
`c41` bit(1) NOT NULL,
`c42` varchar(255) DEFAULT NULL,
`c43` double DEFAULT NULL,
`c44` double DEFAULT NULL,
`c45` int DEFAULT NULL,
`c46` double DEFAULT NULL,
`c47` varchar(255) DEFAULT NULL,
`c48` int DEFAULT NULL,
`c49` bit(1) NOT NULL,
`c50` double DEFAULT NULL,
`c51` double DEFAULT NULL,
`c52` varchar(255) DEFAULT NULL,
`c53` varchar(255) DEFAULT NULL,
`c54` double DEFAULT NULL,
`c55` bit(1) NOT NULL,
`c56` bit(1) NOT NULL DEFAULT b'0',
PRIMARY KEY (`id`),
UNIQUE KEY `c12` (`c12`)
) ENGINE=InnoDB AUTO_INCREMENT=535 DEFAULT CHARSET=latin1
As you can see, the default values being set as b'1'
/b'0'
from the outset might be the issue. Also, the charset is latin1
but I have already tried setting that instead of the default charset and that doesn't affect the mysql package parsing.
I see. What if you changed all those b'1'
to (1)
and b'0'
to (0)
, as the MySQL docs state?
https://dev.mysql.com/doc/refman/8.0/en/data-type-defaults.html
Is it pulling the DDL to parse the columns?
Is it pulling the DDL to parse the columns?
Not sure what you mean. Just change it to somehting like
CREATE TABLE `device_type` (
`id` bigint NOT NULL AUTO_INCREMENT,
`version` bigint NOT NULL,
`c1` bit(1) NOT NULL,
`c2` double NOT NULL,
`c3` double NOT NULL,
`c4` double NOT NULL,
`c5` bit(1) NOT NULL,
`c6` double NOT NULL,
`c7` double NOT NULL,
`c8` varchar(255) DEFAULT NULL,
`c9` double NOT NULL,
`c10` varchar(255) DEFAULT NULL,
`c12` varchar(255) NOT NULL,
`c13` bit(1) NOT NULL,
`c14` varchar(255) DEFAULT NULL,
`c15` double NOT NULL,
`c16` bit(1) NOT NULL,
`c17` double NOT NULL,
`c18` bit(1) NOT NULL DEFAULT (1),
`c19` bit(1) NOT NULL DEFAULT (1),
`c20` bit(1) NOT NULL DEFAULT (1),
`c21` bit(1) NOT NULL,
`c22` varchar(255) NOT NULL,
`c23` int NOT NULL,
`c24` int NOT NULL,
`c25` int NOT NULL,
`c26` bit(1) NOT NULL DEFAULT (0),
`c27` double DEFAULT NULL,
`c28` double DEFAULT NULL,
`c29` double DEFAULT NULL,
`c30` double DEFAULT NULL,
`c31` bit(1) NOT NULL,
`c32` bit(1) NOT NULL DEFAULT (0),
`c33` varchar(255) DEFAULT NULL,
`c34` bit(1) NOT NULL,
`c35` double NOT NULL,
`c36` double NOT NULL,
`c37` double NOT NULL,
`c38` double NOT NULL,
`c39` varchar(255) DEFAULT NULL,
`c40` bit(1) NOT NULL,
`c41` bit(1) NOT NULL,
`c42` varchar(255) DEFAULT NULL,
`c43` double DEFAULT NULL,
`c44` double DEFAULT NULL,
`c45` int DEFAULT NULL,
`c46` double DEFAULT NULL,
`c47` varchar(255) DEFAULT NULL,
`c48` int DEFAULT NULL,
`c49` bit(1) NOT NULL,
`c50` double DEFAULT NULL,
`c51` double DEFAULT NULL,
`c52` varchar(255) DEFAULT NULL,
`c53` varchar(255) DEFAULT NULL,
`c54` double DEFAULT NULL,
`c55` bit(1) NOT NULL,
`c56` bit(1) NOT NULL DEFAULT (0),
PRIMARY KEY (`id`),
UNIQUE KEY `c12` (`c12`)
) ENGINE=InnoDB AUTO_INCREMENT=535 DEFAULT CHARSET=latin1
The docs would indicate that the b'value'
notation is valid in 8.0, so their connector really should be able to parse it correctly. Looks like the timeframe for them to fix issues is a bit slow though, in the order of years.
https://dev.mysql.com/doc/refman/8.0/en/bit-type.html
The only other valid notation for bit that I can see is hex.
Is it pulling the DDL to parse the columns?
Not sure what you mean. Just change it to somehting like
CREATE TABLE `device_type` ( `id` bigint NOT NULL AUTO_INCREMENT, `version` bigint NOT NULL, `c1` bit(1) NOT NULL, `c2` double NOT NULL, `c3` double NOT NULL, `c4` double NOT NULL, `c5` bit(1) NOT NULL, `c6` double NOT NULL, `c7` double NOT NULL, `c8` varchar(255) DEFAULT NULL, `c9` double NOT NULL, `c10` varchar(255) DEFAULT NULL, `c12` varchar(255) NOT NULL, `c13` bit(1) NOT NULL, `c14` varchar(255) DEFAULT NULL, `c15` double NOT NULL, `c16` bit(1) NOT NULL, `c17` double NOT NULL, `c18` bit(1) NOT NULL DEFAULT (1), `c19` bit(1) NOT NULL DEFAULT (1), `c20` bit(1) NOT NULL DEFAULT (1), `c21` bit(1) NOT NULL, `c22` varchar(255) NOT NULL, `c23` int NOT NULL, `c24` int NOT NULL, `c25` int NOT NULL, `c26` bit(1) NOT NULL DEFAULT (0), `c27` double DEFAULT NULL, `c28` double DEFAULT NULL, `c29` double DEFAULT NULL, `c30` double DEFAULT NULL, `c31` bit(1) NOT NULL, `c32` bit(1) NOT NULL DEFAULT (0), `c33` varchar(255) DEFAULT NULL, `c34` bit(1) NOT NULL, `c35` double NOT NULL, `c36` double NOT NULL, `c37` double NOT NULL, `c38` double NOT NULL, `c39` varchar(255) DEFAULT NULL, `c40` bit(1) NOT NULL, `c41` bit(1) NOT NULL, `c42` varchar(255) DEFAULT NULL, `c43` double DEFAULT NULL, `c44` double DEFAULT NULL, `c45` int DEFAULT NULL, `c46` double DEFAULT NULL, `c47` varchar(255) DEFAULT NULL, `c48` int DEFAULT NULL, `c49` bit(1) NOT NULL, `c50` double DEFAULT NULL, `c51` double DEFAULT NULL, `c52` varchar(255) DEFAULT NULL, `c53` varchar(255) DEFAULT NULL, `c54` double DEFAULT NULL, `c55` bit(1) NOT NULL, `c56` bit(1) NOT NULL DEFAULT (0), PRIMARY KEY (`id`), UNIQUE KEY `c12` (`c12`) ) ENGINE=InnoDB AUTO_INCREMENT=535 DEFAULT CHARSET=latin1
Sorry, ignore that comment. I will speak to our system dev engineer about changing the default values, and see about trying it with the development server.
The docs would indicate that the
b'value'
notation is valid in 8.0, so their connector really should be able to parse it correctly. Looks like the timeframe for them to fix issues is a bit slow though, in the order of years. https://dev.mysql.com/doc/refman/8.0/en/bit-type.htmlThe only other valid notation for bit that I can see is hex.
That probably does not relate to the DEFAULT
but rather to the value, if I read that correctly.
Note, to use the parentheses notation (1)
you'll need MySQL v8.0.13+
The docs would indicate that the
b'value'
notation is valid in 8.0, so their connector really should be able to parse it correctly. Looks like the timeframe for them to fix issues is a bit slow though, in the order of years. https://dev.mysql.com/doc/refman/8.0/en/bit-type.html The only other valid notation for bit that I can see is hex.That probably does not relate to the
DEFAULT
but rather to the value, if I read that correctly.Note, to use the parentheses notation
(1)
you'll need MySQL v8.0.13+
Hmm. My interpretation as per the intial link you sent on default data types is that the default values can be either literal or expresssions. The link I sent defines what the BIT literal would be were one to use it. Unfortunately they don't give the explicit example using the binary literal.
You can simply find that out by running
SHOW COLUMNS FROM `device_type`
and checking the Default
column
Aye, have done that previously and I get back b'1'
and b'0'
for the bit(1) columns.
Which means that they are strings, not bytes.
Which means that they are strings, not bytes.
Yeah you are completely right. Looking closer I can see it is saved as 4 bytes instead of 1. The actual values in the table though are either 0/1 and stored as single bytes. Maybe it is storing/displaying as string as that was what it was set with, but still parsing it correctly when it actually inserts a new row? Again, I'll have a chat with our sys dev guy and find out why it is how it is. Perhaps when he inserts a new row he is explicitly setting the value anyway and not even bothering with using the default value.
So I had a meeting with our sys dev guy, and the MySQL table is not made straight with MySQL, but from a grails framework (groovy on rails) which under the bonnet uses the java hibernate plugin to do the type conversions for MySQL. So when he specified a boolean of default false, it generated bit(1) NOT NULL DEFAULT b'0'
. This seems to be an issue raised with the plugin managers, and was fixed since these data tables were initially created:
https://www.danvega.dev/blog/2013/08/27/grails-mysql-boolean-gotcha/
In terms of how the values are stored, the default is stored as 4 bytes e.g. b'0'
, but the actual values in the table are stored as a single byte 0
, so MySQL must be able to use the default value as it is intended.
So as far as I can gather, the use of the binary string b'0'
is valid SQL, so ultimately the connector is not parsing that correctly.
In terms of how the values are stored, the default is stored as 4 bytes e.g. b'0', but the actual values in the table are stored as a single byte 0, so MySQL must be able to use the default value as it is intended.
But that's the point, it's not a value. It's a definition living in the DDL. Usually these are always strings. Have you taken a look at this StackOverflow question?
So as far as I can gather, the use of the binary string
b'0'
is valid SQL, so ultimately the connector is not parsing that correctly.
I'm not using any connector magic to get the default value but simply querying SHOW COLUMNS FROM table_name
and then using the Default
from the result.
I suggest you use some application logic to add these defaults in the tables as using these special defaults will only create more headaches in the long term. Alternatively you could use this suggestion or as the MySQL docs state
CREATE TABLE `test_table` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`column_name` BIT(1) NOT NULL DEFAULT (1), # (1) instead of b'1'
PRIMARY KEY (`id`)
)
Hi @techouse, yeah I have seen that StackOverFlow answer, which does suggest not to use that notation - though they don't really explain why it is not valid. I think the MySQL documentation could be a lot clearer on this.
And knowing how the default value is stored in MySQL I see that the problem isn't the connector as you say. Unfortunately I don't see this getting fixed on our end either in application or via MySQL for a long time in production, so I will have to continue to use the fork that parses it and fixes it (even though I agree it probably shouldn't exist in that form).
Thanks for spending your time to look at this issue. Hard to say whether it is worth having that notation parsable since it technically works in MySQL, even if it might not be the cleanest way to represent the default value in the DDL.
My suggestion is to run away from MySQL as soon as you can. Postgres is waaaay betta 😎
Haha unfortunately not my choice. Just trying to keep the peace between a SQL Server, MySQL server, and various SQLite files... adding in another flavour with an admittedly excellent mascot would break me long before it breaks prod 😉.
Describe the bug This bug happens when I run the transfer method on the converter object. It happens at line 409 in transporter.py, and the issue appears to be in the use of 'b'0''/'b'1'' as the default values for certain columns (those using MySQL BIT type). So a column spec entry like this returned from _build_create_table_sql causes the error:
Modifying it so that it instead returns the statement below also does not work:
However, if I change the return for this default type to give plain '0' or '1' (or 0/1 as int works but is shady), then the transfer works correctly:
My understanding of SQLite is that it doesn't use any kit of strict bit or boolean types, but would store as an integer anyway. Having the byte string as specified I would think would eventually still cast it as an int (if it worked), so might be worth an additional filter to strip out the byte designation in _translate_default_from_mysql_to_sqlite. Though it does surprise me that the b'0' style doesn't work, so probably something else at play.
Expected behaviour Completed transfer of MySQL table to SQLite db file.
Actual result The entitled error where it fails to do so due to a syntax error.
System Information Windows 10 Python 3.8.10 mysql-to-sqlite3==1.4.14