Closed siteroller closed 10 years ago
I'd say it's up to MySQL to allow dates like 0000-00-00. From the PostgreSQL perspective this date is quite unreal as there are no month number 0 and day number 0. You have to fix your database to contain real dates.
The date is not logical - agreed. MySQL is not a real database, it's a toy that breaks everything - got it.
I would love to join the REAL MEN. But I've got these huge databases in MySQL, and for some reason, MySQL does allow these dates. I was hoping to find a script that just moved things over without making me find all the places where MySQL doesn't respect the standard to update them. Especially, as here, I would switch it to NULL, but MySQL then converts it back to 00's.
Is it possible to have the script convert it from 00-00-0000 in Mysql to NULL in Postgre. Or at least to the beginning of (Unix) time?
Thank you very much for this project.
Sam
On Thu, Feb 7, 2013 at 2:12 PM, kworr notifications@github.com wrote:
I'd say it's up to MySQL to allow dates like 0000-00-00. From the PostgreSQL perspective this date is quite unreal as there are no month number 0 and day number 0. You have to fix your database to contain real dates.
— Reply to this email directly or view it on GitHubhttps://github.com/philipsoutham/py-mysql2pgsql/issues/27#issuecomment-13233235.
That's not a problem per se. The error message shows that field is already converted to NULL, but in table definition this field is marked as NOT NULL. I personally don't think the software should change table definition because it knows better. Maybe someone placed this NOT NULL constraint with intent to force storing real date in the table. This is exactly the same problem I faced when converting my database: the tables were full of FOREIGN KEYS and data that violated them. And it's only up to me as a real user to decide should I raise a question about finding incorrect code that writes junk data to the database or should I fix it myself or maybe should I lift FOREING KEYS constraint for the conversion time.
We can also implement SQL_MODE parsing for MySQL. But this would bring havoc to many structures. In your case NOT NULL constraint on datetime field acts as syntax sugar to make index smaller. With ZERO_DATE SQL Mode there's no way to enforce NOT NULL constraint over datetime field.
I'll think about that...
I realize the issue now - hadn't realized the NOT NULL in the mysql table earlier.
Will wait and watch to see what you come up with.
Thank you.
0k, I just pushed a fix... Now no date field is treated as NOT NULL because MySQL never enforce this constraint for dates. And I think this is a right way to do this since http://bugs.mysql.com/bug.php?id=59526
@kworr What about date fields that do not have NOT NULL? This seems wrong to convert the schema in this way. Or at least have an option in the YML to use the existing datetime/date/etc types.
Well, I just tried to make it work the way databases are designed. MySQL doesn't enforce NOT NULL
constraint on dates so if the schema is replicated "as is" PostgreSQL is just unable to store this data because in PostgreSQL NOT NULL
means exactly NOT NULL
, there's no hidden treasures and loose stones like in MySQL:
mysql> CREATE TABLE datetest (x1 date NOT NULL);
Query OK, 0 rows affected (0.14 sec)
mysql> INSERT INTO datetest VALUES ("0000-00-00");
Query OK, 1 row affected (0.01 sec)
mysql> SELECT count(*) from datetest where x1 is NULL;
+----------+
| count(*) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
mysql> SELECT count(*) from datetest where x1 is NOT NULL;
+----------+
| count(*) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
I still think there can be some improvement when I got my hands on rewriting data loading because adding another query to determine whether table really has some NULL values is a bit awkward and may take additional time (a lot of additional time if column is not indexed).
@kworr Yes, but I'm suggesting the situation where NULLs are allowed (and the default) for a date/datetime/etc column. For example:
08:13:41 (2) > create table datetest (x1 date default null);
Query OK, 0 rows affected (0.20 sec)
08:13:42 (3) > insert into datetest values ("0000-00-00");
Query OK, 1 row affected (0.00 sec)
08:13:52 (4) > select count(*) from datetest where x1 is null;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
08:14:00 (5) > select count(*) from datetest where x1 is not null;
+----------+
| count(*) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
08:14:02 (6) > insert into datetest values(null);
Query OK, 1 row affected (0.01 sec)
08:14:14 (7) > select count(*) from datetest where x1 is null;
+----------+
| count(*) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
From the schema that py-mysql2pgsql creates, there is no way to record an "unknown" date, because it forces date/datetime/etc columns to be NOT NULL, even ones that have been marked with default null.
@gms8994 Sorry, I don't fully understand. What I had done is discarding NOT NULL
for a list of types because MySQL does actually the same. So right now MySQL definition of something date NOT NULL
will be translated as something date
(with default NULL
being default).
I retested my own commit and found that I managed to mess up the logic so the code actually doesn't works the way I'm describing it. If we are talking about the same thing I think I'll go fix it...
Specifically, the problem I'm having is that I have the following mysql table:
CREATE TABLE `affiliates` (
`affiliate_id` int(11) NOT NULL AUTO_INCREMENT,
`affiliate_name` varchar(255) DEFAULT NULL,
`affiliate_url` varchar(255) DEFAULT NULL,
`add_date` datetime DEFAULT NULL,
`paid` varchar(50) DEFAULT '0',
`type` varchar(50) DEFAULT NULL,
`type_id` int(11) DEFAULT NULL,
PRIMARY KEY (`affiliate_id`)
) ENGINE=MyISAM AUTO_INCREMENT=530 DEFAULT CHARSET=latin1;
and the resulting postgres table is:
CREATE TABLE "affiliates" (
"affiliate_id" integer DEFAULT nextval('affiliates_affiliate_id_seq'::regclass) NOT NULL,
"affiliate_name" character varying(255),
"affiliate_url" character varying(255),
"add_date" timestamp without time zone NOT NULL,
"paid" character varying(50) DEFAULT '0'::character varying,
"type" character varying(50),
"type_id" integer
)
WITHOUT OIDS;
Notice that add_date in mysql is DEFAULT NULL, whereas add_date in postgres is NOT NULL.
Can you try it now?
Looks like that works. I'm running a conversion right now; so far it's looking pretty good.
Glad to hear that. Drop a line on whether it was successful or not :)
It was successful. Thanks! Question though; I timed a run converting 345 tables with ~20M rows (in total) on local instances of MySQL and Postgres. It took about 43 minutes. Does that seem reasonable? Machine is a 2012 iMac; default settings on both MySQL and Postgres.
We have an Issue for that... And yes it's not very fast for now. Thanks for feedback. I'm closing the ticket.
psycopg2.IntegrityError: null value in column "datewritten" violates not-null constraint
Trying to move the following table over: