philipsoutham / py-mysql2pgsql

Tool for migrating/converting from mysql to postgresql.
http://packages.python.org/py-mysql2pgsql/
MIT License
455 stars 169 forks

psycopg2.IntegrityError: null value in datetime #27

Closed siteroller closed 10 years ago

siteroller commented 11 years ago

psycopg2.IntegrityError: null value in column "datewritten" violates not-null constraint

Trying to move the following table over:

CREATE TABLE `articles` (
   `index` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
   `title` tinytext,
   `datewritten` date NOT NULL,
   PRIMARY KEY (`index`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `articles` (`index`, `title`, `datewritten`)
VALUES (1,'Hello World','0000-00-00');
kworr commented 11 years ago

I'd say it's up to MySQL to allow dates like 0000-00-00. From the PostgreSQL perspective this date is quite unreal as there are no month number 0 and day number 0. You have to fix your database to contain real dates.

siteroller commented 11 years ago

The date is not logical - agreed. MySQL is not a real database, it's a toy that breaks everything - got it.

I would love to join the REAL MEN. But I've got these huge databases in MySQL, and for some reason, MySQL does allow these dates. I was hoping to find a script that just moved things over without making me find all the places where MySQL doesn't respect the standard to update them. Especially, as here, I would switch it to NULL, but MySQL then converts it back to 00's.

Is it possible to have the script convert it from 0000-00-00 in MySQL to NULL in PostgreSQL? Or at least to the beginning of (Unix) time?

Thank you very much for this project.

Sam
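
For illustration, here is a minimal sketch of the value-level conversion being asked for: MySQL "zero dates" mapped to NULL on the way into PostgreSQL. This is not py-mysql2pgsql's actual code; the names are hypothetical.

```python
# Hypothetical sketch of the requested conversion: MySQL "zero dates"
# become SQL NULL (Python None); all other values pass through unchanged.
ZERO_DATES = {"0000-00-00", "0000-00-00 00:00:00"}

def convert_value(value):
    """Map a MySQL zero date to None; leave everything else as-is."""
    if value in ZERO_DATES:
        return None
    return value

row = (1, "Hello World", "0000-00-00")
print(tuple(convert_value(v) for v in row))  # (1, 'Hello World', None)
```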


kworr commented 11 years ago

That's not a problem per se. The error message shows that the field has already been converted to NULL, but in the table definition the field is marked NOT NULL. I personally don't think the software should change the table definition because it "knows better". Maybe someone placed that NOT NULL constraint intentionally, to force real dates into the table. This is exactly the problem I faced when converting my own database: the tables were full of FOREIGN KEYs and data that violated them. And it was up to me, as the actual user, to decide whether to raise a question about finding the incorrect code that writes junk data to the database, fix the data myself, or lift the FOREIGN KEY constraints for the duration of the conversion.

We could also implement SQL_MODE parsing for MySQL, but that would wreak havoc on many structures. In your case the NOT NULL constraint on the date field acts as syntactic sugar to make the index smaller. With zero dates allowed by the SQL mode, there is no way to actually enforce a NOT NULL constraint on a date field.
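
As a rough sketch of what SQL_MODE parsing could look like (illustrative only; `NO_ZERO_DATE` is the real MySQL mode name that rejects zero dates, but the function is hypothetical):

```python
# Illustrative sketch: split MySQL's comma-separated SQL_MODE value and
# check whether NO_ZERO_DATE (which rejects '0000-00-00') is enabled.
def parse_sql_mode(mode_string):
    return {m.strip().upper() for m in mode_string.split(",") if m.strip()}

modes = parse_sql_mode("STRICT_TRANS_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE")
print("NO_ZERO_DATE" in modes)  # True
```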

I'll think about that...

siteroller commented 11 years ago

I see the issue now - I hadn't noticed the NOT NULL in the MySQL table earlier.

Will wait and watch to see what you come up with.

Thank you.

kworr commented 11 years ago

OK, I just pushed a fix... Now no date field is treated as NOT NULL, because MySQL never enforces this constraint for dates. And I think this is the right way to do it, given http://bugs.mysql.com/bug.php?id=59526

gms8994 commented 10 years ago

@kworr What about date fields that do not have NOT NULL? Converting the schema this way seems wrong. Or at least add an option in the YML to keep the existing datetime/date/etc. types.

kworr commented 10 years ago

Well, I just tried to make it work the way databases are designed to work. MySQL doesn't enforce the NOT NULL constraint on dates, so if the schema is replicated "as is", PostgreSQL is simply unable to store this data, because in PostgreSQL NOT NULL means exactly NOT NULL - there are no hidden treasures and loose stones like in MySQL:

mysql> CREATE TABLE datetest (x1 date NOT NULL);
Query OK, 0 rows affected (0.14 sec)

mysql> INSERT INTO datetest VALUES ("0000-00-00");
Query OK, 1 row affected (0.01 sec)

mysql> SELECT count(*) from datetest where x1 is NULL;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> SELECT count(*) from datetest where x1 is NOT NULL;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

I still think there is room for improvement once I get around to rewriting the data loading, because adding another query to determine whether the table really contains NULL values is a bit awkward and may take additional time (a lot of additional time if the column is not indexed).
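
The extra check described above could be one EXISTS query per column. A sketch (the table/column names are illustrative, and real code would need proper identifier escaping):

```python
# Sketch: build the per-column query that decides whether the converted
# column actually needs to be nullable. EXISTS lets MySQL stop at the
# first matching row instead of scanning the whole table.
def null_check_query(table, column):
    return "SELECT EXISTS(SELECT 1 FROM `{0}` WHERE `{1}` IS NULL)".format(
        table, column
    )

print(null_check_query("datetest", "x1"))
# SELECT EXISTS(SELECT 1 FROM `datetest` WHERE `x1` IS NULL)
```

Note that, per the session above, a NOT NULL date column holding zero dates matches `IS NULL` in MySQL, so this one query would flag exactly the problematic tables.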

gms8994 commented 10 years ago

@kworr Yes, but I'm describing the situation where NULLs are allowed (and the default) for a date/datetime/etc. column. For example:

08:13:41 (2) > create table datetest (x1 date default null);
Query OK, 0 rows affected (0.20 sec)

08:13:42 (3) > insert into datetest values ("0000-00-00");
Query OK, 1 row affected (0.00 sec)

08:13:52 (4) > select count(*) from datetest where x1 is null;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

08:14:00 (5) > select count(*) from datetest where x1 is not null;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

08:14:02 (6) > insert into datetest values(null);
Query OK, 1 row affected (0.01 sec)

08:14:14 (7) > select count(*) from datetest where x1 is null;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

From the schema that py-mysql2pgsql creates, there is no way to record an "unknown" date, because it forces date/datetime/etc. columns to be NOT NULL, even ones that have been marked DEFAULT NULL.

kworr commented 10 years ago

@gms8994 Sorry, I don't fully understand. What I did was discard NOT NULL for a list of types, because MySQL effectively does the same. So right now a MySQL definition of `something date NOT NULL` will be translated as `something date` (with NULL as the default).

I retested my own commit and found that I managed to mess up the logic, so the code actually doesn't work the way I'm describing. If we are talking about the same thing, I'll go fix it...

gms8994 commented 10 years ago

Specifically, the problem I'm having is that I have the following mysql table:

CREATE TABLE `affiliates` (
  `affiliate_id` int(11) NOT NULL AUTO_INCREMENT,
  `affiliate_name` varchar(255) DEFAULT NULL,
  `affiliate_url` varchar(255) DEFAULT NULL,
  `add_date` datetime DEFAULT NULL,
  `paid` varchar(50) DEFAULT '0',
  `type` varchar(50) DEFAULT NULL,
  `type_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`affiliate_id`)
) ENGINE=MyISAM AUTO_INCREMENT=530 DEFAULT CHARSET=latin1;

and the resulting postgres table is:

CREATE TABLE "affiliates" (
  "affiliate_id" integer DEFAULT nextval('affiliates_affiliate_id_seq'::regclass) NOT NULL,
  "affiliate_name" character varying(255),
  "affiliate_url" character varying(255),
  "add_date" timestamp without time zone NOT NULL,
  "paid" character varying(50) DEFAULT '0'::character varying,
  "type" character varying(50),
  "type_id" integer
)
WITHOUT OIDS;

Notice that add_date in MySQL is DEFAULT NULL, whereas add_date in Postgres is NOT NULL.
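
A minimal sketch of the intended rule (illustrative only, not the project's actual code): carry the MySQL column's nullability over to PostgreSQL, instead of forcing NOT NULL on date/datetime columns.

```python
# Hypothetical column translator: the nullability flag from the MySQL
# schema decides whether NOT NULL is emitted, for dates like any other type.
TYPE_MAP = {
    "datetime": "timestamp without time zone",
    "date": "date",
    "varchar": "character varying",
}

def translate_column(name, mysql_type, nullable):
    pg_type = TYPE_MAP.get(mysql_type, mysql_type)
    suffix = "" if nullable else " NOT NULL"
    return '"{0}" {1}{2}'.format(name, pg_type, suffix)

print(translate_column("add_date", "datetime", True))
# "add_date" timestamp without time zone
```

With this rule, the `add_date datetime DEFAULT NULL` column above would come out nullable on the Postgres side, matching the MySQL schema.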

kworr commented 10 years ago

Can you try it now?

gms8994 commented 10 years ago

Looks like that works. I'm running a conversion right now; so far it's looking pretty good.

kworr commented 10 years ago

Glad to hear that. Drop a line on whether it was successful or not :)

gms8994 commented 10 years ago

It was successful. Thanks! Question though; I timed a run converting 345 tables with ~20M rows (in total) on local instances of MySQL and Postgres. It took about 43 minutes. Does that seem reasonable? Machine is a 2012 iMac; default settings on both MySQL and Postgres.

kworr commented 10 years ago

We have an open issue for that... And yes, it's not very fast for now. Thanks for the feedback. I'm closing the ticket.