nmadhire / jwpl

Automatically exported from code.google.com/p/jwpl

under Windows OS, datamachine mysqlimport - warnings and errors #94

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi, 
has anyone tried to create a database on Windows OS? 

While everything seems to work on Linux (64-bit), I ran into a problem at 
the final mysqlimport step when creating an identical copy of Wikipedia on 
Windows 7 64-bit. 

The problem is that "mysqlimport" produces lots of warnings (which didn't 
occur on Linux). To log these warnings, I used the following equivalent 
command to load one ".txt" file at a time:

"mysql -uroot -p [dbname] --default-character-set=utf8 --execute="LOAD DATA 
INFILE '[path]/page.txt' REPLACE INTO TABLE page FIELDS TERMINATED BY '\t'; 
SHOW WARNINGS" > $output.log"

I managed to capture an error when importing "page.txt":
-----------------------------------
"ERROR 1406 (22001) at line 1: Data too long for column 'isDisambiguation' at 
row 1"
-----------------------------------

And some warnings when importing "page_redirects.txt":
-----------------------------------
| Warning | 1366 | Incorrect string value: '\xF0\x92\x86\xB3\x0D' for column 
'redirects' at row 1550585        |
| Warning | 1366 | Incorrect string value: '\xF0\x92\x82\xBC\xF0\x92...' for 
column 'redirects' at row 1784951 |
| Warning | 1366 | Incorrect string value: '\xF0\x9D\x84\xAA\x0D' for column 
'redirects' at row 2088024        |
| Warning | 1366 | Incorrect string value: '\xF0\x9D\x84\xAB\x0D' for column 
'redirects' at row 2088025        |
-------------------------------------------
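
The byte sequences in these warnings can be inspected directly. The following is a minimal Python sketch, assuming the bytes are UTF-8 as declared in the command above; the helper name `describe` is made up for illustration, and the truncated second sequence is left out.

```python
# Byte sequences copied from the warnings above.
# The truncated one ('\xF0\x92\x82\xBC\xF0\x92...') is omitted on purpose.
samples = [
    b"\xF0\x92\x86\xB3\x0D",
    b"\xF0\x9D\x84\xAA\x0D",
    b"\xF0\x9D\x84\xAB\x0D",
]


def describe(raw):
    """Return (code point label, UTF-8 byte length) for each character in raw."""
    text = raw.decode("utf-8")
    return [("U+%04X" % ord(ch), len(ch.encode("utf-8"))) for ch in text]


for raw in samples:
    print(describe(raw))
```

Each sample decodes to one character that needs four bytes in UTF-8, followed by a carriage return (0x0D). MySQL's "utf8" character set stores at most three bytes per character, and a trailing carriage return usually points to CRLF line endings, so both may be worth checking here. Neither is confirmed as the cause in this thread.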

This seems to be an OS-specific issue. It would be nice if an expert could 
identify the cause. Otherwise I will have to try exporting the working copy 
from Linux and importing it on Windows...
Thanks!

Original issue reported on code.google.com by ziqizhan...@googlemail.com on 15 May 2012 at 3:47

GoogleCodeExporter commented 9 years ago
Hmm, this might be an encoding problem.
Even though the encoding is correctly defined in the mysqlimport command, there 
could still be a problem somewhere else.
If UTF-8 is not the default encoding on your system, you might have to run the 
DataMachine with the -Dfile.encoding=utf8 parameter. (also see: 
http://code.google.com/p/jwpl/wiki/DataMachine)

Also, you should check if you have created the database using the command
CREATE DATABASE [DB_NAME] DEFAULT CHARACTER SET utf8 DEFAULT COLLATE 
utf8_general_ci;

If this does not help, please extract the lines from the data file which cause 
these warnings and post them here.
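
One way to pull out the offending lines is sketched below in Python. It assumes that the row numbers in the warnings correspond to physical line numbers in the data file, which may not hold if a field contains embedded newlines; the function name `extract_rows` is hypothetical.

```python
def extract_rows(path, row_numbers):
    """Return {row_number: raw line bytes} for the requested 1-based line numbers."""
    wanted = set(row_numbers)
    found = {}
    if not wanted:
        return found
    last = max(wanted)
    # Binary mode keeps carriage returns and any invalid bytes visible.
    with open(path, "rb") as handle:
        for number, line in enumerate(handle, start=1):
            if number in wanted:
                found[number] = line
            if number >= last:
                break  # stop early; the files are large
    return found
```

Calling it with the path to page_redirects.txt and the row numbers from the warnings (for example `[1550585, 2088024]`) returns the raw bytes of those lines, and printing them with `repr()` shows the tab separators and line endings.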

Original comment by oliver.ferschke on 16 May 2012 at 10:16

GoogleCodeExporter commented 9 years ago
Thanks, but I have already set the character encoding explicitly, and I can 
confirm that I used the "create" statement as you said.

I extracted the line that caused the error (Error 1406... above) and attached 
it as a screenshot. It is extremely long, since each line in "page.txt" stores 
a single Wikipedia article. The screenshot shows the tail of the first line, 
and I have highlighted the boundary with the second line in red. 

I'm not sure how useful this is, since the last field "isDisambiguation" is a 
"bit" datatype, and it doesn't seem to display properly, as you can see. 

Original comment by ziqizhan...@googlemail.com on 16 May 2012 at 10:55

Attachments: