Open ukfreetv opened 4 years ago
Tried doing a direct import into MySQL of the BCP file, but similar problems...
mysql> LOAD DATA local INFILE 'extract_charity.bcp' INTO TABLE extract_charity CHARACTER SET BINARY FIELDS TERMINATED BY '@@' LINES TERMINATED BY '@@'; Query OK, 368474 rows affected, 65535 warnings (21.99 sec) Records: 368474 Deleted: 0 Skipped: 0 Warnings: 1105472**
mysql> LOAD DATA local INFILE 'extract_objects.bcp' INTO TABLE extract_objects CHARACTER SET BINARY FIELDS TERMINATED BY '@@' LINES TERMINATED BY '@@'; Query OK, 640671 rows affected, 2703 warnings (8.93 sec) Records: 640671 Deleted: 0 Skipped: 0 Warnings: 2703**
Sorry to be a pain.
Using "RegPlusExtract_May_2020.zip"
If you follow the instructions to run the Python scripts to convert the ".bcp" files to ".csv" files, the script corrupts the extract_charity.csv and extract_objects.csv files.
The issue seems to be linefeed or CR in some fields, the code is leaving them as CRLF, which is the record-end marker for CSV.
I've tried code to stick them back together, but this results in some lines missing fields that are unknown.
Hi @ukfreetv - I've had similar issues worth line feeds in the data before - but you should be able to alter the CSV import function to deal with it, using the same parameters as the output function.
One alternative is using the bcp-reader module which just streams the fields, even with line breaks.
Sorry to be a pain.
Using "RegPlusExtract_May_2020.zip"
If you follow the instructions to run the Python scripts to convert the ".bcp" files to ".csv" files, the script corrupts the extract_charity.csv and extract_objects.csv files.
The issue seems to be linefeed or CR in some fields, the code is leaving them as CRLF, which is the record-end marker for CSV.
I've tried code to stick them back together, but this results in some lines missing fields that are unknown.