I tested the import on both PostgreSQL and MySQL with the current code, and both work.
Postgres does throw some errors while applying constraints and building indices on the full May 2020 dump, but this seems to be unrelated to my changes.
$ python3 postgresql/psql.py < postgresql/sql/CreateFKConstraints.sql
ERROR: insert or update on table "release_artist" violates foreign key constraint "release_artist_fk_artist"
DETAIL: Key (artist_id)=(0) is not present in table "artist".
ERROR: insert or update on table "release_track_artist" violates foreign key constraint "release_track_artist_fk_artist"
DETAIL: Key (artist_id)=(118760) is not present in table "artist".
ERROR: insert or update on table "release_company" violates foreign key constraint "release_company_fk_company"
DETAIL: Key (company_id)=(1363089) is not present in table "label".
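For reference, a quick way to count the rows that trip such a constraint is a LEFT JOIN against the referenced table; a minimal sketch, assuming the referenced primary key column is artist.id:
-- hypothetical check for orphaned rows before applying the constraint
-- (table and column names taken from the error message; artist.id is assumed)
SELECT count(*)
FROM release_artist ra
LEFT JOIN artist a ON a.id = ra.artist_id
WHERE a.id IS NULL;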
$ python3 postgresql/psql.py < postgresql/sql/CreateIndexes.sql
ERROR: index row size 5376 exceeds maximum 2712 for index "release_track_idx_title"
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
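Following that hint, the oversized index could be rebuilt as a functional index over an MD5 hash of the value; a minimal sketch, assuming the table and column names from the error message:
-- hypothetical replacement for the failing index, per the HINT above
DROP INDEX IF EXISTS release_track_idx_title;
CREATE INDEX release_track_idx_title ON release_track (md5(title));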
Sorry for the noise; it seems I should not rebase my commits after sending a PR.
Just trying it now
Just running: nohup python3 speedup/exporter.py --bz2 --debug dumps csv-dir --export artist --export release --export label --export master &
I did have a few issues trying to run exporter.py; my first, very similar attempts failed, either giving an error or just doing nothing. I wonder if it could default to exporting all four files so you don't have to specify them?
Now running import.csv. It is working, but I have three points.
Thanks for having a look. Could you tell me which errors you are getting?
I too dislike the --export arguments for speedup/exporter.py. I kept it the way it was because it was not essential for adding MySQL support. I would prefer to just pass paths, for example:
# to export only the artist file
python3 speedup/exporter.py --bz2 dumps/artist.xml.gz csv-dir
# to export all dump files
python3 speedup/exporter.py --bz2 dumps/*.xml.gz csv-dir
If that's OK with you, I should be able to make it work that way.
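A minimal sketch of how the argument parsing could look with positional paths; everything beyond the flags and paths shown above is an assumption, not the actual exporter.py code:
# hypothetical CLI sketch: dump files as positional arguments
# instead of repeated --export flags
import argparse

parser = argparse.ArgumentParser(description="Export Discogs dump files to CSV")
parser.add_argument("--bz2", action="store_true", help="bz2-compress the output files")
parser.add_argument("--debug", action="store_true", help="enable debug output")
parser.add_argument("dumps", nargs="+", help="dump files to export, e.g. dumps/artist.xml.gz")
parser.add_argument("csv_dir", help="directory to write the CSV files to")
args = parser.parse_args()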
Is the output shown correctly if pv is not installed on the machine? pv is intended to provide realtime progress info, so it could cause issues when the output is written to a log file instead of shown on screen in real time. We might want a switch to turn it on, instead of using it whenever it is installed. I don't have time right now to look into this, but the problem could be caused by pv writing to STDERR while nohup only logs STDOUT (or the inverse), or by pv using escape characters.
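One quick way to narrow this down would be to split the two streams when launching the job; a sketch, assuming exporter.py pipes through pv as described above and pv reports progress on STDERR as it normally does:
# split STDOUT and STDERR to see which stream the progress output lands in
$ nohup python3 speedup/exporter.py --bz2 --debug dumps csv-dir --export artist > export.log 2> progress.log &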
I will try to find some time in the coming days to make these changes, fix the README, and look into the pv output issue.
Done, as it works and is a good improvement; the minor issues I have pointed out can be addressed in another pull request.
This PR adds support for importing the Discogs dumps into MySQL.
To do this I first had to refactor the code a bit to decouple the CSV export code from the PostgreSQL import code. PostgreSQL-specific code was moved into a /postgresql directory. This makes it easier to add support for other databases in the future.
The new code for importing the CSVs into MySQL can be found in /mysql.
I also updated the README to reflect these changes.
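Roughly, the resulting layout; only the paths mentioned in this thread are certain, the rest is assumed:
speedup/exporter.py                       # database-agnostic XML-to-CSV export
postgresql/psql.py                        # PostgreSQL import helper
postgresql/sql/CreateFKConstraints.sql
postgresql/sql/CreateIndexes.sql
mysql/                                    # new MySQL import code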