metabrainz / musicbrainz-docker

Docker Compose project for the MusicBrainz Server with replication, search, and development setup
https://musicbrainz.org/doc/MusicBrainz_Server/Setup

Documentation request: How to handle duplication during replication? #261

Closed ashprice closed 11 months ago

ashprice commented 11 months ago

Apologies if this is the wrong place to ask about this - I suppose it's not related to the docker image per se. However, I do believe that including information about this in the documentation, or at the very least a link to somewhere that does document it, would be helpful.

While running the replication script I ran into the following errors:

        'INSERT INTO musicbrainz.tag (id, name, ref_count) VALUES (?, ?, ?)'
        (259975 girl group 0)
23505 DBD::Pg::st execute failed: ERROR:  duplicate key value violates unique constraint "tag_idx_name"
DETAIL:  Key (name)=(girl group) already exists. [for Statement "INSERT INTO musicbrainz.tag (id, name, ref_count) VALUES (?, ?, ?)" with ParamValues: 1='259975', 2='girl group', 3='0']
 at /musicbrainz-server/admin/replication/../../lib/Sql.pm line 123.
        Sql::catch {...} (MusicBrainz::Server::Exceptions::DatabaseError=HASH(0x563205df3d18)) called at /root/perl5/lib/perl5/Try/Tiny.pm line 123
        Try::Tiny::try(CODE(0x563205df3460), Try::Tiny::Catch=REF(0x563205d3f760)) called at /musicbrainz-server/admin/replication/../../lib/Sql.pm line 124
        Sql::do(Sql=HASH(0x5632058b3400), "INSERT INTO musicbrainz.tag (id, name, ref_count) VALUES (?, "..., 259975, "girl group", 0) called at /musicbrainz-server/admin/replication/ProcessReplicationChanges line 515
        main::dbmirror2_insert(Sql=HASH(0x5632058b3400), "musicbrainz.tag", HASH(0x563205d3f3a0)) called at /musicbrainz-server/admin/replication/ProcessReplicationChanges line 336
        main::dbmirror2_command(Sql=HASH(0x5632058b3400), ARRAY(0x5632057cd4d8)) called at /musicbrainz-server/admin/replication/ProcessReplicationChanges line 185
Fri Oct 27 07:32:53 2023 : Continuing a previously aborted load
Fri Oct 27 07:32:53 2023 : Processing replication changes
     XIDs     Stmts est%  XIDs/sec  Stmt/sec
        0         0   0%         0         0Failed query:
        'INSERT INTO musicbrainz.tag (id, name, ref_count) VALUES (?, ?, ?)'
        (259975 girl group 0)
23505 DBD::Pg::st execute failed: ERROR:  duplicate key value violates unique constraint "tag_idx_name"
DETAIL:  Key (name)=(girl group) already exists. [for Statement "INSERT INTO musicbrainz.tag (id, name, ref_count) VALUES (?, ?, ?)" with ParamValues: 1='259975', 2='girl group', 3='0']
 at /musicbrainz-server/admin/replication/../../lib/Sql.pm line 123.
        Sql::catch {...} (MusicBrainz::Server::Exceptions::DatabaseError=HASH(0x55a3243f4170)) called at /root/perl5/lib/perl5/Try/Tiny.pm line 123
        Try::Tiny::try(CODE(0x55a324494500), Try::Tiny::Catch=REF(0x55a3244013e0)) called at /musicbrainz-server/admin/replication/../../lib/Sql.pm line 124
        Sql::do(Sql=HASH(0x55a323f8f2e0), "INSERT INTO musicbrainz.tag (id, name, ref_count) VALUES (?, "..., 259975, "girl group", 0) called at /musicbrainz-server/admin/replication/ProcessReplicationChanges line 515
        main::dbmirror2_insert(Sql=HASH(0x55a323f8f2e0), "musicbrainz.tag", HASH(0x55a324401020)) called at /musicbrainz-server/admin/replication/ProcessReplicationChanges line 336
        main::dbmirror2_command(Sql=HASH(0x55a323f8f2e0), ARRAY(0x55a323e83c30)) called at /musicbrainz-server/admin/replication/ProcessReplicationChanges line 185

I'm not sure how to deal with this. I have tried:

  1. rerunning the script
  2. recreating the services (--force-recreate)
  3. recreating the database (a full rebuild as outlined in the docs here)

What I haven't tried is manually deleting the indexes and then running

sudo docker-compose exec indexer python -m sir reindex --entity-type <foo>

which I am reluctant to do, I suppose, because this process takes a significant amount of time. I'm aware I can check and delete individual tables, but here I have another problem.

I see from previous issues that reindexing is recommended whenever the number of records in an index does not match the database, except for annotations. However, when I run the delete command and then reindex, there is (in my environment at least) typically (a) no indication that the indexes were actually deleted (running check-search-indexes after deleting prints the same numbers again, instead of 0 for the index), and (b) after running the indexer, the numbers typically still do not match up.
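
The rule of thumb above (reindex any entity type whose index count differs from the database count, ignoring annotations) can be sketched as a small helper. This is only an illustration: the function name `entities_to_reindex` and the counts are hypothetical, and the sketch does not parse real check-search-indexes output. It only prints the reindex command quoted earlier in this post for each mismatched type.

```python
def entities_to_reindex(index_counts, db_counts, skip=("annotation",)):
    """Return entity types whose index count differs from the database count.

    Entity types listed in `skip` are ignored, since annotation counts
    are not expected to match.
    """
    mismatched = []
    for entity, db_total in sorted(db_counts.items()):
        if entity in skip:
            continue
        # A missing index entry is treated as an empty index.
        if index_counts.get(entity, 0) != db_total:
            mismatched.append(entity)
    return mismatched


# Hypothetical counts, for illustration only.
index_counts = {"annotation": 10, "artist": 95, "tag": 40}
db_counts = {"annotation": 12, "artist": 100, "tag": 40}

for entity in entities_to_reindex(index_counts, db_counts):
    print(
        "sudo docker-compose exec indexer python -m sir reindex "
        f"--entity-type {entity}"
    )
```

Because the reported counts can lag behind the actual state, it is probably worth rechecking after a while before acting on a mismatch.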

I don't know if that is in some way expected behaviour, or if I actually have two issues. I should note that I had this same problem(?) with check-search-indexes in my environment across multiple previous installs (i.e. completely wiping the docker environment and starting from scratch), but I didn't open an issue before because despite the mismatch reported by check-search-indexes, everything seemed to work.

Please let me know what other information I can provide, or if you'd rather I ask this somewhere else. And thank you for doing what you do!


$ git describe --always --broken --dirty --tags
v-2023-10-24-hotfix
$ sudo docker-compose version --short
2.22.0
$ sudo docker version -f 'Docker Client/Server: {{.Client.Version}}/{{.Server.Version}}'
Docker Client/Server: 24.0.6/24.0.6

ashprice commented 11 months ago

Actually, the issue mentioned with check-search-indexes no longer seems to be the case today. I noticed there can be a significant delay before the values are updated, so maybe I was just wrong about that.

Given that, I am going to try re-indexing. I don't think that is necessarily relevant for the duplication issue anyway, as it seems to be erroring out when trying to add the record to the DB, rather than the indexes.
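
To illustrate why this failure happens in the database and not in the search indexes, here is a minimal sketch that reproduces the same kind of error. It makes several assumptions: SQLite (in memory) stands in for PostgreSQL, the table is reduced to the three columns named in the failed query, and the pre-existing local row with id 1 is hypothetical. The helper `replay_insert` is not part of MusicBrainz Server.

```python
import sqlite3


def replay_insert(conn, row):
    """Apply one replicated INSERT; return None on success, else the error text."""
    try:
        # The connection context manager commits on success
        # and rolls back if the statement fails.
        with conn:
            conn.execute(
                "INSERT INTO tag (id, name, ref_count) VALUES (?, ?, ?)", row
            )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tag ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, ref_count INTEGER NOT NULL)"
)
# Same role as the "tag_idx_name" constraint named in the error above.
conn.execute("CREATE UNIQUE INDEX tag_idx_name ON tag (name)")

# Hypothetical local row: the name already exists under a different id.
conn.execute(
    "INSERT INTO tag (id, name, ref_count) VALUES (?, ?, ?)",
    (1, "girl group", 0),
)
conn.commit()

# The replicated row from the failed query: new id, same name.
error = replay_insert(conn, (259975, "girl group", 0))
print(error)
```

The insert is rejected by the unique index on `name` even though the id is new, so rerunning the same replication step fails in the same way until the conflicting row is dealt with.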

reosarevok commented 11 months ago

Hi! This is not something you usually would need to deal with, which is why there's no documentation for it :) We had a one-off issue that caused it, and you can see the steps you might need to take to correct it in https://blog.metabrainz.org/2023/10/24/musicbrainz-server-mirror-only-fix-update-2023-10-24/