mediawiki-client-tools / mediawiki-dump-generator

Python 3 tools for downloading and preserving wikis
https://github.com/mediawiki-client-tools/mediawiki-scraper
GNU General Public License v3.0

Potential Issue with Creating New Items After Importing a MediaWiki Dump Generator Dump #249

Open Superraptor opened 3 months ago

Superraptor commented 3 months ago

Describe the Bug

I am following this guide to import a MediaWiki Dump Generator XML dump from a Wikibase.Cloud instance. I have also documented the code, outputs, and attempted solutions in depth here.

Expected Behavior

After importing the dump, I should be able to create new items, properties, or lexemes after updating the tables as posted in this guide.

Actual Behavior

When trying to add an entity with a bot, I get the WikibaseIntegrator error wikibaseintegrator.wbi_exceptions.MWApiError: 'The save has failed.' When trying to add something via the GUI, I get the error "Could not create a new page. It already exists."

Command for Reproducing the Bug

I follow everything represented here, using a MediaWiki Dump Generator XML dump.

Output

I get no errors other than those mentioned in the "Actual Behavior" section.

Platform Details

I'm using the Wikibase Docker installation on Windows.

Additional Context

Additional info, including scripts, is available here. I'm not positive that this is a MediaWiki Dump Generator issue, but I've eliminated a lot of other potential causes. Thank you so much for your help and support, and please let me know if I can provide additional information or context.

yzqzss commented 3 months ago

Try re-importing with this XML dump:

https://archive.org/download/wiki-lgbtdb.wikibase.cloud_w-20240604/lgbtdb.wikibase.cloud_w-20240604-history.xml.zst (from https://archive.org/details/wiki-lgbtdb.wikibase.cloud_w-20240604)

curl -L https://archive.org/download/wiki-lgbtdb.wikibase.cloud_w-20240604/lgbtdb.wikibase.cloud_w-20240604-history.xml.zst | zstd -d --long=31 | <your import script>

(please clean/re-deploy the destination wiki before importing)
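For a Docker-based install, one way the <your import script> placeholder could be filled is sketched below. This is only a sketch under assumptions: the container name wbdocker-wikibase-1 and the importDump.php path are the ones that appear later in this thread, and the stream_import function and the DO_IMPORT guard variable are hypothetical names introduced for illustration. It uses docker exec -i without -t (and without winpty), because standard input is a pipe here rather than a terminal.

```shell
#!/bin/bash
# Sketch: stream a zstd-compressed XML dump straight into importDump.php
# running inside a container, without writing the XML to disk first.

# Dump URL quoted from the comment above.
DUMP_URL="https://archive.org/download/wiki-lgbtdb.wikibase.cloud_w-20240604/lgbtdb.wikibase.cloud_w-20240604-history.xml.zst"
# Assumption: container name as used later in this thread.
CONTAINER="wbdocker-wikibase-1"

# stream_import URL CONTAINER (hypothetical helper)
stream_import() {
    # curl -L follows redirects; zstd -d decompresses stdin to stdout;
    # docker exec -i keeps stdin open so PHP can read the XML from the pipe.
    curl -L "$1" \
        | zstd -d --long=31 \
        | docker exec -i "$2" php /var/www/html/maintenance/importDump.php
}

# Hypothetical guard: nothing is downloaded or imported unless DO_IMPORT=1.
if [ "${DO_IMPORT:-0}" = "1" ]; then
    stream_import "$DUMP_URL" "$CONTAINER"
fi
```

Run as DO_IMPORT=1 bash stream_import.sh (the file name is arbitrary) after re-deploying the destination wiki. Downloading and decompressing to a file first, then importing from that file, should be equivalent and is easier to retry.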

Superraptor commented 3 months ago

@yzqzss thanks so much for the comment. Just to make sure, what should go in the <your import script> area? Should it just be something like winpty docker exec -it wbdocker-wikibase-1 //bin//bash -c "curl -L https://archive.org/download/wiki-lgbtdb.wikibase.cloud_w-20240604/lgbtdb.wikibase.cloud_w-20240604-history.xml.zst | zstd -d --long=31 | php /var/www/html/maintenance/importDump.php" or should it be curl -L https://archive.org/download/wiki-lgbtdb.wikibase.cloud_w-20240604/lgbtdb.wikibase.cloud_w-20240604-history.xml.zst | zstd -d --long=31 | bash sample_bash_script.sh?

Superraptor commented 3 months ago

Testing it as follows now. First I ran: curl -L https://archive.org/download/wiki-lgbtdb.wikibase.cloud_w-20240604/lgbtdb.wikibase.cloud_w-20240604-history.xml.zst | zstd -d --long=31 > lgbtdb.wikibase.cloud_w-20240604-history.xml. This created a file of 313,578 KB, which seems plausible since the dump I have been testing is from 20240613 and is 325,882 KB. Then, after running docker-compose -f docker-compose.yml -f docker-compose.extra.yml down --volumes --remove-orphans to remove the previous installation, I ran my script as follows:

#!/bin/bash

# Load config.ini
source config.ini

cd "$WBDOCKERPATH"

# Run docker-compose to set up Wikibase Suite instance.
docker-compose -f docker-compose.yml -f docker-compose.extra.yml up -d

# Run update and install vim, python3, and pip.
winpty docker exec -it wbdocker-wikibase-1 //bin//bash -c "apt-get -y update && apt-get -y install vim && apt-get -y install python3 && apt-get -y install python3-pip"

# Allow pip to install packages system-wide (sets break-system-packages).
winpty docker exec -it wbdocker-wikibase-1 //bin//bash -c "python3 -m pip config --global set global.break-system-packages true"

# Clone WikibaseLexeme, making sure to use the REL1_41 branch.
if [ ! -d "$WBWRAPPERPATH/WikibaseLexeme" ]; then
    git clone -b REL1_41 https://gerrit.wikimedia.org/r/p/mediawiki/extensions/WikibaseLexeme.git "$WBWRAPPERPATH/WikibaseLexeme"
fi
docker cp "$WBWRAPPERPATH/WikibaseLexeme" wbdocker-wikibase-1:/var/www/html/extensions/WikibaseLexeme

# Load WikibaseLexeme.
if winpty docker exec -it wbdocker-wikibase-1 //bin//bash -c "grep -Fxq \"wfLoadExtension( 'WikibaseLexeme' );\" /var/www/html/LocalSettings.php";
then
    :
else
    winpty docker exec -it wbdocker-wikibase-1 //bin//bash -c "echo \"wfLoadExtension( 'WikibaseLexeme' );\" >> /var/www/html/LocalSettings.php"
fi

# Run update script.
winpty docker exec -it wbdocker-wikibase-1 //bin//bash -c "php /var/www/html/maintenance/update.php --force"

# Copy over XML dump to upload.
docker cp "$WBWRAPPERPATH/lgbtdb.wikibase.cloud_w-20240604-history.xml" wbdocker-wikibase-1:/var/tmp/dump.xml

# Update LocalSettings.php to allow for entity import.
# The dollar sign is escaped twice so the inner bash passes a literal "$wgWBRepoSettings" to grep.
if winpty docker exec -it wbdocker-wikibase-1 //bin//bash -c "grep -Fxq \"\\\$wgWBRepoSettings['allowEntityImport'] = true;\" /var/www/html/LocalSettings.php";
then
    :
else
    winpty docker exec -it wbdocker-wikibase-1 //bin//bash -c "echo '$'\"wgWBRepoSettings['allowEntityImport'] = true;\" >> /var/www/html/LocalSettings.php"
fi

# Importing the dump.
winpty docker exec -it wbdocker-wikibase-1 //bin//bash -c "php /var/www/html/maintenance/importDump.php < /var/tmp/dump.xml"

# Rebuilding recent changes, the search index, and link tables.
winpty docker exec -it wbdocker-wikibase-1 //bin//bash -c "php /var/www/html/maintenance/rebuildall.php"

# Running jobs.
winpty docker exec -it wbdocker-wikibase-1 //bin//bash -c "php /var/www/html/maintenance/runJobs.php --memory-limit 512M"

# Updating site stats.
winpty docker exec -it wbdocker-wikibase-1 //bin//bash -c "php /var/www/html/maintenance/initSiteStats.php --update"

# Copying over rebuild script.
if [ ! -f "$WBWRAPPERPATH/rebuildWikibaseIdCounters.sql" ]; then
    curl https://gist.githubusercontent.com/JeroenDeDauw/c86a5ab7e2771301eb506b246f1af7a6/raw/rebuildWikibaseIdCounters.sql -o "$WBWRAPPERPATH/rebuildWikibaseIdCounters.sql"
fi
docker cp "$WBWRAPPERPATH/rebuildWikibaseIdCounters.sql" wbdocker-wikibase-1:/var/www/html/maintenance/rebuildWikibaseIdCounters.sql

# Rebuilding the Wikibase ID counters.
winpty docker exec -it wbdocker-wikibase-1 //bin//bash -c "php /var/www/html/maintenance/sql.php /var/www/html/maintenance/rebuildWikibaseIdCounters.sql"

Will post results ASAP.

Superraptor commented 3 months ago

Unfortunately this did not appear to work. I am still getting "Could not create a new page. It already exists."
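Since the script above ends by rebuilding the Wikibase ID counters, one possible sanity check is to find the highest entity IDs in the dump and compare them with the counters after the import. The sketch below is an assumption-laden illustration, not a confirmed diagnosis: it assumes entity pages appear in the dump as titles such as Item:Q42, Property:P7, or Lexeme:L3 (optionally without a namespace prefix), and max_entity_id is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: report the highest numeric entity ID per prefix found in an XML dump.

# max_entity_id DUMP_FILE PREFIX (hypothetical helper)
# Prints the largest number N such that a <title> ends in PREFIX followed by N,
# or 0 when no such title exists.
max_entity_id() {
    local max
    max="$(grep -oE "<title>([^<:]+:)?$2[0-9]+</title>" "$1" \
        | grep -oE "$2[0-9]+</title>" \
        | grep -oE '[0-9]+' \
        | sort -n | tail -n 1)"
    echo "${max:-0}"
}

# Usage: bash max_entity_id.sh path/to/dump.xml
if [ -n "${1:-}" ]; then
    for prefix in Q P L; do
        echo "$prefix: $(max_entity_id "$1" "$prefix")"
    done
fi
```

If the values stored in the wb_id_counters table after the rebuild are lower than the numbers printed here, newly created entities would collide with imported pages, which would be consistent with the "It already exists" message. Other causes remain possible.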