Closed. robkam closed this issue 1 year ago.
$ dumpgenerator --xml --images --user USER --pass PASSWORD --api https://scruffy.miraheze.org/w/api.php --index https://scruffy.miraheze.org/wiki/index.php
Checking API... https://scruffy.miraheze.org/w/api.php
MediaWiki API seems to work but returned no index URL
API is OK: https://scruffy.miraheze.org/w/api.php
Trying to log in to the wiki using clientLogin... (MW 1.27+)
client login: Success! Welcome, Xyzzy!
-- Login OK --
Checking index.php... https://scruffy.miraheze.org/w/index.php
ERROR: This wiki requires login and we are not authenticated
Error in index.php.
Please, provide a correct path to index.php or use --xmlrevisions. Terminating.
or
$ dumpgenerator --xml --xmlrevisions --images --user USER --pass PASSWORD --api https://scruffy.miraheze.org/w/api.php --index https://scruffy.miraheze.org/wiki/index.php
Checking API... https://scruffy.miraheze.org/w/api.php
MediaWiki API seems to work but returned no index URL
API is OK: https://scruffy.miraheze.org/w/api.php
Trying to log in to the wiki using clientLogin... (MW 1.27+)
client login: Success! Welcome, Xyzzy!
-- Login OK --
Checking index.php... https://scruffy.miraheze.org/wiki/index.php
ERROR: This wiki requires login and we are not authenticated
Error in index.php.
No --path argument provided. Defaulting to:
[working_directory]/[domain_prefix]-[date]-wikidump
Which expands to:
./scruffymirahezeorg_w-20230215-wikidump
--delay is the default value of 0.5
There will be a 0.5 second delay between HTTP calls in order to keep the server from timing you out.
If you know that this is unnecessary, you can manually specify '--delay 0.0'.
#########################################################################
# Welcome to DumpGenerator 0.4.0-alpha by WikiTeam (GPL v3) #
# More info at: https://github.com/elsiehupp/wikiteam3 #
#########################################################################
#########################################################################
# Copyright (C) 2011-2023 WikiTeam developers #
# #
# This program is free software: you can redistribute it and/or modify #
# it under the terms of the GNU General Public License as published by #
# the Free Software Foundation, either version 3 of the License, or #
# (at your option) any later version. #
# #
# This program is distributed in the hope that it will be useful, #
# but WITHOUT ANY WARRANTY; without even the implied warranty of #
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the #
# GNU General Public License for more details. #
# #
# You should have received a copy of the GNU General Public License #
# along with this program. If not, see <http://www.gnu.org/licenses/>. #
#########################################################################
Analysing https://scruffy.miraheze.org/w/api.php
Trying generating a new dump into a new directory...
https://scruffy.miraheze.org/w/api.php
Getting the XML header from the API
Export test via the API failed. Wiki too old? Trying without xmlrevisions.
https://scruffy.miraheze.org/w/api.php
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Python\Scripts\dumpgenerator.exe\__main__.py", line 7, in <module>
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\__init__.py", line 26, in main
DumpGenerator()
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\generator.py", line 115, in __init__
DumpGenerator.createNewDump(config=config, other=other)
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\generator.py", line 128, in createNewDump
generateXMLDump(config=config, session=other["session"])
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\xmldump\xml_dump.py", line 96, in generateXMLDump
header, config = getXMLHeader(config=config, session=session)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\xmldump\xml_header.py", line 124, in getXMLHeader
header, config = getXMLHeader(config=config, session=session)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\xmldump\xml_header.py", line 70, in getXMLHeader
[
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\xmldump\xml_header.py", line 70, in <listcomp>
[
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\page\xmlexport\page_xml_export.py", line 117, in getXMLPageWithExport
xml = getXMLPageCore(params=params, config=config, session=session)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\page\xmlexport\page_xml_export.py", line 76, in getXMLPageCore
r = session.post(
^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\requests\sessions.py", line 635, in post
return self.request("POST", url, data=data, json=json, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\wikiteam3\utils\user_agent.py", line 324, in newrequest
return session._orirequest(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\requests\sessions.py", line 573, in request
prep = self.prepare_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\requests\sessions.py", line 484, in prepare_request
p.prepare(
File "C:\Python\Lib\site-packages\requests\models.py", line 368, in prepare
self.prepare_url(url, params)
File "C:\Python\Lib\site-packages\requests\models.py", line 439, in prepare_url
raise MissingSchema(
requests.exceptions.MissingSchema: Invalid URL 'None': No scheme supplied. Perhaps you meant https://None?
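For context on the `MissingSchema` error: `requests` turns a `None` URL into the string `'None'`, finds no scheme in it, and raises `MissingSchema`. Combined with "returned no index URL" and "Error in index.php" earlier in the log, this suggests the export request was sent with no index.php URL at all. The following is a minimal sketch of a guard that resolves the URL up front and fails with a clear message instead. `resolve_index_url` and `is_http_url` are hypothetical helpers, not wikiteam3 functions, and the fallback order (explicit `--index`, then `server` + `script` from a `meta=siteinfo` general block, then swapping `api.php` for `index.php`) is an assumption.

```python
from urllib.parse import urlsplit


def is_http_url(url):
    """True if url is a string with an http(s) scheme and a host name."""
    if not isinstance(url, str):
        return False
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def resolve_index_url(api_url, index_url=None, siteinfo_general=None):
    """Return an absolute index.php URL, or raise instead of returning None.

    Assumed order of preference (hypothetical, not wikiteam3's actual logic):
    1. an explicit --index value,
    2. server + script from a meta=siteinfo 'general' block,
    3. the API URL with api.php replaced by index.php.
    """
    if is_http_url(index_url):
        return index_url
    if siteinfo_general:
        server = siteinfo_general.get("server", "")
        script = siteinfo_general.get("script", "")
        if server and script:
            if server.startswith("//"):
                # Protocol-relative server: reuse the API URL's scheme, else https.
                scheme = urlsplit(api_url).scheme if is_http_url(api_url) else "https"
                server = scheme + ":" + server
            if is_http_url(server + script):
                return server + script
    if is_http_url(api_url) and api_url.endswith("api.php"):
        return api_url[: -len("api.php")] + "index.php"
    raise ValueError(
        f"Cannot determine index.php URL (api={api_url!r}, index={index_url!r}); "
        "pass a full http(s) URL with --index"
    )
```

For example, `resolve_index_url("https://scruffy.miraheze.org/w/api.php")` returns `https://scruffy.miraheze.org/w/index.php`, the same path the tool checked in the first log. Raising early keeps a missing URL from surfacing later as a confusing `Invalid URL 'None'` deep inside `requests`.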
Still fails.
This might be something else. The wiki was dumped successfully on 27 January by turning off its privacy setting, but doing that now, I get:
$ dumpgenerator --xml --xmlrevisions --images --api https://scruffy.miraheze.org/w/api.php
<snipped>
Trying to export all revisions from namespace -1
Trying to get wikitext from the allrevisions API and to build the XML
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Python\Scripts\dumpgenerator.exe\__main__.py", line 7, in <module>
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\__init__.py", line 26, in main
DumpGenerator()
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\generator.py", line 115, in __init__
DumpGenerator.createNewDump(config=config, other=other)
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\generator.py", line 128, in createNewDump
generateXMLDump(config=config, session=other["session"])
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\xmldump\xml_dump.py", line 137, in generateXMLDump
doXMLRevisionDump(config, session, xmlfile, lastPage, useAllrevisions=True)
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\xmldump\xml_dump.py", line 25, in doXMLRevisionDump
for xml in getXMLRevisions(config=config, session=session, lastPage=lastPage, useAllrevision=useAllrevisions):
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\page\xmlrev\xml_revisions.py", line 67, in getXMLRevisionsByAllRevisions
arvrequest = site.api(
^^^^^^^^^
File "C:\Python\Lib\site-packages\mwclient\client.py", line 288, in api
if self.handle_api_result(info, sleeper=sleeper):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\mwclient\client.py", line 331, in handle_api_result
raise errors.APIError(info['error']['code'],
mwclient.errors.APIError: ('readapidenied', 'You need read permission to use this module.', 'See https://scruffy.miraheze.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes.')
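The `readapidenied` error means the API request ran without the read right. One way to check this before starting a dump is to ask the API which rights the current session has, via `action=query&meta=userinfo&uiprop=rights`. The sketch below only builds that query URL and inspects a parsed JSON response; it makes no network calls, and the helper names `userinfo_rights_url` and `can_read` are hypothetical. Because a private wiki may reject even this query for an anonymous session, an API `error` object is treated as a failure rather than as "no rights".

```python
import json
from urllib.parse import urlencode


def userinfo_rights_url(api_url):
    """Build an API query asking which rights the current session has."""
    params = {
        "action": "query",
        "meta": "userinfo",
        "uiprop": "rights",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def can_read(response):
    """Return True if a parsed userinfo response includes the 'read' right.

    An API error object (e.g. code 'readapidenied') is raised as RuntimeError
    rather than silently treated as "no rights".
    """
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"{err.get('code')}: {err.get('info')}")
    userinfo = response.get("query", {}).get("userinfo", {})
    return "read" in userinfo.get("rights", [])


if __name__ == "__main__":
    # Hypothetical responses shaped like MediaWiki's JSON output.
    logged_in = json.loads(
        '{"query": {"userinfo": {"id": 1, "name": "Xyzzy", "rights": ["read", "edit"]}}}'
    )
    denied = {"error": {"code": "readapidenied",
                        "info": "You need read permission to use this module."}}
    print(userinfo_rights_url("https://scruffy.miraheze.org/w/api.php"))
    print(can_read(logged_in))  # True
    try:
        can_read(denied)
    except RuntimeError as exc:
        print(exc)  # readapidenied: You need read permission to use this module.
```

In a real run the response would come from the same logged-in session the dump uses, so a `False` or an error here would flag the permission problem before any export is attempted.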
Can you provide the error message from the login process?
miraheze.org blocked my IP. :-(
The error messages are above. By the way, on this wiki the main page is a redirect to the sandbox.
The wiki is MediaWiki 1.39.1.
$ dumpgenerator --xml --xmlrevisions https://scruffy.miraheze.org --user USER --pass PASSWORD
Checking API... https://scruffy.miraheze.org/w/api.php
MediaWiki API seems to work but returned no index URL
API is OK: https://scruffy.miraheze.org/w/api.php
Trying to log in to the wiki using clientLogin... (MW 1.27+)
client login: Success! Welcome, Xyzzy!
-- Login OK --
Checking index.php... https://scruffy.miraheze.org/w/index.php
ERROR: This wiki requires login and we are not authenticated
Error in index.php.
No --path argument provided. Defaulting to:
[working_directory]/[domain_prefix]-[date]-wikidump
Which expands to:
./scruffymirahezeorg_w-20230216-wikidump
--delay is the default value of 0.5
There will be a 0.5 second delay between HTTP calls in order to keep the server from timing you out.
If you know that this is unnecessary, you can manually specify '--delay 0.0'.
Analysing https://scruffy.miraheze.org/w/api.php
Trying generating a new dump into a new directory...
https://scruffy.miraheze.org/w/api.php
Getting the XML header from the API
Export test via the API failed. Wiki too old? Trying without xmlrevisions.
https://scruffy.miraheze.org/w/api.php
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Python\Scripts\dumpgenerator.exe\__main__.py", line 7, in <module>
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\__init__.py", line 26, in main
DumpGenerator()
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\generator.py", line 115, in __init__
DumpGenerator.createNewDump(config=config, other=other)
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\generator.py", line 128, in createNewDump
generateXMLDump(config=config, session=other["session"])
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\xmldump\xml_dump.py", line 96, in generateXMLDump
header, config = getXMLHeader(config=config, session=session)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\xmldump\xml_header.py", line 124, in getXMLHeader
header, config = getXMLHeader(config=config, session=session)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\xmldump\xml_header.py", line 70, in getXMLHeader
[
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\xmldump\xml_header.py", line 70, in <listcomp>
[
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\page\xmlexport\page_xml_export.py", line 117, in getXMLPageWithExport
xml = getXMLPageCore(params=params, config=config, session=session)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\wikiteam3\dumpgenerator\dump\page\xmlexport\page_xml_export.py", line 76, in getXMLPageCore
r = session.post(
^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\requests\sessions.py", line 635, in post
return self.request("POST", url, data=data, json=json, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\wikiteam3\utils\user_agent.py", line 324, in newrequest
return session._orirequest(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\requests\sessions.py", line 573, in request
prep = self.prepare_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python\Lib\site-packages\requests\sessions.py", line 484, in prepare_request
p.prepare(
File "C:\Python\Lib\site-packages\requests\models.py", line 368, in prepare
self.prepare_url(url, params)
File "C:\Python\Lib\site-packages\requests\models.py", line 439, in prepare_url
raise MissingSchema(
requests.exceptions.MissingSchema: Invalid URL 'None': No scheme supplied. Perhaps you meant https://None?
I've put in a request at Miraheze T10511 for your IP to be unblocked.
The answer I got was: "... this task would be for Stewards and not for SRE/Phabricator. And otherwise, without knowing the specific IP (it can be sent via email) we are unable to assist." The email address is stewards(at)miraheze.org.
After logging in to the same private wiki as above, I used the same username and password with MediaWiki Scraper. It authenticated and dumped the wiki successfully. After I logged out of the wiki and ran MediaWiki Scraper again, it still authenticated and dumped the wiki.
Either the problem has been fixed, or the usage instructions need a note saying to log in to the wiki first.
$ dumpgenerator --xml --xmlrevisions --api https://scruffy.miraheze.org/w/api.php --user USER --pass PASSWORD
Checking API... https://scruffy.miraheze.org/w/api.php
MediaWiki API seems to work but returned no index URL
API is OK: https://scruffy.miraheze.org/w/api.php
Trying to log in to the wiki using clientLogin... (MW 1.27+)
client login: Success! Welcome, USER!
-- Login OK --
Checking index.php... https://scruffy.miraheze.org/w/index.php
index.php is OK
No --path argument provided. Defaulting to:
[working_directory]/[domain_prefix]-[date]-wikidump
Which expands to:
./scruffymirahezeorg_w-20230826-wikidump
--delay is the default value of 0.5
There will be a 0.5 second delay between HTTP calls in order to keep the server from timing you out.
If you know that this is unnecessary, you can manually specify '--delay 0.0'.
[snipped the rest]
When the account used to log in has at least read permission on the wiki, the script authenticates.
Windows 10, Git Bash, Python 3.11.1