openzim / mwoffliner

Mediawiki scraper: all your wiki articles in one highly compressed ZIM file
https://www.npmjs.com/package/mwoffliner
GNU General Public License v3.0

Bug help #530

Closed shikulja closed 5 years ago

shikulja commented 5 years ago

Please help me fix this error. I read about NODE_TLS_REJECT_UNAUTHORIZED, but I don't know where to set it to make this work.

> mwoffliner  --mwUrl=https://es.wikipedia.org --adminEmail=foo@bar.net --verbose --format=zim
Getting text direction...
Downloading https://es.wikipedia.org/wiki/...
(node:89140) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.
events.js:173
      throw er; // Unhandled 'error' event
      ^

Error: Redis connection to /dev/shm/redis.sock failed - connect ENOENT /dev/shm/redis.sock
    at PipeConnectWrap.afterConnect [as oncomplete] (net.js:1081:14)
Emitted 'error' event at:
    at RedisClient.on_error (/home/shikulja/.nvm/versions/node/v11.7.0/lib/node_modules/mwoffliner/node_modules/redis/index.js:406:14)
    at Socket.<anonymous> (/home/shikulja/.nvm/versions/node/v11.7.0/lib/node_modules/mwoffliner/node_modules/redis/index.js:279:14)
    at Socket.emit (events.js:188:13)
    at emitErrorNT (internal/streams/destroy.js:82:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
    at processTicksAndRejections (internal/process/next_tick.js:76:17)
ISNIT0 commented 5 years ago

Hi @shikulja, the NODE_TLS_REJECT_UNAUTHORIZED message is just a warning, not the reason mwoffliner is failing.

Your scrape is failing because MWOffliner can't connect to Redis. Do you have an instance running locally?

You can add a variant of this to your command: --redis=redis://127.0.0.1:6379
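
Before rerunning the scrape, it can help to confirm that something is actually listening at the address passed to --redis. The following is a minimal sketch in Python (not part of mwoffliner, which is a Node.js tool); the function name `redis_reachable` is hypothetical, and the check only tests that a TCP connection succeeds, not that the listener is really Redis.

```python
import socket
from urllib.parse import urlparse


def redis_reachable(redis_url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the host/port in redis_url succeeds."""
    parsed = urlparse(redis_url)
    host = parsed.hostname or "127.0.0.1"
    port = parsed.port or 6379  # 6379 is the default Redis port
    try:
        # Open and immediately close a connection; no Redis commands are sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Calling `redis_reachable("redis://127.0.0.1:6379")` returns False when no server is listening on that port, which would match the connection failure in the log above.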

shikulja commented 5 years ago

It works, thanks for the help.

ISNIT0 commented 5 years ago

@shikulja No problem

shikulja commented 5 years ago

new error...

mwoffliner --mwUrl=https://dragonage.fandom.com/ru/ --adminEmail=foo@bar.net --verbose --format=zim --redis=redis://127.0.0.1:6379
Getting text direction...
Downloading https://dragonage.fandom.com/ru/wiki/...
(node:89381) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.
Text direction is ltr
Getting web site name...
Downloading https://dragonage.fandom.com/ru/w/api.php?action=query&meta=siteinfo&format=json&siprop=general|namespaces|statistics|variables|category|wikidesc...
undefined:1
<!doctype html>
^

SyntaxError: Unexpected token < in JSON at position 0
    at JSON.parse (<anonymous>)
    at /home/shikulja/.nvm/versions/node/v11.7.0/lib/node_modules/mwoffliner/lib/MediaWiki.js:159:36
    at /home/shikulja/.nvm/versions/node/v11.7.0/lib/node_modules/mwoffliner/lib/Downloader.js:199:13
    at /home/shikulja/.nvm/versions/node/v11.7.0/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:676:51
    at /home/shikulja/.nvm/versions/node/v11.7.0/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:726:13
    at /home/shikulja/.nvm/versions/node/v11.7.0/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:52:16
    at /home/shikulja/.nvm/versions/node/v11.7.0/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:264:21
    at /home/shikulja/.nvm/versions/node/v11.7.0/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:44:16
    at /home/shikulja/.nvm/versions/node/v11.7.0/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:723:17
    at /home/shikulja/.nvm/versions/node/v11.7.0/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:167:37

ISNIT0 commented 5 years ago

@shikulja This is because you haven't specified the mwApiPath and the wiki doesn't use the default.

Try adding this option: --mwApiPath=/api.php
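
The "Unexpected token < in JSON at position 0" error means the body handed to the JSON parser started with `<`, which the log shows was an HTML page (`<!doctype html>`) rather than an API response. As an illustration only, here is a small Python sketch of checking for that case before parsing; `parse_api_response` is a hypothetical helper, not mwoffliner code.

```python
import json


def parse_api_response(body: str) -> dict:
    """Parse an API response body, failing clearly if it looks like markup."""
    stripped = body.lstrip()
    if stripped.startswith("<"):
        # Markup came back instead of JSON, which usually means the request
        # went to a URL that is not the wiki's API endpoint.
        raise ValueError(
            "Expected JSON but got markup starting with: " + stripped[:15]
        )
    return json.loads(stripped)
```

With a check like this, a wrong API path produces a message that names the real problem instead of a parser stack trace.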

shikulja commented 5 years ago

It works :) Thanks. I had been struggling with this for half a day while compiling and adding all the dependencies, and my head had stopped working.

ISNIT0 commented 5 years ago

@shikulja To be clear, is this now working completely for you?

shikulja commented 5 years ago

Yes, it works. I checked a page in the tmp directory: it has no background, but it may be worth waiting until the scrape has finished completely.

[screenshot: 23-01-2019 190954]

shikulja commented 5 years ago

Can wowhead.com and sites like it be dumped? They are quite complex; for example, tooltips appear when you hover the mouse.

ISNIT0 commented 5 years ago

@shikulja Sorry, I don't understand what you mean? Can I close this ticket?

shikulja commented 5 years ago

No, don't close it yet. Can wowhead.com be dumped? Or could support be added in the future?

shikulja commented 5 years ago

===========1==============

mwoffliner --mwUrl=https://dragonage.fandom.com/ru --adminEmail=foo@bar.net --format=nozim --mwApiPath=/api.php --redis=redis://127.0.0.1:6379 --customZimFavicon=/home/shikulja/Documents/Forum_book.ico
undefined:1
<!doctype html>
^

SyntaxError: Unexpected token < in JSON at position 0
    at JSON.parse (<anonymous>)
    at /usr/lib/node_modules/mwoffliner/lib/MediaWiki.js:159:36
    at /usr/lib/node_modules/mwoffliner/lib/Downloader.js:199:13
    at /usr/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:676:51
    at /usr/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:726:13
    at /usr/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:52:16
    at /usr/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:264:21
    at /usr/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:44:16
    at /usr/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:723:17
    at /usr/lib/node_modules/mwoffliner/node_modules/async/lib/async.js:167:37

============2===============

mwoffliner --mwUrl=https://dragonage.fandom.com/ru/ --adminEmail=foo@bar.net --format=nozim --mwApiPath=api.php --redis=redis://127.0.0.1:6379 --customZimFavicon=/home/shikulja/Documents/Forum_book.ico

......
Executing command : pngquant --verbose --strip --nofs --force --ext=".0ne4o.png" "/home/shikulja/tmp/dragonage_ru_all_2019-01/favicon.png" &&          advdef -q -z -4 -i 5 "/home/shikulja/tmp/dragonage_ru_all_2019-01/favicon.0ne4o.png" &&          if [ $(stat -c%s "/home/shikulja/tmp/dragonage_ru_all_2019-01/favicon.0ne4o.png") -lt $(stat -c%s "/home/shikulja/tmp/dragonage_ru_all_2019-01/favicon.png") ]; then mv "/home/shikulja/tmp/dragonage_ru_all_2019-01/favicon.0ne4o.png" "/home/shikulja/tmp/dragonage_ru_all_2019-01/favicon.png"; else rm "/home/shikulja/tmp/dragonage_ru_all_2019-01/favicon.0ne4o.png"; fi
Getting [js] module [startup]
Getting [js] module [jquery]
Getting [js] module [mediawiki]
Getting [js] module [site]
Unable to determine the protocol of the following url (data:), switched back to https: data:image/gif;base64,R0lGODlhAQABAIABAAAAAP///yH5BAEAAAEALAAAAAABAAEAQAICTAEAOw%3D%3D
New url is: https:image/gif;base64,R0lGODlhAQABAIABAAAAAP///yH5BAEAAAEALAAAAAABAAEAQAICTAEAOw%3D%3D
Unable to download content [1] https:image/gif;base64,R0lGODlhAQABAIABAAAAAP///yH5BAEAAAEALAAAAAABAAEAQAICTAEAOw%3D%3D (request error: Error: connect ECONNREFUSED 127.0.0.1:443 ).
Unable to download content [2] https:image/gif;base64,R0lGODlhAQABAIABAAAAAP///yH5BAEAAAEALAAAAAABAAEAQAICTAEAOw%3D%3D (request error: Error: connect ECONNREFUSED 127.0.0.1:443 ).
Unable to download content [3] https:image/gif;base64,R0lGODlhAQABAIABAAAAAP///yH5BAEAAAEALAAAAAABAAEAQAICTAEAOw%3D%3D (request error: Error: connect ECONNREFUSED 127.0.0.1:443 ).
Absolutely unable to retrieve async. URL: Unable to download content [3] https:image/gif;base64,R0lGODlhAQABAIABAAAAAP///yH5BAEAAAEALAAAAAABAAEAQAICTAEAOw%3D%3D (request error: Error: connect ECONNREFUSED 127.0.0.1:443 ).
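
In the log above, a `data:` URI is rewritten to `https:` and then retried as a download. A `data:` URI carries its content inline, so there is nothing to fetch over the network. The Python sketch below shows how such a URI can be decoded locally; it is an illustration under that assumption, not how mwoffliner handles it, and `decode_data_uri` is a hypothetical name.

```python
import base64
from typing import Tuple
from urllib.parse import unquote_to_bytes, urlsplit


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Return (media_type, payload) for a data: URI without network access."""
    if urlsplit(uri).scheme != "data":
        raise ValueError("not a data: URI")
    # Everything before the first comma is the header, the rest is the payload.
    header, _, payload = uri[len("data:"):].partition(",")
    is_base64 = header.endswith(";base64")
    media_type = header[: -len(";base64")] if is_base64 else header
    raw = unquote_to_bytes(payload)  # undo percent-encoding such as %3D
    data = base64.b64decode(raw) if is_base64 else raw
    # An empty media type defaults to text/plain.
    return media_type or "text/plain", data
```

Applied to the URI from the log, this yields the media type `image/gif` and the bytes of a small GIF image.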

I tried reinstalling and updating the modules, and it seems to work... it's some kind of magic. After restarting the system, the errors came back. Update: it has started working again.

A request: please make it easier and more convenient to install. The dependencies are a nightmare: some of them have to be compiled manually, or you have to work out how to configure and install them separately.

The new pngquant from git really is much faster. It's a pity it isn't in the repositories yet.

ISNIT0 commented 5 years ago

@shikulja I agree, it's difficult to install. We're working on it! 👍

shikulja commented 5 years ago

I don't know how to fix this. I have already tried everything with Redis 5.x.x from the repo.

Jan 25 01:52:44 ubuntu systemd[1]: Starting Advanced key-value store...
Jan 25 01:52:44 ubuntu systemd[1]: redis-server.service: Can't open PID file /va
Jan 25 01:52:44 ubuntu systemd[1]: Started Advanced key-value store.

Reinstalling with Redis 3.2.8 seems to work. The mwoffliner npm package had to be reinstalled too; for some reason it stops working.

shikulja commented 5 years ago

How do I fix this?

Failed to run mwoffliner after [2715s]: { "killed": false, "code": 1, "signal": null, "cmd": "rm -rf \"/home/shikulja/tmp/dragonage_ru_all_2019-01/\""

cd ./Documents
sudo mwoffliner -- -- --

gives:

Successfuly optimized /home/shikulja/Documents/tmp/dragonage_ru_all_2019-01/favicon.png
Create main page redirection...
Saving articles...
Failed to run mwoffliner after [63s]: {}
ISNIT0 commented 5 years ago

@shikulja What operating system are you using?

shikulja commented 5 years ago

Currently Ubuntu 18.10; I also tried 18.04.

@ISNIT0

shikulja commented 5 years ago

I completely reinstalled everything on Ubuntu 19.04: Redis 5.x.x, Node.js 8.15, etc. The first time, the dump came out fine, except for style = "background-color: white; and there were no images. The second time I ran it, it stopped working; it only runs if I change --adminEmail every time. Updating Node.js and npm might give a better dump.

shikulja commented 5 years ago

I switched to zimmer. It gives a better dump: 1 GB vs 160 MB (unpacked), with background color and images.