openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

Make Gentoo wiki zim #7

Status: Open. kelson42 opened this issue 6 years ago.

kelson42 commented 6 years ago

From @Popolechien on August 27, 2018 7:14

https://wiki.gentoo.org/wiki/Main_Page

Licensed under CC BY-SA 3.0 (request from OTRS)

Copied from original issue: openzim/mwoffliner#365

kelson42 commented 6 years ago

We already have it: http://library.kiwix.org/installgentoo_en_all_2018-07/

kelson42 commented 6 years ago

From @Popolechien on September 3, 2018 6:43

It's a different one, apparently. The user says: "It's a different wiki. installgentoo is a wiki that covers almost all GNU/Linux distributions and is based on the 'Install Gentoo' meme.

The Gentoo wiki (https://wiki.gentoo.org/wiki/Main_Page) is exclusively about Gentoo and is essential for installing Gentoo."

kelson42 commented 6 years ago

@ISNIT0 It fails as follows:

mwoffliner --mwUrl="https://wiki.gentoo.org/" --mwApiPath="/api.php" --adminEmail="kelson@kiwix.com" --localParsoid --verbose

...

Getting redirects for article Overlay:Fkmclane...
Downloading https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=Overlay%3AFkmclane&rawcontinue=...
Absolutely unable to retrieve async. URL: Unable to download content [3] https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=Knowledge_Base%3AObject_libsandbox.so_from_LD_PRELOAD_cannot_be_preloaded&rawcontinue= (response code: 503).
Getting redirects for article GLEP:2...
Downloading https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=GLEP%3A2&rawcontinue=...
Absolutely unable to retrieve async. URL: Unable to download content [3] https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=Knowledge_Base%3ACron_fails_to_load_in_root_crontab_with_message_ENTRYPOINT_FAILED&rawcontinue= (response code: 503).
Getting redirects for article GLEP:1...
Downloading https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=GLEP%3A1&rawcontinue=...
Absolutely unable to retrieve async. URL: Unable to download content [3] https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=Knowledge_Base%3AChrooting_returns_exec_format_error&rawcontinue= (response code: 503).
Getting redirects for article GLEP:48...
Downloading https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=GLEP%3A48&rawcontinue=...
Absolutely unable to retrieve async. URL: Unable to download content [3] https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=Knowledge_Base%3AOverriding_environment_variables_per_package&rawcontinue= (response code: 503).
Getting redirects for article GLEP:4...
Downloading https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=GLEP%3A4&rawcontinue=...
Absolutely unable to retrieve async. URL: Unable to download content [3] https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=Knowledge_Base%3ANo_space_left_on_device_while_there_is_plenty_of_space_available&rawcontinue= (response code: 503).
Getting redirects for article GLEP:39...
Downloading https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=GLEP%3A39&rawcontinue=...
Absolutely unable to retrieve async. URL: Unable to download content [3] https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=Knowledge_Base%3AInserting_base_module_in_module_store_fails_with_duplicate_declaration&rawcontinue= (response code: 503).
Getting redirects for article GLEP:3...
Downloading https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=GLEP%3A3&rawcontinue=...
Absolutely unable to retrieve async. URL: Unable to download content [3] https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=Knowledge_Base%3APortage_fails_to_label_files_because_setfiles_does_not_work_anymore&rawcontinue= (response code: 503).
Getting redirects for article GLEP:5...
Downloading https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=GLEP%3A5&rawcontinue=...
Absolutely unable to retrieve async. URL: Unable to download content [3] https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=Knowledge_Base%3AIs_swap_space_really_necessary&rawcontinue= (response code: 503).
Getting redirects for article GLEP:6...
Downloading https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=GLEP%3A6&rawcontinue=...
Unable to download content [1] https://wiki.gentoo.org//api.php?action=query&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&colimit=max&prop=revisions|coordinates&gapnamespace=0&format=json&rawcontinue=&gapcontinue=GNU_Emacs (response code: 503).
Unable to download content [3] https://wiki.gentoo.org//api.php?action=query&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&colimit=max&prop=revisions|coordinates&gapnamespace=510&format=json&rawcontinue=&gapcontinue=Portage%2FMembership (response code: 503).
Unable to download content [3] https://wiki.gentoo.org//api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=Knowledge_Base%3AAll_available_memory_is_being_used&rawcontinue= (response code: 503).
Absolutely unable to retrieve async. URL: Unable to download content [3] https://wiki.gentoo.org//api.php?action=query&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&colimit=max&prop=revisions|coordinates&gapnamespace=510&format=json&rawcontinue=&gapcontinue=Portage%2FMembership (response code: 503).
Unable to download article ids: Error by retrieving https://wiki.gentoo.org//api.php?action=query&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&colimit=max&prop=revisions|coordinates&gapnamespace=510&format=json&rawcontinue=&gapcontinue=Portage%2FMembership Error by retrieving https://wiki.gentoo.org//api.php?action=query&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&colimit=max&prop=revisions|coordinates&gapnamespace=510&format=json&rawcontinue=&gapcontinue=Portage%2FMembership
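For reference, each "Getting redirects" line in the log above corresponds to a MediaWiki Action API request with `action=query&prop=redirects`. The sketch below rebuilds that request with Python's standard library, copying the query parameters visible in the log. The helper names `api_endpoint` and `redirects_query_url` are hypothetical and are not mwoffliner functions. The sketch also joins the base URL and API path with exactly one slash. The doubled slash (`//api.php`) in the log appears to come from the trailing `/` in `--mwUrl` combined with the leading `/` in `--mwApiPath`. As noted in the reply below, the URLs work either way.

```python
from urllib.parse import urlencode


def api_endpoint(mw_url: str, api_path: str) -> str:
    """Join the wiki base URL and the API path with exactly one slash."""
    return mw_url.rstrip("/") + "/" + api_path.lstrip("/")


def redirects_query_url(mw_url: str, api_path: str, title: str) -> str:
    """Build a prop=redirects query with the same parameters as in the log."""
    params = {
        "action": "query",
        "prop": "redirects",
        "format": "json",
        "rdprop": "title",
        "rdlimit": "max",   # ask for as many redirects as the API allows
        "titles": title,
        "rawcontinue": "",  # empty value, as in the logged URLs
    }
    # urlencode percent-encodes ':' in titles as %3A, matching the log.
    return api_endpoint(mw_url, api_path) + "?" + urlencode(params)


if __name__ == "__main__":
    print(redirects_query_url("https://wiki.gentoo.org/", "/api.php", "GLEP:2"))
    # https://wiki.gentoo.org/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&titles=GLEP%3A2&rawcontinue=
```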
kelson42 commented 6 years ago

From @ISNIT0 on September 18, 2018 9:22

The server is returning 503s. The URLs themselves work, but it seems the server is trying to avoid being scraped.

Thoughts? The response from the site is empty and a valid 503.
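One common way for a scraper to cope with a server that answers bursts of requests with 503 is to retry with increasing delays. The following is a minimal sketch under stated assumptions, not how mwoffliner actually handles errors. `fetch_with_backoff` and the injected `fetch` callable are hypothetical, and responses are modeled as `(status, headers, body)` tuples. A numeric `Retry-After` header is honored only if the server sends one, capped at `max_delay`. Otherwise, the function waits a random time of up to `base_delay * 2**attempt` ("full jitter").

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Assumed response shape for this sketch: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def fetch_with_backoff(
    fetch: Callable[[str], Response],
    url: str,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry `url` while the server answers 503, waiting longer each time."""
    for attempt in range(max_retries + 1):
        status, headers, body = fetch(url)
        if status != 503:
            return status, headers, body
        if attempt == max_retries:
            break
        # Assumption: header names are matched case-sensitively here;
        # a real HTTP client would normally expose case-insensitive headers.
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            # Server told us how long to wait (in seconds); cap it.
            delay = min(float(retry_after), max_delay)
        else:
            # Exponential backoff with full jitter.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
    raise RuntimeError(f"still getting 503 after {max_retries} retries: {url}")
```

Injecting `fetch` and `sleep` keeps the retry logic testable without network access. In practice, `fetch` would wrap a real HTTP client. Slowing down only helps if the 503s come from rate limiting. If the server deliberately blocks scrapers, retries will not get around that.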

kelson42 commented 4 years ago

I have renamed the "gentoo" recipe, which was scraping installgentoo.com, to "installgentoo". That recipe no longer works because installgentoo has stopped providing its API. See https://farm.openzim.org/recipes/installgentoo/

I have created the recipe "gentoo" https://farm.openzim.org/recipes/gentoo for this request.

kelson42 commented 3 years ago

Done

vitaly-zdanevich commented 5 months ago

Will it be updated to the current state?

vitaly-zdanevich commented 5 months ago

> I have created the recipe "gentoo" https://farm.openzim.org/recipes/gentoo for this request.


vitaly-zdanevich commented 4 months ago

@benoit74

benoit74 commented 4 months ago

Reopening, since the ZIM is still not available. Now that years have passed, it should probably be tried again once mwoffliner 1.14 is out (in the coming weeks).