openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

New ZIM: Mankier.com #171

Open trappedinspacetime opened 5 years ago

trappedinspacetime commented 5 years ago

Please use the following format for a ZIM creation request…

I'm sorry, I don't know whether this is possible, but mankier.com is something developers really need.

kelson42 commented 4 years ago

This is possible and a good idea.

RavanJAltaie commented 1 year ago

Requested https://farm.openzim.org/recipes/mankier

RavanJAltaie commented 1 year ago

Succeeded.

trappedinspacetime commented 1 year ago

@RavanJAltaie Thank you for your effort and the info. I checked out ManKier_2022-12.zim. Unfortunately, it's only a 300 KB file and it doesn't work.

Popolechien commented 1 year ago

Yeah, I confirm it only grabbed the first page: https://dev.library.kiwix.org/viewer#mankier_2022-12/A/www.mankier.com/

benoit74 commented 2 months ago

This cannot work with Zimit: the website relies on a web API. I would at least tag this as "Scraper needed", or decide we will never ZIM this (but since the need makes sense, we should find an alternative).

I also have some doubts regarding licensing, given that the code seems to be closed-source.

Popolechien commented 2 months ago

I've pinged the website owner to ask for clarification.

Popolechien commented 2 months ago

We got permission (see https://kiwix.freshdesk.com/a/tickets/70652). Anything they could do to help?

benoit74 commented 2 months ago

Super cool!

Unfortunately, the Zimit scraper cannot be used here, because we have no way to scrape the database and API service that return responses to search requests for a man page.

So I'm certain they can help if they want to. At the very least, we can ask them how they would recommend creating an offline version of their website.

Would they be open to sharing the database with us, so that we can write a custom scraper on top of it? Would they be open to sharing the source code of their website (the rendering engine seems to be open-sourced, but not the rest of the website), so that we can leverage it to build the scraper more quickly? Would they be open to contributing to this custom scraper effort? Perhaps they could easily adapt their website into a "static website" version that does not use any API or database, just plain (JSON) files, so that we can quickly create the scraper on top of this static website.

Details could be discussed in a live meeting if they have interest in such a project and/or directly in this issue.
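
To make the "static website" idea above a bit more concrete, here is a minimal sketch of what such an export could look like. Everything in it is an assumption for illustration only: the record fields (`section`, `name`, `html`), the `export_static` helper, and the one-JSON-file-per-page layout are hypothetical and do not reflect how mankier.com actually stores its data.

```python
import json
import tempfile
from pathlib import Path


def export_static(pages, out_dir):
    """Write one JSON file per page plus an index.json listing them.

    `pages` is an iterable of dicts with hypothetical keys
    "section", "name" and "html".
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    index = []
    for page in pages:
        # Hypothetical layout: <section>/<name>.json
        rel = Path(str(page["section"])) / f"{page['name']}.json"
        target = out / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(page, ensure_ascii=False), encoding="utf-8")
        index.append(rel.as_posix())
    index.sort()
    # The index lets a scraper discover every page without any API call.
    (out / "index.json").write_text(json.dumps(index), encoding="utf-8")
    return index


# Made-up sample records standing in for rows of the real database.
PAGES = [
    {"section": 1, "name": "ls", "html": "<h1>ls</h1>"},
    {"section": 8, "name": "mount", "html": "<h1>mount</h1>"},
]

with tempfile.TemporaryDirectory() as tmp:
    written = export_static(PAGES, tmp)
    reloaded = json.loads((Path(tmp) / "1" / "ls.json").read_text(encoding="utf-8"))
```

With a layout like this, a scraper would only need to read `index.json` and then fetch each listed file, with no search API or database access involved.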

benoit74 commented 2 months ago

Hi Benoit,

There is an API and an underlying DB, used for the search and by some third parties... my assumption was you can ignore this if the goal is to package the content of the man pages which is static HTML, and exclude the search input box in Kiwix.

To get a list of all the pages I would suggest starting in the sections as I mentioned below. You can see how many pages there are per section: https://www.mankier.com/stats

Cheers, Jackson

Recipe reconfigured (I also slightly altered the title and description to make them more precise) and task requested: https://farm.openzim.org/pipeline/d31651c5-0ffe-4492-a04b-3298a4c39980
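
For reference, the section-based enumeration suggested in the email could be sketched roughly as below. This is only an illustration under assumptions: the sample markup, the `/1/` section URL, and the `page_links` helper are hypothetical, and the real section pages may be structured differently. It uses only the Python standard library and works on HTML that has already been fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def page_links(section_html, base_url):
    """Return sorted, de-duplicated same-host URLs linked from a section index."""
    parser = LinkCollector()
    parser.feed(section_html)
    host = urlparse(base_url).netloc
    urls = set()
    for href in parser.hrefs:
        # Resolve relative links and drop fragments so each page appears once.
        absolute = urljoin(base_url, href).split("#", 1)[0]
        if urlparse(absolute).netloc == host:
            urls.add(absolute)
    return sorted(urls)


# Hypothetical markup standing in for a real section index page.
SAMPLE = (
    '<ul>'
    '<li><a href="/1/ls">ls</a></li>'
    '<li><a href="/1/cat#name">cat</a></li>'
    '<li><a href="https://example.org/">external</a></li>'
    '</ul>'
)
links = page_links(SAMPLE, "https://www.mankier.com/1/")
```

Running this over each section index would give a seed list of page URLs, which is the kind of input a crawl restricted to static HTML needs.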

benoit74 commented 2 months ago

Note: excluding the search box is not straightforward with custom CSS; at least, I failed to find a proper CSS selector. Let's live with it for a first version. We can fix that later if the first ZIM is mostly OK.

benoit74 commented 2 months ago

ZIM is ready and mostly OK: https://dev.library.kiwix.org/viewer#www.mankier.com_en_all_2024-06

There is just one big problem: the page at https://dev.library.kiwix.org/viewer#www.mankier.com_en_all_2024-06/www.mankier.com/ is completely broken. I'll open an upstream issue.

Popolechien commented 2 months ago

Nice. I couldn't find the problematic page you mentioned. How does one get there?

benoit74 commented 2 months ago

Click on the "Home" link