openzim / mwoffliner

Mediawiki scraper: all your wiki articles in one highly compressed ZIM file
https://www.npmjs.com/package/mwoffliner
GNU General Public License v3.0

Use latest node-libzim #1576

Open kelson42 opened 2 years ago

kelson42 commented 2 years ago

… and bring in all the improvements of libzim7; see the changelog: https://github.com/openzim/libzim/blob/master/ChangeLog

This depends on https://github.com/openzim/node-libzim/issues/69

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

kelson42 commented 2 years ago

@kelvinhammond This is the ticket for using the latest (still dev) version of node-libzim. Thank you for volunteering on this as well. I'm still not a TypeScript dev, but I'm far more familiar with this code base. Ping me anytime if you have a question; I should be able to help you. We can also have a preparatory chat on Slack if you want.

audiodude commented 2 months ago

I don't think there was ever a 2.5.0 release of node-libzim (https://github.com/openzim/node-libzim/blob/main/Changelog). Do we want to jump directly to 3.2.0?

kelson42 commented 2 months ago

> I don't think there was ever a 2.5.0 release of node-libzim (https://github.com/openzim/node-libzim/blob/main/Changelog). Do we want to jump directly to 3.2.0?

Yes, actually we should make a new release... I'm not even sure we properly support macOS on ARM CPUs.

kelson42 commented 2 months ago

Fixing this issue is very important because all the bugs that have been fixed in libzim are still affecting our current Wikipedia ZIM files.

HiroyasuNishiyama commented 1 month ago

Hello,

As I understand it, the broken search functionality in the Japanese Wikipedia ZIM file is caused by the outdated version of node-libzim used by mwoffliner.

After experimenting with several publicly available ZIM files, I found that regenerating them with the zimrecreate tool included in zim-tools makes them searchable.

If resolving the issue being discussed here is going to take a bit more time, would it be possible to apply zimrecreate as a post-processing step for mwoffliner before publishing the Japanese Wikipedia ZIM file and others as a temporary workaround?
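
For illustration, here is a rough sketch of what such a post-processing step might look like in TypeScript. It assumes zimrecreate is installed and on the PATH, and that it accepts the origin and output file paths as two positional arguments. The helper names (recreatedPath, zimrecreateArgs, recreateZim) and the ".recreated" output suffix are hypothetical; nothing like this exists in mwoffliner today.

```typescript
import { spawnSync } from "node:child_process";
import * as path from "node:path";

// Hypothetical helper: derive the output path next to the input,
// e.g. "out/wiki.zim" -> "out/wiki.recreated.zim".
export function recreatedPath(zimPath: string): string {
  const { dir, name, ext } = path.parse(zimPath);
  return path.join(dir, `${name}.recreated${ext}`);
}

// Build the argument list for zimrecreate.
// Assumption: the CLI takes `<origin.zim> <output.zim>` as positional arguments.
export function zimrecreateArgs(zimPath: string): string[] {
  return [zimPath, recreatedPath(zimPath)];
}

// Run zimrecreate on a freshly scraped ZIM file and return the path of the
// regenerated file. Throws if the tool is missing or exits with a non-zero status.
export function recreateZim(zimPath: string): string {
  const args = zimrecreateArgs(zimPath);
  const result = spawnSync("zimrecreate", args, { stdio: "inherit" });
  if (result.error) {
    throw result.error;
  }
  if (result.status !== 0) {
    throw new Error(`zimrecreate exited with status ${result.status}`);
  }
  return args[1];
}
```

A publishing pipeline could call recreateZim on each scraped file and upload the returned path instead of the original. Writing to a separate file rather than replacing the original keeps the scraper output available for comparison.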