openzim / node-libzim

Libzim binding for Node.js: read/write ZIM files in Javascript
https://www.npmjs.com/package/@openzim/libzim
GNU General Public License v3.0
27 stars 11 forks source link

Port node-libzim to N-API #35

Closed kelvinhammond closed 4 years ago

kelvinhammond commented 4 years ago

Ports node-libzim to N-API.

Almost completely rewritten.

kelvinhammond commented 4 years ago

I don't have the test memleak script. Can someone add it here and/or test it too.

kelson42 commented 4 years ago

@kelvinhammond Hi, this is really awesome. With which version of nodejs have you tested? What is the memleak script you are talking about? Does the node-libzim API has changed at all? Does mwoffliner still works fine?

kelvinhammond commented 4 years ago

@kelson42

@kelvinhammond Hi, this is really awesome.

Thank you.

With which version of nodejs have you tested?

I'm using node v13.12.0 locally.

What is the memleak script you are talking about?

There was a reference to it here https://github.com/openzim/node-libzim/pull/35/files#diff-b9cfc7f2cdf78a7f4b91a753d10865a2L18 in the package.js "test-mem-leak": "tsc && node dist/makeLargeZim.js"

Does the node-libzim API has changed at all?

The api should be the same, I wanted this to be a drop in replacement.

Does mwoffliner still works fine?

I haven't tested this with mwoffliner, how would I go about doing that? I'll try to test that, what is a good site that I can test it against?

kelson42 commented 4 years ago

@kelvinhammond nodejs-libzim is used in https://github.com/openzim/mwoffliner and https://github.com/openzim/phet. You can see how they are called at https://farm.openzim.org/, our solution to build automatically ZIM files. But we would like to build other scrapers using nodejs-libzim, like one for https://libretexts.org/ (https://github.com/openzim/zim-requests/issues/131).

I will have a look to the memory leak detection script, but I guess I can not really help. I suspect @ISNIT0 the author never have commited it.

Just out of curiosity, how did you came to rewrite nodejs-libzim? Do you use it for an other project?

kelson42 commented 4 years ago

@kelvinhammond I have transmitted the review of your ticket to @midik Which is a typescript expert and far more skilled than me to do this review. With his help and your help, the goal is to get that PR merge and then make a new official release on npmjs.org.

kelvinhammond commented 4 years ago

I ported it because I wanted to explore the zim files a bit and possibly index a bunch of zim files with one index. I tried the nbind version and couldn't get it working with the current nodejs because nbind is outdated.

kelson42 commented 4 years ago

I ported it because I wanted to explore the zim files a bit and possibly index a bunch of zim files with one index.

Not sure exactly for what you need that or if you talk about the Xapian index, but our approach to do that is to use the ability of Xapian to deal with many indexes and gather the results. We call it the multizim support and we are at the really beginning of its implementation in the libkiwix.

midik commented 4 years ago

@kelvinhammond Thank you for your contribution. I just made some minor changes in TS part. @kelson42 Regarding to memory consuming, I tested on a large ZIMs with 1 billion text articles on node 13.5, max memory consuming didn't exceed 1-1.2 Gb.

kelson42 commented 4 years ago

@midik Thank you for your review @mgautierfr I would like to ask you a review on the C++ part please @kelvinhammond Your PR seems to works fine. We will have a look to the C++ code and if OK, we would then go for a new release. Thank you for your patience.

kelson42 commented 4 years ago

@kelvinhammond We just have released a new version of libzim which is more performant http://download.openzim.org/release/libzim/libzim_linux-x86_64-6.1.0.tar.gz. Would you be able be to use it in place of the older one?

kelvinhammond commented 4 years ago

@kelson42 done.

kelvinhammond commented 4 years ago

@kelson42 CodeFactor is failing because its outdated, how do I get it to retry?

kelson42 commented 4 years ago

@kelvinhammond I believe I have forced codeFactor to run again ainst this PR. I see here https://www.codefactor.io/repository/github/openzim/node-libzim/pulls that last run is a few minutes ago. In general, each new commit should force a rescan... Why to you believe this is not the case?

kelson42 commented 4 years ago

I think, the best is to make things in two steps:

@kelvinhammond Does that make sense to you? Would you be able to look to the two specific comment left by @mgautierfr please? Regarding CodeFactor, I have no clue what is going one, I would then just ignore it.

kelvinhammond commented 4 years ago

@kelson42 yes, I'll update the PR addressing the comments. CodeFactor I'll try to fix, hopefully it fixes itself when I push, if that fails I'll just remove the CPPLint.cfg I added.

kelson42 commented 4 years ago

@kelvinhammond New release from libzim https://download.openzim.org/release/libzim/libzim_linux-x86_64-6.1.1.tar.gz (we have detected a regression in previous one. No new one is foreseen in the next weeks).

kelvinhammond commented 4 years ago

@kelson42 I've bumped the libzim version, I'll get the code review changes in the near future.

kelson42 commented 4 years ago

@kelvinhammond Any chance you would be able to polish the last small problems of your PR in the next days? I'm really eager to merge it and make a new release.

kelvinhammond commented 4 years ago

@kelson42 I'll try to finish it up today, sorry for the delay.

kelvinhammond commented 4 years ago

@kelson42 I've update this PR with the changes for this PR. fixing codefactor now.

kelson42 commented 4 years ago

@kelvinhammond Thank you very much, please re-ask for review when your work is over.