Closed kelvinhammond closed 4 years ago
I don't have the test memleak script. Can someone add it here and/or test it too.
@kelvinhammond Hi, this is really awesome. With which version of nodejs have you tested? What is the memleak script you are talking about? Does the node-libzim API has changed at all? Does mwoffliner still works fine?
@kelson42
@kelvinhammond Hi, this is really awesome.
Thank you.
With which version of nodejs have you tested?
I'm using node v13.12.0 locally.
What is the memleak script you are talking about?
There was a reference to it here https://github.com/openzim/node-libzim/pull/35/files#diff-b9cfc7f2cdf78a7f4b91a753d10865a2L18 in the package.js
"test-mem-leak": "tsc && node dist/makeLargeZim.js"
Does the node-libzim API has changed at all?
The api should be the same, I wanted this to be a drop in replacement.
Does mwoffliner still works fine?
I haven't tested this with mwoffliner, how would I go about doing that? I'll try to test that, what is a good site that I can test it against?
@kelvinhammond nodejs-libzim is used in https://github.com/openzim/mwoffliner and https://github.com/openzim/phet. You can see how they are called at https://farm.openzim.org/, our solution to build automatically ZIM files. But we would like to build other scrapers using nodejs-libzim, like one for https://libretexts.org/ (https://github.com/openzim/zim-requests/issues/131).
I will have a look to the memory leak detection script, but I guess I can not really help. I suspect @ISNIT0 the author never have commited it.
Just out of curiosity, how did you came to rewrite nodejs-libzim? Do you use it for an other project?
@kelvinhammond I have transmitted the review of your ticket to @midik Which is a typescript expert and far more skilled than me to do this review. With his help and your help, the goal is to get that PR merge and then make a new official release on npmjs.org.
I ported it because I wanted to explore the zim files a bit and possibly index a bunch of zim files with one index. I tried the nbind version and couldn't get it working with the current nodejs because nbind is outdated.
I ported it because I wanted to explore the zim files a bit and possibly index a bunch of zim files with one index.
Not sure exactly for what you need that or if you talk about the Xapian index, but our approach to do that is to use the ability of Xapian to deal with many indexes and gather the results. We call it the multizim support and we are at the really beginning of its implementation in the libkiwix.
@kelvinhammond Thank you for your contribution. I just made some minor changes in TS part. @kelson42 Regarding to memory consuming, I tested on a large ZIMs with 1 billion text articles on node 13.5, max memory consuming didn't exceed 1-1.2 Gb.
@midik Thank you for your review @mgautierfr I would like to ask you a review on the C++ part please @kelvinhammond Your PR seems to works fine. We will have a look to the C++ code and if OK, we would then go for a new release. Thank you for your patience.
@kelvinhammond We just have released a new version of libzim which is more performant http://download.openzim.org/release/libzim/libzim_linux-x86_64-6.1.0.tar.gz. Would you be able be to use it in place of the older one?
@kelson42 done.
@kelson42 CodeFactor is failing because its outdated, how do I get it to retry?
@kelvinhammond I believe I have forced codeFactor to run again ainst this PR. I see here https://www.codefactor.io/repository/github/openzim/node-libzim/pulls that last run is a few minutes ago. In general, each new commit should force a rescan... Why to you believe this is not the case?
I think, the best is to make things in two steps:
@kelvinhammond Does that make sense to you? Would you be able to look to the two specific comment left by @mgautierfr please? Regarding CodeFactor, I have no clue what is going one, I would then just ignore it.
@kelson42 yes, I'll update the PR addressing the comments. CodeFactor I'll try to fix, hopefully it fixes itself when I push, if that fails I'll just remove the CPPLint.cfg I added.
@kelvinhammond New release from libzim https://download.openzim.org/release/libzim/libzim_linux-x86_64-6.1.1.tar.gz (we have detected a regression in previous one. No new one is foreseen in the next weeks).
@kelson42 I've bumped the libzim version, I'll get the code review changes in the near future.
@kelvinhammond Any chance you would be able to polish the last small problems of your PR in the next days? I'm really eager to merge it and make a new release.
@kelson42 I'll try to finish it up today, sorry for the delay.
@kelson42 I've update this PR with the changes for this PR. fixing codefactor now.
@kelvinhammond Thank you very much, please re-ask for review when your work is over.
Ports node-libzim to N-API.
Almost completely rewritten.