openzim / node-libzim

Libzim binding for Node.js: read/write ZIM files in Javascript
https://www.npmjs.com/package/@openzim/libzim
GNU General Public License v3.0
27 stars 11 forks source link

Move to libzim7 #72

Closed kelvinhammond closed 2 years ago

kelvinhammond commented 2 years ago

Fixes #69 Fixes #79

This is a work in progress. The test script is what I'm using to test things as I develop, I'll add proper tests later. Once I figure out the threading issues I'll remove / cleanup the code.

Let me know if you want the API changed. I went with accessor methods for some of the C++ class methods because that is the current javascript style. This code could be dried out with all the try catch stuff but then it ends up looking a bit unwieldy so its not dry for readability and flexibility in this way.

kelson42 commented 2 years ago

@kelvinhammond Thank you for your PR. Should I try to build it and see how we could use it with mwoffliner?

kelvinhammond commented 2 years ago

@kelvinhammond Thank you for your PR. Should I try to build it and see how we could use it with mwoffliner?

@kelson42 Its not quite ready yet, the creator doesn't work because there are some issues with accessing the javascript objects via a thread. I'm working on figuring that out. Single threaded might work but its missing some functions such as add illustration etc.

kelson42 commented 2 years ago

@kelvinhammond FYI We have also made a new maintenance release of libzim, the version 7.1.0 https://download.openzim.org/release/libzim/. Nothing really important for you, but please use it if possible. Here is the changelog https://github.com/openzim/libzim/blob/master/ChangeLog

kelson42 commented 2 years ago

@kelvinhammond Any news on this PR? Is there anything which can be done to help you to make it ready for review?

kelvinhammond commented 2 years ago

@kelson42 Sorry about the delay, I've been busy with life. I'll get started on this gain this weekend and hopefully finish this up.

kelvinhammond commented 2 years ago

Rebase from master and force pushed with lease to this branch.

kelvinhammond commented 2 years ago

@kelson42 Can someone review this? I've basically rewritten the library again so that it supports the new libzim7 architecture. I'll updated mwoffliner after this has been merged and released.

codecov[bot] commented 2 years ago

Codecov Report

Merging #72 (6c945ae) into master (b44a29f) will not change coverage. The diff coverage is n/a.

:exclamation: Current head 6c945ae differs from pull request most recent head b4db7ca. Consider uploading reports for the commit b4db7ca to get more accurate results

@@            Coverage Diff            @@
##            master       #72   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            1         1           
  Lines            1         1           
=========================================
  Hits             1         1           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

kelvinhammond commented 2 years ago

Not sure how to fix the CI builds. Some of the builds use node 12 which is end of life this month. The ones that use node 14x+ pass. Can someone update those?

Codefactor is currently outdated and does not currently align with the CPPLint.cfg in the project for some reason.

kelson42 commented 2 years ago

@kelvinhammond If Nodejs12 makes difficulties we can just remove it, see .github/workflows/ci.yml I will, you will just need to rebase.

kelson42 commented 2 years ago

@ke

@kelvinhammond If Nodejs12 makes difficulties we can just remove it, see .github/workflows/ci.yml I will, you will just need to rebase.

Done, and all the CI pass, so if something does not anymore with this PR. This is probably a regression.

BTW, we have meanwhile a libzim7.2.0 released at https://download.openzim.org/release/libzim/ and you can also anytime test with the cutting edge nightly at https://download.openzim.org/nightly/ (a few improvements have been done after your reported a problem and might be useful for you). If necessary, we could consider a patch release "just for you" getting the latest codebase for node-libzim 2.5.0

kelvinhammond commented 2 years ago

I'll make that push / commit soon then

kelvinhammond commented 2 years ago

Its ready

kelvinhammond commented 2 years ago

@kelson42 It says node 14x but then I see this gyp info using node@10.24.1 so I'm not sure what's wrong with the build

kelson42 commented 2 years ago

First I have rebased on master (necessary anyway... and master passes). I have created a new PR, which is basically done with master, see https://github.com/openzim/node-libzim/pull/78. So, there is something in this branch which makes the CI not passing.

Maybe this has nothing do to with this nodejs10 stuf... look at https://github.com/openzim/node-libzim/runs/6055607733?check_suite_focus=true... and I see the same kind of log for gyp. Maybe it would be worth a try to upgrade to node-gyp 9.0.0?

kelson42 commented 2 years ago

@kelvinhammond I believe I have achieved to fix the CI :) So we have only the codefactor problems left.

kelvinhammond commented 2 years ago

Thank you. I tried it on a Mac and it worked, tests passed.

kelvinhammond commented 2 years ago

@kelson42 Codefactor is out of date and has not updated for some reason. The includes it complains about are all there.

I also updated the CPPLINT.cfg to allow for #pragma once which should be acceptable as a header guard.

Running cpplint src/*.{h,cc} gives

Done processing src/archive.h
Done processing src/blob.h
Done processing src/common.h
Done processing src/contentProvider.h
Done processing src/creator.h
Done processing src/entry.h
Done processing src/entryrange.h
Done processing src/item.h
Done processing src/module.cc
Done processing src/search.h
Done processing src/suggestion.h
Done processing src/writerItem.h

with no errors.

kelson42 commented 2 years ago

@kelson42 Codefactor is out of date and has not updated for some reason.

How do you see that? What is the current version, and what is the latest version? It's annoying if CodeFactor delivers too many false positive errors.

kelvinhammond commented 2 years ago

@kelson42 If you click on details next to the CodeFactor it shows this https://www.codefactor.io/repository/github/openzim/node-libzim/pull/72

If you look at blob.h it includes <string> but CodeFactor complains about Add #include <string> for string lines of code = 1

It also is supposed to pull from the CPPLint.cfg which I've updated.

kelson42 commented 2 years ago

@kelvinhammond I believe your latest commit is not proper…

kelvinhammond commented 2 years ago

@kelson42 I think I've fixed the previous commit

kelson42 commented 2 years ago

@kelvinhammond Hmm, we still have a few things to fix looks like:

kelvinhammond commented 2 years ago

Working on figuring out a threading issue where libzim addItem adds an item to the queue and blocks until it can but blocks the main thread as a result

kelson42 commented 2 years ago

@kelvinhammond Hope you work with the nightly of libzim. A new libzim release should be done today or tomorrow.

kelson42 commented 2 years ago

@kelvinhammond libzim 7.2.1 is now available at http://download.openzim.org/release/libzim/

kelvinhammond commented 2 years ago

@kelson42 version updated.

kelson42 commented 2 years ago

@kelvinhammond Thx, problems with macOS in the CI are normal, we are handling them.

kelvinhammond commented 2 years ago

@kelson42 I updated the test-mem-leak script but now I'm seeing a new issue with creating a large zim, I'll get back to you once I figure it out and push a fix.

kelvinhammond commented 2 years ago

Issue I'm working on here: https://github.com/nodejs/node-addon-api/issues/1174

kelson42 commented 2 years ago

CI Is back green for macOS

kelvinhammond commented 2 years ago

@kelson42 I'll debug this a bit more tomorrow but currently we're waiting on the nodejs node-addon-api team to see if they can fix the Segmentation fault issue(s).

Otherwise this is ready to go, I'll probably start on the mwoffliner changes while waiting on a fix for that, the api for this repo should not change that much.

kelson42 commented 2 years ago

@kelvinhammond Sounds good! We have released a new libzim 7.2.2, you can just use it, it should not change anything for you.

kelson42 commented 2 years ago

I also updated the CPPLINT.cfg to allow for #pragma once which should be acceptable as a header guard.

It seems to be a weakness in cpplint! https://github.com/cpplint/cpplint/issues/65

kelvinhammond commented 2 years ago

Still waiting on the node-addon-api team for a fix, not sure when they'll have something. Any ideas on what to do next?

kelson42 commented 2 years ago

@kelvinhammond What is exactly the impact of the nodejs "bug" in mwoffliner? Is that aporadic, systematic?

kelvinhammond commented 2 years ago

@kelson42 The issue happens on pretty much every run, it is reproducible.

kelson42 commented 2 years ago

@kelvinhammond Would that be wise/workable to start a PR on mwoffliner to adapt the code to new libzim7?

kelvinhammond commented 2 years ago

I've already started looking at mwoffliner

kelson42 commented 2 years ago

@kelvinhammond What about merging this PR and opening a dedicated ticket for the crash scenario? This would make it easier to move on I believe.

kelvinhammond commented 2 years ago

The test-mem-leak test does not pass because of this and also will not work with larger zim files because of this bug.

kelvinhammond commented 2 years ago

@kelson42 @mgautierfr Does this seem normal for speed on an ssd laptop?

|----------| 6438/1000000 0% [elapsed: 02:03, left: 34327777:26:04, 52.10 iters/s] Default settings, clustersize 2048, indexing on with "en";

kelson42 commented 2 years ago

@kelvinhammond I'm not sure to understand the question. Where this log comes from. What has been done to get it?

kelvinhammond commented 2 years ago

@kelvinhammond I'm not sure to understand the question. Where this log comes from. What has been done to get it?

I was testing libzim calls and it was doing about 52 iterations per second which seemed kinda slow. I wanted to know if this was normal or if there was a bottleneck somewhere.

kelson42 commented 2 years ago

@kelvinhammond Unfortunately I have no opinion on this

mgautierfr commented 2 years ago

Iterations on what ? Reading entries ? You read only 52 entries per second ? Or writing ?

In any case, it seems a bit slow yes.

kelvinhammond commented 2 years ago

I seem to be stuck with the nodejs bug at the moment, not sure what I should do from here.

https://github.com/nodejs/node-addon-api/issues/1174

kelson42 commented 2 years ago

@kelvinhammond Yes, not easy. I would recommend to push on the mwoffliner front in the meantime, so we will be ready there too when this bug will be fixed.

kelson42 commented 2 years ago

@kelvinhammond libzim8.0.0 has been published.

kelvinhammond commented 2 years ago

@kelson42

I think this PR is ready to go, what do we need to do next?