openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

English Wikivoyage run overdue (time sensitive) #1129

Open Jaifroid opened 3 months ago

Jaifroid commented 3 months ago

The periodicity of the ZIM Farm recipe https://farm.openzim.org/recipes/wikivoyage_en_all is set to monthly, but the last run was 16th June 2024.

Ideally, I need a new version for the Wikivoyage by Kiwix app update before the code-signing certificate expires in a couple of weeks (to buy time for adapting to the new code-signing regime). Thanks.

kelson42 commented 3 months ago

I will relaunch it with the dev version (1.14.0) of MWoffliner, but the resulting ZIM will be around 30% bigger; this is currently a regression we are trying to fix.

Jaifroid commented 3 months ago

@kelson42 Thank you. Let's see...

Jaifroid commented 3 months ago

The resulting ZIM size is fine at 875 MB. However, it breaks our Restricted mode (aka jQuery mode), because all the sections have the attribute style="display: none;" (see screenshot at bottom). Since we don't run JS in Restricted mode, these ZIMs become inaccessible to users who do not have JS enabled in their browser or who (in the case of Kiwix JS) cannot run a Service Worker.

This is a longstanding issue that was partially fixed: see https://github.com/openzim/mwoffliner/issues/962, https://github.com/openzim/mwoffliner/issues/838, https://github.com/openzim/mwoffliner/issues/952 and (related) https://github.com/openzim/mwoffliner/issues/1033. Another closely related issue: https://github.com/openzim/mwoffliner/issues/1915.

This regression (which is worse than before; see the next paragraph) has probably crept back in with the new endpoint. The logic should be: the HTML has all sections open by default, and they should only be closed by a script that runs when JS is enabled. This makes for the most accessible type of ZIM that mwOffliner can produce.

The regression is much worse now because each whole section containing a details-summary block is now hidden, instead of just the block itself being closed. The scraper should never use an inline display: none; rule, and should rely only on the open attribute of the details-summary block.
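A minimal TypeScript sketch of the intended behaviour: the static HTML ships every details block with open set, and only a script that runs when JS is available collapses them, so a reader that runs no JS (such as Restricted mode) still shows every section. The function name collapseDetailsBlocks and the OpenableElement interface are hypothetical; the interface only exists so the sketch runs outside a browser, where you would pass document.querySelectorAll("details") instead.

```typescript
// Hypothetical minimal interface: anything with these two methods will do.
// In a browser, the NodeList from document.querySelectorAll("details") fits,
// since it has a length and numeric indexes (ArrayLike).
interface OpenableElement {
  hasAttribute(name: string): boolean;
  removeAttribute(name: string): void;
}

// Progressive enhancement: the HTML arrives with every <details open>,
// and this only runs when JavaScript is available. If it never runs,
// all sections simply stay expanded. Returns how many blocks were collapsed.
function collapseDetailsBlocks(elements: ArrayLike<OpenableElement>): number {
  let collapsed = 0;
  for (let i = 0; i < elements.length; i++) {
    const el = elements[i];
    if (el && el.hasAttribute("open")) {
      el.removeAttribute("open");
      collapsed++;
    }
  }
  return collapsed;
}
```

Note that nothing here ever sets display: none; a collapsed details element still exposes its summary, so the content stays reachable either way.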

@kelson42, @audiodude Where should I report this regression? In the mwOffliner repo or in wp1?

[Screenshot]

I can work around this in the reader for the upcoming Wikivoyage release with a temporary patch to force-unhide, but it should be fixed in the scraper for a more universal solution.
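A reader-side force-unhide patch of this kind might look like the following sketch. It is hypothetical (not the actual Kiwix JS patch; forceUnhide is an illustrative name) and works with regular expressions on the raw article HTML, so it only handles the simple pattern described above: a style attribute whose entire value is display: none;, plus details tags that lack open.

```typescript
// Hypothetical reader-side workaround (not the actual Kiwix JS code).
// Regex-based, so it only handles the simple cases described in the issue:
//   - a style attribute whose whole value is "display: none;" (either quote style)
//   - <details> tags that lack an `open` attribute
function forceUnhide(html: string): string {
  // 1. Drop inline style attributes that only hide the element.
  const unhidden = html.replace(/\sstyle=(["'])\s*display:\s*none;?\s*\1/gi, "");
  // 2. Make sure every <details> block starts expanded.
  return unhidden.replace(/<details\b(?![^>]*\bopen\b)([^>]*)>/gi, "<details open$1>");
}

const sample = '<section style="display: none;"><details><summary>Get in</summary>...</details></section>';
console.log(forceUnhide(sample));
// <section><details open><summary>Get in</summary>...</details></section>
```

A style attribute that mixes display: none with other declarations is left alone; handling that properly would mean editing the parsed DOM rather than the HTML string, which is one more reason the fix belongs in the scraper.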

kelson42 commented 3 months ago

@Jaifroid We won't provide collapsible sections anymore, as this was a feature of the older "mobile-section" API. This therefore seems to be some kind of leftover, which should be removed IMHO. Please open a ticket in MWoffliner.

Jaifroid commented 3 months ago

@kelson42 OK. I suppose https://github.com/openzim/mwoffliner/issues/1915 should be closed as "won't fix" in that case. I mean, it's not hard to replace the sections and headers with details-summary tags in case there's any desire to keep that feature, but to be honest I never found it useful to have all the sections closed, even in the mobile version.
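To illustrate the substitution mentioned here, a hypothetical TypeScript sketch (sectionsToDetails is an illustrative name, not mwoffliner code) that turns each section starting with a heading into an expanded details block, keeping the heading inside summary. It assumes flat, non-nested sections; a regex cannot pair nested tags reliably, so a real implementation would walk the DOM instead.

```typescript
// Hypothetical sketch: convert flat <section> elements that begin with an
// <h2>..<h6> heading into <details open><summary>heading</summary>body</details>.
// Assumptions: sections are not nested; attributes on <section> are dropped;
// sections without a leading heading (e.g. the lead) are left untouched.
function sectionsToDetails(html: string): string {
  return html.replace(
    /<section\b[^>]*>(\s*<h([2-6])\b[^>]*>[\s\S]*?<\/h\2>)([\s\S]*?)<\/section>/gi,
    "<details open><summary>$1</summary>$3</details>"
  );
}

console.log(sectionsToDetails("<section><h2>Get in</h2><p>By plane</p></section>"));
// <details open><summary><h2>Get in</h2></summary><p>By plane</p></details>
```

Emitting open by default keeps the content visible without JS, consistent with the accessibility argument earlier in the thread.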

Jaifroid commented 1 month ago

@kelson42 The English Wikivoyage recipe has been running on mwoffliner:dev monthly since you kindly re-enabled it in August. But in the last two months (Sep and Oct) it has only succeeded in producing the nopic version. It keeps failing to produce maxi, which is the one I need to update the Wikivoyage app. See the log below.

Is there anything we can do about this? I'm keen to get a new Wikivoyage release out, especially because the last version had some issues in KJS / PWA that I hadn't fully caught; now they are fixed, so it would be a much better UX than the currently published app version.

I looked at the logs on https://farm.openzim.org/pipeline/53d49603-fd60-4933-b7f3-5b8130b29805/debug, and the failure seems to be reported as:

An error was encountered in a non-retryable streaming request.
[error] [2024-10-04T23:43:02.196Z] Cache error while uploading object: RequestTimeTooSkewed: The difference between the request time and the current time is too large.
    at throwDefaultError (/tmp/mwoffliner/node_modules/@smithy/smithy-client/dist-cjs/index.js:840:20)
    at /tmp/mwoffliner/node_modules/@smithy/smithy-client/dist-cjs/index.js:849:5
    at de_CommandError (/tmp/mwoffliner/node_modules/@aws-sdk/client-s3/dist-cjs/index.js:4743:14)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /tmp/mwoffliner/node_modules/@smithy/middleware-serde/dist-cjs/index.js:35:20
    at async /tmp/mwoffliner/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:482:18
    at async /tmp/mwoffliner/node_modules/@smithy/middleware-retry/dist-cjs/index.js:320:38
    at async /tmp/mwoffliner/node_modules/@aws-sdk/middleware-flexible-checksums/dist-cjs/index.js:174:18
    at async /tmp/mwoffliner/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:110:22
    at async /tmp/mwoffliner/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:138:14 {
  '$fault': 'client',
  '$metadata': {
    httpStatusCode: 403,
    requestId: 'F46C3F407538CF32:B',
    extendedRequestId: 'oG4WkdG8LALy860KgaUTw18qofngQZNZg39fIB5nw/M3ZOxSudf2AcKLEFPlbO0bPEnQzTB7eU2k',
    cfId: undefined
  },
  Code: 'RequestTimeTooSkewed',
  RequestTime: '20241004T232801Z',
  ServerTime: '2024-10-04T23:43:02Z',
  MaxAllowedSkewMilliseconds: '900000',
  RequestId: 'F46C3F407538CF32:B',
  HostId: 'oG4WkdG8LALy860KgaUTw18qofngQZNZg39fIB5nw/M3ZOxSudf2AcKLEFPlbO0bPEnQzTB7eU2k',
  CMReferenceId: 'MTcyODA4NTM4MTkzOSAzOC4xNDYuNDAuMTA3IENvbklEOjMyNDAyNDg0Ny9FbmdpbmVDb25JRDozMDI5NDI2L0NvcmU6OTM='
}
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

audiodude commented 1 month ago

This doesn't look like a problem with mwoffliner itself, but rather the container environment. Specifically:

  RequestTime: '20241004T232801Z',
  ServerTime: '2024-10-04T23:43:02Z',

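That is a gap of just over 15 minutes, more than the 900000 ms (15 minutes) allowed by MaxAllowedSkewMilliseconds. Looks bad.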
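A quick TypeScript check of that arithmetic, using the three values copied from the error above. The helper names parseCompactUtc and checkSkew are hypothetical; the only subtlety is that RequestTime uses the compact ISO 8601 form, so it is parsed by hand as UTC rather than passed to Date.parse.

```typescript
// RequestTime in the S3 error uses the compact ISO 8601 form (20241004T232801Z),
// which Date.parse does not reliably accept, so parse it by hand (as UTC).
function parseCompactUtc(ts: string): number {
  const m = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(ts);
  if (!m) throw new Error(`Unrecognised timestamp: ${ts}`);
  return Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3]),
                  Number(m[4]), Number(m[5]), Number(m[6]));
}

function checkSkew(requestTime: string, serverTime: string, maxAllowedSkewMs: number) {
  // ServerTime is in the extended ISO form, which Date.parse handles.
  const skewMs = Date.parse(serverTime) - parseCompactUtc(requestTime);
  return { skewMs, tooSkewed: Math.abs(skewMs) > maxAllowedSkewMs };
}

// Values copied from the error log.
const result = checkSkew("20241004T232801Z", "2024-10-04T23:43:02Z", 900000);
console.log(result); // { skewMs: 901000, tooSkewed: true }
```

So the request missed the limit by about one second: 15 min 1 s against a 15-minute allowance.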

I'll try running it locally and see if I can send you a ZIM.

audiodude commented 1 month ago

It ran fine for me on 1.14-dev. Here's the ZIM: https://www.dropbox.com/scl/fi/idcb6c96gzigmc5mcm45x/wikivoyage_en_all_maxi_2024-10.zim?rlkey=e5q52jymxxa5fu316edl3bwit&st=l2wjaik0&dl=0

This is clearly some kind of operational/environment issue with zimfarm.

benoit74 commented 1 month ago

@audiodude wrote: "This doesn't look like a problem with mwoffliner itself, but rather the container environment."

At least mwoffliner1 (the VM that ran the last task) has a proper, up-to-date clock.

Could it be that the container itself has an improperly synchronized (NTP) clock that slowly drifts as the job progresses? I don't think so; it seems odd that a container could "lose" ~20 minutes during a 3-hour job.
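To put a number on that intuition: if the 901 s gap between RequestTime (23:28:01) and ServerTime (23:43:02) in the log had been caused purely by clock drift, accruing evenly over the ~3-hour job (an assumption for this estimate), the container's clock would have had to run slow at roughly:

```latex
\text{drift rate} \approx \frac{901\ \text{s}}{3 \times 3600\ \text{s}}
  = \frac{901}{10\,800} \approx 0.083 \approx 8.3 \times 10^{4}\ \text{ppm},
\qquad 0.083 \times 60\ \text{min/h} \approx 5\ \text{min lost per hour}
```

That is far larger than ordinary clock drift, which supports looking elsewhere for the cause.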

What I do see in the logs are many messages like "@smithy/node-http-handler:WARN - socket usage at capacity=50 and 115 additional requests are enqueued."

To me this is the root cause of the problem, and it is a software bug. I read this message as saying that the software has started many HTTP requests and they are being queued and delayed. From my PoV, that explains the skew between request time and server time: the request time is stamped when the SDK signs the request, before it waits for a free socket, so a request that sits in the queue long enough reaches the server with a stale timestamp. This is a classic async issue in Node.js, where one can easily get bitten by asynchronous logic that has many advantages but also some drawbacks.
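A deliberately simplified TypeScript model of that hypothesis (hypothetical; not how the SDK actually schedules sockets): all requests are signed at the same instant, 50 sockets each send one request at a time, and every send occupies its socket for a fixed serviceMs. The 165 total comes from the log line (50 in use plus 115 enqueued); the serviceMs values are made up to show the threshold effect.

```typescript
/**
 * Simplified FIFO model (hypothetical, not the SDK's real scheduler):
 * `total` requests are all signed at t = 0, `capacity` sockets each send one
 * request at a time, and every send occupies its socket for `serviceMs`.
 * Returns how many requests wait longer than `maxSkewMs` before being sent.
 */
function countStaleRequests(total: number, capacity: number, serviceMs: number, maxSkewMs: number): number {
  let stale = 0;
  for (let i = 0; i < total; i++) {
    // Request i goes out in "wave" floor(i / capacity), once earlier waves finish.
    const waitMs = Math.floor(i / capacity) * serviceMs;
    if (waitMs > maxSkewMs) stale++;
  }
  return stale;
}

console.log(countStaleRequests(165, 50, 400000, 900000)); // 15
```

With a hypothetical 400 s per send, the fourth wave (the last 15 requests) waits 1,200 s and would be rejected; with 300 s per send the last wave waits exactly 900,000 ms, which is not over the limit, so none fail. Small changes in upstream throughput flip the outcome, which fits the failure appearing only on some runs.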

And of course this problem occurs randomly, since it depends on connectivity with the upstream. Travis's connectivity to AWS is probably better than the connectivity from mwoffliner1 / the Wikimedia Foundation datacenter.
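One possible mitigation for this class of Node.js async problem, sketched in TypeScript (mapWithConcurrency is a hypothetical helper, not what mwoffliner does): cap how many tasks are in flight, so that work is only started, and its request only created and signed, once a slot is free. This assumes the upload function builds and signs its request when it is called, not earlier.

```typescript
// One possible mitigation (a sketch, not what mwoffliner actually does):
// never start more than `limit` tasks at once. If each task builds and signs
// its request only when it is called, no signed request sits in a socket queue.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker pulls the next unclaimed index until the list is exhausted.
  // `next++` is safe here because JavaScript runs this code on a single thread.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i] as T);
    }
  }
  const workers: Promise<void>[] = [];
  for (let w = 0; w < Math.min(limit, items.length); w++) workers.push(worker());
  await Promise.all(workers);
  return results;
}
```

Keeping limit at or below the HTTP agent's socket capacity (50 in the log) should leave the SDK's own queue empty, so the timestamp on each request is taken shortly before it is actually sent.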

I'll open another issue to track this problem.

benoit74 commented 1 month ago

Upstream issue: https://github.com/openzim/mwoffliner/issues/2092

Jaifroid commented 1 month ago

Thanks, guys, for diagnosing this, and @audiodude for the updated ZIM file. I've downloaded and done some quick tests on this, and it looks fine.

Would anyone have any objections to my uploading it to download.kiwix.org (Wikivoyage folder) so I can build the Wikivoyage app (via GH Actions)?

benoit74 commented 1 month ago

No objection on my side.

Jaifroid commented 1 month ago

OK, it seems my credentials only give me access to the nightly and release directories (for software), but not to the zim directory, so I can't in fact upload the new Wikivoyage ZIM to download.kiwix.org. @benoit74 Is this something you could kindly do for me? The ZIM to upload to /zim/wikivoyage is in @audiodude's dropbox linked in https://github.com/openzim/zim-requests/issues/1129#issuecomment-2398458748 . Many thanks! 😊

benoit74 commented 1 month ago

File is available in production at https://download.kiwix.org/zim/wikivoyage/wikivoyage_en_all_maxi_2024-10.zim

It will soon be automatically indexed in the catalog.

Jaifroid commented 1 month ago

@benoit74 Thank you very much indeed!