Closed vanaigr closed 1 year ago
awaiting on destroy() or not using workerPort
Either of those seems like the "correct" solution here, depending on your exact situation. Please note that the method is documented as being asynchronous, hence having to await its completion may perhaps be somewhat expected? https://github.com/mozilla/pdf.js/blob/0725b6299f122632a7d5b97f82fe584ae1584c5d/src/display/api.js#L622-L636
Please also note that for performance reasons it's not really recommended to parse more than one document at a time in each worker-thread, since otherwise multiple PDF documents would unnecessarily "compete" for parsing resources which will slow things down.
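A minimal sketch of the awaiting pattern, for illustration only. `getDocument`, the loading task's `promise`, and its asynchronous `destroy()` are the pdf.js names used in this thread; `processSequentially` and `handlePdf` are hypothetical names, and `getDocument` is passed in as a parameter so the snippet makes no assumption about how pdf.js is imported.

```javascript
// Hypothetical helper: loads documents one at a time on the shared worker.
// `getDocument` is expected to behave like pdfjsLib.getDocument, i.e. return a
// loading task exposing `promise` and an asynchronous `destroy()`.
async function processSequentially(getDocument, sources, handlePdf) {
  const results = [];
  for (const src of sources) {
    const loadingTask = getDocument(src);
    try {
      const pdf = await loadingTask.promise;
      results.push(await handlePdf(pdf));
    } finally {
      // Wait for the teardown to finish before the next getDocument() call,
      // so the next task never overlaps with a half-destroyed one.
      await loadingTask.destroy();
    }
  }
  return results;
}
```

Because each iteration awaits `destroy()` before the next `getDocument()` call, only one document is being parsed at any time and no two tasks overlap.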
Having (quickly) read through https://github.com/mozilla/pdf.js/issues/16777#issue-1829956884 it's unfortunately not immediately clear (at least to me) if there's a simple solution to fix the current situation. To ensure maintainability of the relevant code, we want to avoid introducing too much complexity in this part of the API.
Please note that GlobalWorkerOptions.workerPort is quite old functionality, and this is the first time that this particular issue has been reported[1]. While we could probably let getDocument throw if the GlobalWorkerOptions.workerPort is being used by another (non-destroyed) document, my worry is that we'd end up breaking things for more users in that way.
[1] Possibly because most users don't utilize GlobalWorkerOptions.workerPort, or at least not in parallel like this.
Possibly we could delay destruction of the global workerPort until all its associated loadingTasks are actually done, e.g. something along these lines, however:
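The general idea of deferring teardown until every user is done can be illustrated with a small reference-counting sketch. This is not the patch referred to in this thread and not pdf.js code; `SharedPortGuard`, `acquire`, `release`, `setup`, and `teardown` are hypothetical names chosen for the illustration.

```javascript
// Hypothetical illustration (not pdf.js code): keep a shared resource set up
// until every task that acquired it has released it.
class SharedPortGuard {
  constructor(setup, teardown) {
    this.setup = setup; // called when the first user arrives
    this.teardown = teardown; // called when the last user leaves
    this.users = 0;
  }

  acquire() {
    if (this.users === 0) {
      this.setup();
    }
    this.users += 1;
  }

  release() {
    if (this.users === 0) {
      throw new Error("release() without a matching acquire()");
    }
    this.users -= 1;
    if (this.users === 0) {
      this.teardown();
    }
  }
}

// Two overlapping tasks: teardown only runs after the second release.
const log = [];
const guard = new SharedPortGuard(
  () => log.push("setup"),
  () => log.push("teardown")
);
guard.acquire(); // task1 starts
guard.acquire(); // task2 starts while task1 is still alive
guard.release(); // task1 is destroyed, the port stays usable
guard.release(); // task2 is destroyed, now the port is torn down
console.log(log); // [ 'setup', 'teardown' ]
```

The cost of this kind of bookkeeping is the extra complexity mentioned above: every code path that finishes a task has to release exactly once.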
I accidentally pressed the wrong button and closed the issue, sorry.
After debugging the code, I presumed that there probably wouldn't be any easy fix for this, so I tried to spot any place for a warning, and MessageHandler destroying with callbackCapabilities from the new task looked like a good candidate, because, under normal circumstances, I presume they should be empty. But I am not very familiar with the code, so if that's not the case or adding a warning here wouldn't be worth it, then we can close this issue.
Also, in my defense for not awaiting destroy(): when I first saw this function called, I looked into the documentation to see if it even existed, noticed that it returned a Promise, and, yes, assumed that I should await its completion. But quickly looking on the internet for small examples of its usage, I didn't find anyone awaiting and assumed that I don't need to.
After debugging the code, I presumed that there probably wouldn't be any easy fix for this,
Well, I suppose that it depends on your definition of "easy fix"; please note that https://github.com/mozilla/pdf.js/issues/16777#issuecomment-1664223730 contains a link to the smallest patch that I could imagine here: https://github.com/mozilla/pdf.js/compare/master...Snuffleupagus:pdf.js:issue-16777
so I tried to spot any place for a warning, and MessageHandler destroying with callbackCapabilities from the new task looked like a good candidate,
First of all, based on experience it's very difficult to write warning messages that are clear and concise enough that users will immediately understand what's wrong and more importantly how to fix things.
Secondly, I don't believe that the code (or even the file) that you're referencing would be an appropriate spot for a warning message, unfortunately, given the "special" situation described in this issue. (Also, note there's streamSinks to consider as well.)
@timvandermeij How do you feel about the patch I linked in https://github.com/mozilla/pdf.js/issues/16777#issuecomment-1664318010, since as mentioned it does add some (possibly unwanted) complexity?
I think the patch looks reasonable. However, destroy is marked as asynchronous, so completion guarantees can only be given if it's awaited, and if it's awaited I don't think there is actually an issue here? I would say awaiting the promise is the expected usage here, so I can't 100% oversee if there are no edge cases if one doesn't await, which makes me want to prefer simply awaiting (and possibly e.g. extending relevant documentation if possible).
But quickly looking on the internet for small examples of its usage, I didn't find anyone awaiting and assumed that I don't need to.
I think those examples work "accidentally" then, and this problem most likely won't be visible if no second document is loaded at the same time, which most users likely won't do.
"Solution": awaiting on destroy() or not using workerPort
PDF file: The issue happens with any PDF file, one PDF is embedded into the example
Configuration:
Steps to reproduce the problem:
Here is an HTML page that, when opened, reproduces the issue:
Console output would almost always be this:
Meaning that destruction of task1 interferes with the creation of task2, which causes its promise to never resolve.
Debugging the code, it looks like this is essentially what's happening:
1. destroy() on task1 is called. The WorkerTransport of that task (with MessageHandler 'd0') sends a termination message to the corresponding WebWorker's MessageHandler 'd0_worker' and waits for 'd0' to receive the response before continuing destruction (api.js:2516)
2. getDocument() is called, and the new task reuses the PDFWorker of the previous task (api.js:2283, api.js:363)
3. [The new task sends a message to the] WebWorker's 'worker' and returns a promise that waits on the previous task's 'main' MessageHandler (the one inside PDFWorker) to receive the name of the new WebWorker's MessageHandler (would be 'd1_worker') (api.js:410, api.js:456)
4. WebWorker's 'd0_worker' receives the termination message from step 1, terminates, and sends back the response (worker.js:791). Then WebWorker's 'worker' receives the message from step 3, creates MessageHandler 'd1_worker', and sends its name back (worker.js:90)
5. PDFWorker is destroyed, which in turn removes its 'main' callback from the WebWorker. The destruction completes (api.js:633, api.js:2270)
6. [The message from the] WebWorker [(the name of the new] MessageHandler) is never received because all of the callbacks were removed from the WebWorker, thus the promise from step 3 is never resolved
In conclusion: PDFWorker callbacks responsible for continuing execution of new task's promise are removed before they could do this.
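The ordering problem in the steps above can be modelled with a toy port, purely as an illustration. Nothing here is pdf.js code: `FakePort`, `simulate`, and `mainHandler` are hypothetical names, and the model only captures the one property that matters, namely that a reply reaches only the listeners attached when it is delivered.

```javascript
// Simplified model, not pdf.js code: a port hands a message only to the
// listeners attached at the moment of delivery.
class FakePort {
  constructor() {
    this.listeners = new Set();
  }
  addListener(fn) {
    this.listeners.add(fn);
  }
  removeListener(fn) {
    this.listeners.delete(fn);
  }
  // Returns how many listeners saw the message.
  deliver(message) {
    let reached = 0;
    for (const fn of [...this.listeners]) {
      fn(message);
      reached += 1;
    }
    return reached;
  }
}

// detachBeforeReply = true models the order from the issue (step 5 runs
// before the reply from step 4 is handled); false models the reply arriving
// while 'main' is still attached.
function simulate(detachBeforeReply) {
  const port = new FakePort();
  let task2Ready = false;

  // Stand-in for the shared 'main' handler that task2 is waiting on.
  const mainHandler = (message) => {
    if (message === "d1_worker") {
      task2Ready = true; // in pdf.js terms: task2's pending promise resolves
    }
  };
  port.addListener(mainHandler);

  let reached;
  if (detachBeforeReply) {
    port.removeListener(mainHandler); // task1's destruction completes first
    reached = port.deliver("d1_worker"); // reply arrives, nobody is listening
  } else {
    reached = port.deliver("d1_worker");
    port.removeListener(mainHandler);
  }
  return { task2Ready, reached };
}

console.log(simulate(true)); // { task2Ready: false, reached: 0 }
console.log(simulate(false)); // { task2Ready: true, reached: 1 }
```

In the first call the reply is dropped silently, which corresponds to the promise from step 3 staying pending forever rather than rejecting.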
I also want to note that the 'main' MessageHandler destroys with nonempty callbackCapabilities, which probably doesn't happen normally, so maybe this would be a place where the check with warning of such behavior could be added (shared/message_handler.js:531)
Adding my own event listener to the WebWorker results in the following:
Sidenote: the only thing I could not figure out is where the message with the new WebWorker MessageHandler's name goes by default. If I add the listener after the task1 finished destroying, it is not called, meaning that the message is effectively lost. It looks like the WebWorker messages are not queued, and if no listeners are present, the messages are actually lost, because this:
produces just "received" and not "lost received"
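The observation in this sidenote is consistent with how DOM-style event dispatch works in general: an event goes to the listeners attached at dispatch time and is not stored for listeners added later. The sketch below uses a plain `EventTarget` as a stand-in for the worker, which is an assumption made to keep it self-contained; no real `Worker` is created and this is not the snippet from the original report.

```javascript
// Stand-in for a worker object: any EventTarget dispatches the same way.
const target = new EventTarget();
const log = [];

// Dispatched while no listener is attached: nothing records it.
target.dispatchEvent(new Event("message"));

target.addEventListener("message", () => log.push("received"));

// Dispatched after the listener was added: this one is seen.
target.dispatchEvent(new Event("message"));

console.log(log); // [ 'received' ], only the second dispatch was observed
```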
Going back to the main issue, it can occur only if these conditions are met:
1. [Multiple] PDFDocumentLoadingTasks exist at the same time
2. [These] PDFDocumentLoadingTasks use the same WebWorker
I tried to look in the examples, docs, and api file but couldn't find any information supporting or disallowing having multiple tasks at the same time.
Code meeting only one of these conditions, as far as I tested, works fine: Using the same WebWorker for multiple tasks (with workerPort) seems to work fine with only one working task at a time. Creating multiple tasks with different workers also works fine and does not give any warnings or errors:
Also, looking on the internet (mostly stackoverflow) it looks like nobody is awaiting on destroy() which could lead to condition 1.
The only hint that 1 is not allowed I found is that the more advanced applications (app.js and mobile-viewer) always wait on task destruction before creating the next one.
I hope I wrote the line numbers correctly and didn't miss some text in some obvious place that says that 1 and/or 2 are not allowed to happen.