iiab / calibre-web

:books: Web app for browsing, reading and downloading eBooks stored in a Calibre database
GNU General Public License v3.0
4 stars 5 forks source link

Adjust error message for stuck/slow videos [if elapsed_time >= timeout, e.g. due to unavailable fragments &/or xklb media_check] #223

Closed deldesir closed 3 months ago

deldesir commented 4 months ago

This PR gets rid of the dozen multi-lines error messages

Tested on Ubuntu 24.04 (LRN2) image

holta commented 4 months ago

Lack of explanatory context is the far more serious problem:

(Either way: all users deserve a clear explanation of next steps!)

[*] Link to a concrete actionable steps at https://github.com/iiab/calibre-web/wiki if absolutely necessary.

Possibly related:

deldesir commented 4 months ago

Lack of explanatory context is the far more serious problem:

Is this ticket explanatory enough? https://github.com/yt-dlp/yt-dlp/issues/2137

  • Lack of link(s) to yt-dlp core issue(s) — if the problem is impossible to solve — link from "Tasks" view to a concrete explanation as to why.

I'll put the following link in the error message https://github.com/yt-dlp/yt-dlp/issues/2137

  • Conversely, an actionable suggestion needs to be included in "Tasks" view — or a link to an actionable suggestion[*] if users can take action.

What to suggest a user to do if nothing can be done. Redownloading/retrying is the only concrete action a user can do in such cases. There is no buttons or parameters to set in the GUI.

(Either way: all users deserve a clear explanation of next steps!)

[*] Link to a concrete actionable steps at https://github.com/iiab/calibre-web/wiki if absolutely necessary.

holta commented 4 months ago

Is this ticket explanatory enough? yt-dlp/yt-dlp#2137

Can an anchor tag ( e.g. URL#anchor ) link to the most relevant message / context within yt-dlp/yt-dlp#2137 ?

holta commented 4 months ago

Is this ticket explanatory enough? yt-dlp/yt-dlp#2137

Can an anchor tag ( e.g. URL#anchor ) link to the most relevant message / context within yt-dlp/yt-dlp#2137 ?

  1. @deldesir if the explanation at the very top of yt-dlp/yt-dlp#2137 is in fact the best explanation of the problem, then please confirm that an anchor tag is not useful in this particular case.
  2. If that's the case, can the title (subject line) of yt-dlp/yt-dlp#2137 ("--live-from-start didn't merge after live stream end") be weaved into the "Tasks" view explanation to make it more accurate?
    1. Or if you believe the title (subject line) of yt-dlp/yt-dlp#2137 is itself not quite accurate, please just explain what you think is really happening / what's the correct explanation, Thanks!
holta commented 4 months ago

Excerpt from https://github.com/iiab/calibre-web/issues/215#issuecomment-2229573817 :

Explain here why the error message is being repeated, so everyone can understand the root cause, and what it means.

Whereas suppressing error messages is almost always neither useful nor healthy.

@deldesir does repetition of this error message somehow indicate (or at least correlate?) with the severity of the problem with an individual video download?

(If so, should this [i.e. how much the error message is being repeated] somehow be communicated to the "Task" view user/operator?)

deldesir commented 4 months ago

the message was always repeated because it's caught within the loop of the polling mechanism looking for activity. This was triggered by inactivity for 30s since the video was stuck at x bytes.

https://github.com/iiab/calibre-web/blob/8b5ebff30ab49dfe3ce4a6ea9a97db611e60b462/cps/tasks/download.py#L82-L84

In this particular case, it was repeated with multiline because the error message was appended to previous self.message with self.message += "..."

holta commented 4 months ago

Sounds like you're saying the error message should never really have been repeated (using += appending) in the first place — if the repetition was completely meaningless? (In other words in PR #211 about 2 weeks ago.)

(No problem, but if so, forgive me for assuming that repetition had been added for a reason!)

deldesir commented 4 months ago

Sounds like you're saying the error message should never really have been repeated (using += appending) in the first place — if the repetition was completely meaningless? (In other words in PR #211 about 2 weeks ago.)

That's right

(No problem, but if so, forgive me for assuming that repetition had been added for a reason!)

No worry, sometimes I cannot explain stuff well enough. The self.message is constantly being replaced by the same message when there is no activity for 30s. the += made it multi-line. This was a mistake.

holta commented 4 months ago

No worries!!

Let's just make the new error message as accurate as possible, perhaps including bits of the yt-dlp/yt-dlp#2137 title / subject line if this explains more clearly + completely, per: https://github.com/iiab/calibre-web/pull/223#issuecomment-2231697215

holta commented 4 months ago

@deldesir please explain why this appears to be happening — hinting at your best guess {assumptions, intuition, intention} :

  1. Explain how yt-dlp flag --live-from-start(yt-dlp/yt-dlp#2137) can sometimes (often?!) be related?

And/Or...

  1. Summarize the situation — also linking to yt-dlp/yt-dlp#6078 — if that adds useful context re: unavailable fragment failures, e.g. for #215 ?

And/Or...

  1. Mention "fragments failing to merge" (and what that means?!) if that discussion on yt-dlp/yt-dlp#2137 adds critical understanding... ?

🙏

deldesir commented 4 months ago

@holta, this is difficult to put an accurate message for this bug. My observation is we never know if the stuck video will fail or recover.

For example, the following video was stuck, but ended up being donwloaded succesfully. I'd like to put a message to encourage the user to wait, but we don't know if it's worth it.

image

after 2-3 minutes:

image

UPDATE: This specific video was not stuck. It was reported as such because the post-processing took some time. I ran lb-wrapper dl https://www.youtube.com/watch?v=ZdzGp2eWM5o and could see the progress reached 100%.

image

holta commented 4 months ago

UPDATE: This specific video was not stuck. It was reported as such because the post-processing took some time.

Great insight! ✅

Hopefully "Tasks" view will clarify this & similar revelations as soon as you can in coming days — one tiny-or-small PR at a time — as soon as each is safe and reasonable to merge! 🪚

deldesir commented 4 months ago

When I increased the fragment stuck timeout to 120s during some testing, the video downloaded, but xklb's media check reported it as corrupt. The file was playable however. That means we're not dealing here with a classic "unavailable fragments" issue. It's a post-processing one.

I am still learning more and investigating...

holta commented 3 months ago

Extremely Valuable Research!

https://en.wikipedia.org/wiki/Fragging is never pretty 🙀

deldesir commented 3 months ago

Here is what I found out. The particular video https://www.youtube.com/watch?v=4BL65HElOPg from #215 was not stuck due to unavailable fragments. After every download, the video files are processed by the xklb's media_check. This process can take some time. Cases like this were wrongfully being identified as "unavailable fragments" issues in the Download task's message. Timeout is now increased to 120s in #229 as a workaround to avoid false positives. The stuck video detection can be enhanced by setting it to be fired only between 0% and 100% treshold.
image

holta commented 3 months ago

particular video https://www.youtube.com/watch?v=4BL65HElOPg from #215 was not stuck due to unavailable fragments

Progress, thanks for confirming! @deldesir please also clarify:

deldesir commented 3 months ago

I suggest we go with a better explanation while I am working on the cause of the video being stuck. I'll keep the yt-dlp issues' references and make the PR reviewable as the timeout mechanism was designed specifically for missing fragments issues.

holta commented 3 months ago

I suggest we go with a better explanation while I am working on the cause of the video being stuck.

Great. Plz revise the explanation to contextualize as best you can. We can always revise later when understanding advances + shifts in due course!

I'll keep the yt-dlp issues' references and make the PR reviewable as the timeout mechanism was designed specifically for missing fragments issues.

Apologies I don't fully grasp "all" the nuances yet.

(Hopefully we can shine a bright candle for almost everyone very soon! In "Tasks" view especially ;)

holta commented 3 months ago

I suggest we go with a better explanation while I am working on the cause of the video being stuck.

timeout mechanism was designed specifically for missing fragments issues.

@deldesir are you suggesting that error message "stuck due to unavailable fragments" is sometimes incorrect?

Thank you for clearing this up, so this PR (or similar) can be merged without delay. (As mentioned a week ago, this can always be revised later if understanding advances + shifts in due course!)

deldesir commented 3 months ago

@deldesir are you suggesting that error message "stuck due to unavailable fragments" is sometimes incorrect?

Yes, tasks may appear stuck due to delays in processing video files by ffmpeg too.

A message like "Processing video xxx is taking longer than usual, please wait..." will do. If for some reason the download task fails, we can let the user know about it with another message like: "Video xxx failed to download due to xxx"

holta commented 3 months ago

tasks may appear stuck due to delays in processing video files by ffmpeg too.

If there are delays or failures implementing FFmpeg in isolated circumstance — e.g. if "instantly" generating missing thumbnails is necessary for some video downloads — we'd surface those delays or failures (FFmpeg or otherwise!) making them increasingly very visible in Tasks view. Related:

A message like "Processing video xxx is taking longer than usual, please wait..." will do.

Please include a link that (also) explains real world causes of delay:

holta commented 3 months ago

If there are delays or failures implementing FFmpeg in isolated circumstance — e.g. if "instantly" generating missing thumbnails is necessary for some video downloads — we need to surface those delays or failures (FFmpeg or otherwise!) making them increasingly very visible in Tasks view.

Let's outline in a separate GitHub issue or PR the most practical way(s) to surface + strengthen clarity (concrete explanation, hints, tips, or context!) in Tasks view.

Addressing similar/common errors, cleanly + efficiently in "Tasks" view, over coming weeks. 💫

Related:

holta commented 3 months ago

@deldesir linking to this actual ticket (https://github.com/iiab/calibre-web/pull/223) is also fine in the interim, if you prefer?

holta commented 3 months ago

A message like "Processing video xxx is taking longer than usual, please wait..." will do.

Given the reality that this error is triggered if elapsed_time >= fragment_stuck_timeout :

holta commented 3 months ago

"Processing video xxx is taking longer than usual, please wait..."

"Video xxx failed to download due to xxx"

Recap:

  1. Let's be Precise: "processing" is not the same as "downloading". 64c0b55c937582462215c11cf54807692fa47e34 (July 19) mentions "Make message less categoric / We don't know whether the download will recover or not" — suggesting that "processing" might not be accurate❓

  2. Definitely mention likely reasons for delay (e.g. fragment download &/or FFmpeg thumbnail creation &/or xklb's media_check) in the actual error message, nurturing continuous progress 🔎

  3. Every error message should include a link to a relevant GitHub ticket comment (or https://github.com/iiab/calibre-web/wiki section, or whatever link is most meaningful!) to cultivate participACTION 🪜

holta commented 3 months ago

ASIDE: Should we preserve the original += to repeat the error message in "Tasks" view every 2 minutes (120 seconds instead of 30 seconds as a result of PRs #227 and #229) since there are now 4X fewer error messages?

deldesir commented 3 months ago

ASIDE: Should we preserve the original += to repeat the error message in "Tasks" view every 2 minutes (120 seconds instead of 30 seconds as a result of PRs #227 and #229) since there are now 4X fewer error messages?

  • The argument in favor of += would be: Users expect and deserve to see regular progress indicators / explanation for what is happening! Reiterating the error message every 2 min might be appreciated by some, as a stronger visual indicator, if it doesn't create too much clutter? 🔢
  • The argument against += would be: Users can already watch the time counter increment every few seconds (or freeze up!) during delays — e.g. some might feel that reiterating the same error message every 2 min doesn't help? ⏳

I think it's a bad idea. Multiline messages convey progress, but spamming the message column with repetitive messages is very ugly.

holta commented 3 months ago

Description looks great!

Both HTML links appear slightly off to me??

@deldesir can you abbreviate the visible part of links... to be?

avni commented 3 months ago

Tested locally with Blondel via screen share on an IIAB instance. This PR is Ready to merge, imo.

[NB: http://lrn2.org's IP is blocked from downloading videos so we could not test there! See: yt-dlp/yt-dlp/issues/10128]

Screenshot 2024-08-16 at 8 11 22 PM