iiab / calibre-web

:books: Web app for browsing, reading and downloading eBooks stored in a Calibre database
GNU General Public License v3.0

Formerly live videos fail to download: "failed to download [download] ... does not pass filter (live_status=?not_live); skipping" #188

Open holta opened 1 month ago

holta commented 1 month ago

@nzola experienced many such errors.

Videos that were once live appear to be erroneously blocked by IIAB Calibre-Web.

Example of a video that should be downloading, but fails to download:

IMG-20240615-WA0003~2

@nzola mentioned:

Downloaded these playlists [and] thumbnails without any problems: https://www.youtube.com/channel/UCX9j__vYOJu00iqBrCzecVw https://www.youtube.com/playlist?list=PL1mP_vkqPB7EsIqqfwcGsg2rQNzoVy0mk

But cannot download the following single videos: https://www.youtube.com/watch?v=BK0XGf20l84 https://www.youtube.com/watch?v=VCM8tg_mGSw https://www.youtube.com/watch?v=w8snrdaoTUs&t=2s https://www.youtube.com/watch?v=5BO9nhtF0Cc https://www.youtube.com/watch?v=rbEsoe8F-l4&t=7788s https://www.youtube.com/watch?v=Drec4XAMJzI&t=6737s https://www.youtube.com/watch?v=w8snrdaoTUs&t=7s

VM's iiab-diagnostics: https://dpaste.com/6H8F53GPQ

@deldesir: Any idea what's happening?

deldesir commented 1 month ago

2024-06-15 14:11:34 - [Debug] [https://www.youtube.com/watch?v=5BO9nhtF0Cc]: Unrecoverable error matched. [download] Analyse du 15 juin 2024: Controverse autour de la nomination du nouveau directeur de cabinet de F… does not pass filter (live_status=?not_live); skipping ..

The error suggests the video is still live, yet on YouTube it is shown as a finished live stream.

I bet the live_status column in the media table (xklb-metadata) says it's still live. For the moment, my understanding is that the video needs to pass the match filter per:

https://github.com/chapmanjacobd/library/blob/ffe4fc8cc97697356f3df01bd0e720d1e4176c91/xklb/createdb/tube_backend.py#L418
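
For readers unfamiliar with the filter syntax: in yt-dlp's match-filter language, a `?` after the operator lets an entry through when the field is unknown. The snippet below is a simplified stand-alone model of that single comparison, written only to illustrate why a finished live stream is skipped. It is not yt-dlp's or xklb's actual code, and the function name is made up for illustration.

```python
def passes_live_status_filter(info, expected="not_live"):
    """Model of `live_status=?not_live`: pass when the field is
    unknown (missing) or equal to `expected`; reject otherwise."""
    status = info.get("live_status")
    return status is None or status == expected


# Illustrative entries; the ids are placeholders, not real videos.
examples = [
    {"id": "a", "live_status": "not_live"},  # ordinary upload
    {"id": "b", "live_status": "was_live"},  # finished live stream
    {"id": "c", "live_status": "is_live"},   # currently streaming
    {"id": "d"},                             # status unknown
]

results = {e["id"]: passes_live_status_filter(e) for e in examples}
print(results)
```

Under this model a `was_live` entry is rejected just like an `is_live` one, which matches the "does not pass filter" message in the log above.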

holta commented 1 month ago

Progress!

@nzola please also take note if recently "live" YouTube videos download OR do not download, in case there's a clear pattern as to which succeed and which fail? 🙏

nzola commented 1 month ago

@holta @deldesir I understand that these videos were live when I tried to download them??? I will try downloading them again tomorrow, Sunday and give the report.

nzola commented 1 month ago

@deldesir I tried to download this playlist now: https://www.youtube.com/@TOPCONGOFM/playlists It failed. [screenshot]

Just remembered: I tried downloading the same playlist yesterday, and it failed too. [screenshot]

PUBLISHING TO URL... https://dpaste.com/H4SYGYT8U

deldesir commented 1 month ago

@nzola This is a playlist of playlists. I am trying to download it right now on my side (LRN2). Although only a subset of 100 videos will be attempted, the metadata fetch is going to take a very long time, because it gathers metadata for all videos in every playlist found. I will report back on errors found. This may help me understand the issues you have encountered.

holta commented 1 month ago

ASIDE: Of course YouTube videos that are truly "live" (happening at that moment) are not possible to download until after the live event is complete. 🎤🎙️

deldesir commented 1 month ago

Downloading https://www.youtube.com/@TOPCONGOFM/playlists failed due to an unavailable video (https://www.youtube.com/watch?app=desktop&v=n_iRE9al044). This needs a closer look. Thanks @Nzola for having reported on this.

holta commented 1 month ago

NON-URGENT:

The "Tasks" view should eventually use clearer language than bureaucratic phrases like "unavailable video".

So regular teachers in all countries know what this really means 😇

Ideally with actionable suggestions (in those cases where that's realistic!)
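
One possible shape for such messaging is a small lookup from raw error text to a plain-language hint. This is only a sketch: the table, the wording of the hints and the function name are all hypothetical, and nothing here reflects how IIAB Calibre-Web actually builds its "Tasks" view.

```python
# Hypothetical lookup table: (fragment of raw error, plain-language hint).
# Fragments are lowercase and matched case-insensitively.
FRIENDLY_HINTS = [
    ("does not pass filter (live_status",
     "This video is (or once was) a live stream, which the downloader "
     "currently skips."),
    ("unavailable",
     "YouTube reports this video as unavailable (for example removed, "
     "private, or region-restricted). Try a different video."),
]


def explain_error(raw_error):
    """Return a teacher-friendly hint for a raw error message,
    falling back to the raw text when nothing matches."""
    lowered = raw_error.lower()
    for fragment, hint in FRIENDLY_HINTS:
        if fragment in lowered:
            return hint
    return "The download failed. Technical details: " + raw_error


print(explain_error("X does not pass filter (live_status=?not_live); skipping .."))
```

Keeping the raw text in the fallback means unexpected errors still reach whoever is debugging.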

nzola commented 1 month ago

https://www.youtube.com/watch?v=Drec4XAMJzI&t=6737s

I tried again to download these 7 videos now, but they still failed. I did not do any upgrade or new install. PUBLISHING TO URL... https://dpaste.com/F6S6YX798

[screenshot]

holta commented 1 month ago

I tried again to download these 7 videos now , but still failed. I did not do any upgrade or new install.

Curiously the screenshot shows no error message / explanation at all, during this 2nd attempt.

(Definitely room for improvement, thanks @nzola.)

deldesir commented 3 weeks ago

The error message will be displayed now, but the videos will not download. Live videos (finished or not) are not downloadable with xklb by design.

holta commented 3 weeks ago

Live videos (finished or not) are not downloadable with xklb by design.

By whose design?

FYI this design assumption seems extremely weak, given how videos labeled as "live" are actually used:

holta commented 3 weeks ago

The error message will be displayed now

@deldesir clarify which PR and code improved error reporting here?

Thanks, please if possible!

deldesir commented 3 weeks ago

Per adjustments made in PR https://github.com/iiab/calibre-web/pull/194

holta commented 3 weeks ago

@deldesir please try to find a legit way to tell if a YouTube video is actually live or not.

(Instead of the bogus information that we're currently using, which is as good as useless, given the fact that so many podcasters permanently leave all their episodes marked as "live" ...rather intentionally... as permanently labeling videos as "live" serves as de facto marketing it would appear!)

holta commented 3 weeks ago

@deldesir please try to find a legit way to tell if a YouTube video is actually live or not.

If something like this can be upstreamed to become a part of xklb, even better!

✊

holta commented 3 weeks ago

@deldesir please test & use these URLs to make sure forward progress is steady in coming days. Thank you to everyone working on this very common and very serious problem:

[ ADDITIONAL TEST CASES BELOW E.G. FOR "NOT YET LIVE" YOUTUBE URL'S! ]

nzola commented 3 weeks ago

The error message will be displayed now, but the videos will not download. Live videos (finished or not) are not downloadable with xklb by design.

Ok. I understand. Thank you.

holta commented 3 weeks ago

@nzola @avni

Until we solve this serious problem properly...

@deldesir believes that an initial hack/workaround should confirm the path forward, using yt-dlp options like...

Screenshot_20240622-162856

https://github.com/yt-dlp/yt-dlp/blob/master/README.md

nzola commented 3 weeks ago

Ok @holta

nzola commented 3 weeks ago

JFYI: I downloaded all these videos on the command line (CMD) with yt-dlp.

holta commented 3 weeks ago

@deldesir

Can you confirm?

avni commented 3 weeks ago

iiab-diagnostics: https://dpaste.com/7XBX3XZE2

I can replicate the error, and see, "failed to download [download] ... does not pass filter (live_status=?not_live); skipping" when downloading:

  1. https://www.youtube.com/watch?v=rbEsoe8F-l4
  2. https://www.youtube.com/live/rbEsoe8F-l4
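
The two URLs above point at the same video, and several URLs in the original report differ only in their `&t=` timestamp. When comparing test results it can help to reduce each URL to its video ID. The helper below is a minimal sketch using only the standard library; the function name is hypothetical and it covers only the URL shapes that appear in this thread.

```python
from urllib.parse import urlparse, parse_qs


def youtube_video_id(url):
    """Extract the video ID from the URL shapes seen in this thread:
    /watch?v=ID, /live/ID and youtu.be/ID. Returns None otherwise."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    parts = [p for p in parsed.path.split("/") if p]
    if host.endswith("youtu.be"):
        return parts[0] if parts else None
    if parsed.path == "/watch":
        # parse_qs maps each key to a list of values
        return parse_qs(parsed.query).get("v", [None])[0]
    if len(parts) == 2 and parts[0] == "live":
        return parts[1]
    return None


ids = {
    youtube_video_id("https://www.youtube.com/watch?v=rbEsoe8F-l4&t=7788s"),
    youtube_video_id("https://www.youtube.com/live/rbEsoe8F-l4"),
}
print(ids)
```

Playlist and channel URLs return None here, so they are not mistaken for single videos.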

@deldesir believes that an initial hack/workaround should confirm the path forward, using yt-dlp options like...

It's not clear to me what the workaround is or what I need to do to test? Is the ask to use the command line and test the download with yt-dlp?

[screenshot]

holta commented 3 weeks ago

@deldesir should outline a suggested path forward within 24h:

avni commented 3 weeks ago

As mentioned above, downloading "formerly live" videos via yt-dlp seems to work fine, no options/flags required.

Here's the output from my terminal as confirmation. I don't know how to view the webm files, but presuming they succeeded based on the output.

ubuntu@box:~$ sudo yt-dlp https://www.youtube.com/watch?v=rbEsoe8F-l4
[youtube] Extracting URL: https://www.youtube.com/watch?v=rbEsoe8F-l4
[youtube] rbEsoe8F-l4: Downloading webpage
[youtube] rbEsoe8F-l4: Downloading ios player API JSON
[youtube] rbEsoe8F-l4: Downloading m3u8 information
[info] rbEsoe8F-l4: Downloading 1 format(s): 247+251
[download] Destination: LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].f247.webm
[download] 100% of 541.37MiB in 00:01:31 at 5.90MiB/s
[download] Destination: LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].f251.webm
[download] 100% of 126.31MiB in 00:00:47 at 2.67MiB/s
[Merger] Merging formats into "LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].webm"
Deleting original file LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].f251.webm (pass -k to keep)
Deleting original file LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].f247.webm (pass -k to keep)
ubuntu@box:~$ ls
'LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].webm'

ubuntu@box:~$ sudo yt-dlp https://www.youtube.com/live/rbEsoe8F-l4
[youtube] Extracting URL: https://www.youtube.com/live/rbEsoe8F-l4
[youtube] rbEsoe8F-l4: Downloading webpage
[youtube] rbEsoe8F-l4: Downloading ios player API JSON
[youtube] rbEsoe8F-l4: Downloading m3u8 information
[info] rbEsoe8F-l4: Downloading 1 format(s): 247+251
[download] LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].webm has already been downloaded
ubuntu@box:~$ rm 'LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].webm'
rm: remove write-protected regular file 'LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].webm'? yes
ubuntu@box:~$ sudo yt-dlp https://www.youtube.com/live/rbEsoe8F-l4
[youtube] Extracting URL: https://www.youtube.com/live/rbEsoe8F-l4
[youtube] rbEsoe8F-l4: Downloading webpage
[youtube] rbEsoe8F-l4: Downloading ios player API JSON
[youtube] rbEsoe8F-l4: Downloading m3u8 information
[info] rbEsoe8F-l4: Downloading 1 format(s): 247+251
[download] Destination: LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].f247.webm
[download] 100% of 541.37MiB in 00:00:19 at 28.26MiB/s
[download] Destination: LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].f251.webm
[download] 100% of 126.31MiB in 00:00:19 at 6.53MiB/s
[Merger] Merging formats into "LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].webm"
Deleting original file LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].f247.webm (pass -k to keep)
Deleting original file LE DEBAT 24 MAI 2024 [rbEsoe8F-l4].f251.webm (pass -k to keep)

holta commented 3 weeks ago

Progress brewing!

holta commented 3 weeks ago

@deldesir

1) Please confirm/clarify the question @nzola surfaced 3 days ago: isn't the real problem here programmatically identifying "actually live" versus "formerly live" YouTube videos?

2) How will downloading work with "actually live" YouTube videos (i.e. whose recording is not yet complete), e.g. as outlined in the 12-hour tennis match question at https://github.com/chapmanjacobd/library/pull/41#issuecomment-2186701059 ?

(Thanks much for clarifying assumptions, intuition, intentions ~ so the end goal is very clear to everyone here!)

deldesir commented 3 weeks ago

  1. Please confirm/clarify the question @nzola surfaced 3 days ago: isn't the real problem here programmatically identifying "actually live" versus "formerly live" YouTube videos?

You mean the live_status of the video, whether it's live or was_live? Absolutely not. The real problem is that the live_status column in xklb-metadata.db doesn't have the value not_live. The match_filter from xklb in https://github.com/chapmanjacobd/library/blob/e9975b07bca5b481aeed9398d0bc0adb3b9b25c8/xklb/createdb/tube_backend.py#L418 forces yt-dlp to reject/skip the download because the video doesn't have this not_live status/criterion.

  2. How will downloading work with "actually live" YouTube videos (i.e. whose recording is not yet complete), e.g. as outlined in the 12-hour tennis match question at "Add --live option" chapmanjacobd/library#41 (comment)?

Once chapmanjacobd/library#41 is merged, if it is ever approved, live videos will download on IIAB Calibre-Web. Additional testing will be needed to ensure performance is not overly affected by these generally huge recordings.
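
For context, yt-dlp's live_status field can take the values not_live, is_live, is_upcoming, was_live and post_live. The sketch below shows one way a caller could turn that value into a decision. Which statuses count as downloadable is a policy assumption made here for illustration; it is not what xklb or the linked PR implements, and the names are hypothetical.

```python
# Assumed policy (not from xklb): finished content is downloadable,
# anything still in progress or not yet started should wait.
DOWNLOADABLE = {"not_live", "was_live"}
NOT_YET_DOWNLOADABLE = {"is_live", "is_upcoming", "post_live"}


def download_decision(live_status):
    """Map a yt-dlp live_status value to 'download', 'wait' or 'unknown'."""
    if live_status in DOWNLOADABLE:
        return "download"
    if live_status in NOT_YET_DOWNLOADABLE:
        return "wait"
    return "unknown"


for status in ("not_live", "was_live", "is_live", "is_upcoming", None):
    print(status, "->", download_decision(status))
```

Treating post_live (a stream that has ended but whose recording may still be processing) as "wait" is the most debatable choice here.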

holta commented 3 weeks ago

@deldesir let me reframe / re-ask my question from https://github.com/chapmanjacobd/library/pull/41#issuecomment-2186701059 more directly:

holta commented 3 weeks ago

CAVEAT: I concede that some people who click on "Download to IIAB" don't mind extremely slow downloads of "actually live" videos.

holta commented 3 weeks ago

Additional testing will be needed to ensure performance is not overly affected by these generally huge recordings.

Yes that issue of overweight videos is very important. And very well known since 2023[*]. But please let's declare that separate question (of overly long duration, bandwidth-heavy, disk-heavy, and RAM/memory-heavy videos) to be off-topic for now, at least in this particular context here: 😉

[*] ASIDE: Hopefully to be solved in a few short months, or possibly much earlier!

holta commented 3 weeks ago

ALSO: Pre-announced / Pre-scheduled / Upcoming videos ("not yet live" !) are a whole other category of YouTube URL's...

...that I didn't realize also matter!

(Presumably these too are commonly categorized as "live" videos, even though no video whatsoever exists yet!?)

▶️ Certainly we need some kind of intelligent user-facing warning or messaging, in "Tasks" view or similar, to warn teachers / parents when they're trying to "Download to IIAB" a video that doesn't yet exist! ⏳

▶️ Many examples of "not yet live" (UPCOMING) and "truly live" (LIVE) YouTube videos below, usable as test cases to ensure the "Download to IIAB" button operates cleanly for all:

https://youtube.com/channel/UC4R8DWoMoI7CAwX8_LjQHig

holta commented 2 weeks ago

3 kinds of "ostensibly live" YouTube videos tested here, thanks to @deldesir:

With PR #199 now merged, a follow-up PR is now needed to clean up, as described here:

nzola commented 2 weeks ago

The playlist https://www.youtube.com/@TOPCONGOFM/playlists downloaded from 1% to 100%, then it failed. PUBLISHING TO URL... https://dpaste.com/4HVB2SFVK

[screenshot]

holta commented 2 weeks ago

playlist https://www.youtube.com/@TOPCONGOFM/playlists downloaded from 1 to 100% then it failed

@nzola the above is not a playlist. It is a list of playlists. Can you try again with one of its individual playlists? For example, maybe start with their 8-video playlist https://youtube.com/playlist?list=PLi6Z1Wvj99SKPHMiqLkREMQrMyZy1YTPE ?

@deldesir can you please clean up the error message so that @nzola and others have a crystal clear error message, instead of (or alongside) Metadata Fetch: [URL] failed: unsupported operand type(s) for /: 'NoneType' and 'int' ? (We can open a new ticket for this message if that makes things easier!)

Conversely: If a list of playlists can easily in future be downloaded into a single bookshelf, much like we do with a YouTube channel (a YouTube channel is basically also a list of playlists!) then we might consider that functionality, if it's genuinely needed?
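
As background on the `unsupported operand type(s) for /: 'NoneType' and 'int'` message quoted above: Python raises it whenever a missing value (None) is divided by an integer. Where exactly this happens in the metadata fetch is not established in this thread, so the snippet below only reproduces the message and shows a generic guard; `safe_ratio` is a hypothetical helper name.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when either value is
    missing or the denominator is zero."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


# Reproduce the raw message: dividing a missing value by an int.
try:
    None / 100
except TypeError as exc:
    raw_message = str(exc)

print(raw_message)
print(safe_ratio(None, 100))
print(safe_ratio(50, 100))
```

A guard like this lets the caller decide what to do with a missing value instead of aborting the whole task.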

holta commented 2 weeks ago

@nzola and AFTER you've completed testing of individual playlist(s):

Try the much more ambitious experiment of downloading the entire channel, e.g. try the "Download to IIAB" button with URLs like:

nzola commented 2 weeks ago

Downloaded this playlist: https://www.youtube.com/playlist?list=PLi6Z1Wvj99SLJRvpu2S6CPRHvECynHS0y (99 videos) with pi400 Calibre-Web. Successfully completed after 8 hours. PUBLISHING TO URL... https://dpaste.com/3LGEYAQLE [screenshot]

This error happened during the download, but the process continued until all 99 videos had finished downloading.

[screenshot] [screenshot]

nzola commented 2 weeks ago

@deldesir @holta pi4 Calibre-Web also downloaded this playlist: https://www.youtube.com/playlist?list=PLi6Z1Wvj99SKvAGNRiDmjhgQEZ8pU-4Y2 very smoothly without any problems PUBLISHING TO URL... https://dpaste.com/H37ZJTM7R

nzola commented 2 weeks ago

@nzola and AFTER you've completed testing of individual playlist(s):

Try the much more ambitious experiment of downloading the entire channel, e.g. try the "Download to IIAB" button with URLs like:

@deldesir @holta I tried downloading the 2 above video links on both pi4 and multipass Calibre-Web, but they are still stuck on STARTED. pi4 Calibre-Web results: PUBLISHING TO URL... https://dpaste.com/H37ZJTM7R

[screenshot]

multipass Calibre-Web results: PUBLISHING TO URL... https://dpaste.com/GFTD2QH9D

[screenshot]

holta commented 2 weeks ago

Downloaded this playlist:https://www.youtube.com/playlist?list=PLi6Z1Wvj99SLJRvpu2S6CPRHvECynHS0y (99 videos) with pi400 Calibre-Web. Successfully completed after 8 hours. PUBLISHING TO URL... https://dpaste.com/3LGEYAQLE [SCREENSHOT OF SUCCESSES] This error happened during the download, but the process continued until all 99 videos were completed download. [SCREENSHOTS OF FAILURE]

holta commented 2 weeks ago

@deldesir were channel downloads like this working in the past?

@deldesir @holta I tried downloading the 2 above video links on both pi4 and multipass Calibre-Web, but they are still stuck on STARTED. pi4 Calibre-Web results: PUBLISHING TO URL... https://dpaste.com/H37ZJTM7R [SCREENSHOT OF "Metadata Fetch" THAT GETS STUCK "Waiting" ON RPI 4]

multipass Calibre-Web results: PUBLISHING TO URL... https://dpaste.com/GFTD2QH9D [SCREENSHOT OF "Metadata Fetch" THAT GETS STUCK "Waiting" ON 24.04 VM]

holta commented 2 weeks ago

@deldesir were channel downloads like this working in the past?

@EMG70 wonders if part of the reason is...

Download stuck on "Waiting": I thought this has been normal behaviour when one tries to download another video before the first one has started or finished gathering metadata.

The long wait happens mostly on channels or extremely long playlists such that any attempt to download a second url will give the "waiting" status

(@deldesir can you help clarify?!)

deldesir commented 2 weeks ago

This error happened during the download, but the process continued until all 99 videos were completed download.

Very ugly. Likely a network error.

deldesir commented 2 weeks ago

@deldesir were channel downloads like this working in the past?

We did tests with channels in the past, but none as big as https://youtube.com/@TOPCONGOFM. This channel has 4.1k videos. Even though only 100 videos will be downloaded, the first task will take a long time processing metadata for all those 4.1k videos and sorting them based on their views per year.
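
To see why the selection step needs metadata for every video before any download starts, consider a toy version of "keep the top N by views per year". The field names `view_count` and `age_years` and the function name are hypothetical; this is not xklb's query, only an illustration of the ranking idea.

```python
def top_by_views_per_year(videos, limit=100):
    """Rank videos by views per year (highest first) and keep `limit`.
    Every video must already have its metadata for the ranking to work."""
    def score(video):
        views = video.get("view_count") or 0   # missing views count as 0
        age = video.get("age_years") or 1      # missing/zero age counts as 1 year
        return views / age
    return sorted(videos, key=score, reverse=True)[:limit]


# Made-up sample data, not taken from the channel discussed here.
sample = [
    {"id": "a", "view_count": 1000, "age_years": 1},    # 1000 per year
    {"id": "b", "view_count": 5000, "age_years": 10},   # 500 per year
    {"id": "c", "view_count": 300, "age_years": 0.5},   # 600 per year
]
chosen = [v["id"] for v in top_by_views_per_year(sample, limit=2)]
print(chosen)
```

Because the ranking compares all candidates, the cost of the metadata fetch grows with the size of the channel rather than with the number of videos finally downloaded.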

deldesir commented 2 weeks ago

The long wait is justified: I could see all playlists being indexed while running tail -f /var/log/xklb.log. Test done on lrn2. I had to reboot the machine. I guess I'll have to move the database aside too, because a lot of residual videos will be caught in subsequent downloads.

holta commented 2 weeks ago

@EMG70 further explains https://github.com/iiab/calibre-web/issues/188#issuecomment-2200213122

Just to be sure we are not investigating normal behaviour I initiated a 3hr video download, before it completed gathering meta I started a 16min video download. This resulted in second video going to waiting mode and only started after first video was completed as what's happening on @nzola's channel downloads.

[screenshot]

[screenshot]

deldesir commented 2 weeks ago

@EMG70 explained it well. Tasks are run sequentially. This is the normal behavior.
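
The sequential behavior described above can be modeled with a simple first-in, first-out queue. This is a toy simulation, not Calibre-Web's task runner; the "Waiting" and "Started" labels come from this thread, while "Finished" and the function name are assumptions.

```python
from collections import deque


def run_sequentially(task_names):
    """Process tasks one at a time and record what a Tasks view
    would show each time a task starts."""
    status = {name: "Waiting" for name in task_names}
    snapshots = []
    queue = deque(task_names)
    while queue:
        current = queue.popleft()
        status[current] = "Started"
        snapshots.append(dict(status))  # copy of the current view
        status[current] = "Finished"    # assumed final label
    return snapshots


snapshots = run_sequentially(["3-hour video", "16-minute video"])
for view in snapshots:
    print(view)
```

In the first snapshot the short video is still "Waiting" while the long one runs, which is the pattern @EMG70 observed.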

I was investigating the channel having a "started" message with a progress stuck at 0%.

nzola commented 2 weeks ago

I tried to download this video link: https://youtu.be/Zm-ROp6EMZo as requested by @holta on multipass iiab, pi4 iiab and pi400 iiab. Here are the results:

multipass iiab: PUBLISHING TO URL... https://dpaste.com/3R8RXUUZ2 [screenshot]

pi4: PUBLISHING TO URL... https://dpaste.com/2VXWWLBZB [screenshot]

pi400: PUBLISHING TO URL... https://dpaste.com/8Z4TA84FH [screenshot]

holta commented 2 weeks ago

Thanks @nzola! Let's ask @deldesir to try to fix the situation in coming days if he can!

Tangentially related: