iiab / calibre-web

:books: Web app for browsing, reading and downloading eBooks stored in a Calibre database
GNU General Public License v3.0
3 stars 4 forks source link

Why ~100 minute ETA to complete "Metadata Fetch" stage (seems very slow) for a playlist of 35 short videos? [Metadata Fetch: ... failed: unsupported operand type(s) for /: 'NoneType' and 'int'] [hundreds of videos stuck in xklb-metadata.db, painstakingly processed every time, but never downloading?] #189

Open holta opened 3 weeks ago

holta commented 3 weeks ago

Trying to download playlist https://www.youtube.com/playlist?list=PLybg94GvOJ9ELZEe9s2NXTKr41Yedbw7M on LRN2 is going extremely slowly. The "Progress" for "Metadata Fetch" has increased by 1 percentage point every minute or so, as if this initial stage alone will take about 100 minutes:

image

The playlist's 35 videos are each about 5-12min long.

Any idea why the "Metadata Fetch" stage is going so slowly? Hopefully the /var/log/xklb.log logs below provide hints to help explain?

root@lrn2:~# tail /var/log/calibre-web.log
[2024-06-17 08:46:27,854]  INFO {cps.web:1230} Serving book: Precious Plastic - Setup a local collectio - One Army
[2024-06-17 09:15:06,680] DEBUG {cps.web:1601} Start video watching for 866
[2024-06-17 09:15:06,724]  INFO {cps.web:1230} Serving book: FORM 1 MATHEMATICS LESSON 30 DESTRIBUTION - ELIMU TV
[2024-06-17 09:48:08,431] DEBUG {cps.web:1665} Selected book is unavailable. File does not exist or is not accessible
[2024-06-17 10:27:08,741] DEBUG {cps.web:1601} Start video watching for 1022
[2024-06-17 10:27:08,792]  INFO {cps.web:1230} Serving book: Precious Plastic - Setup a mix plastic wor - One Army
[2024-06-17 10:31:08,977] DEBUG {cps.services.worker:91} Add Task for user: Admin - Metadata fetch task for https://www.youtube.com/playlist?list=PLybg94GvOJ9ELZEe9s2NXTKr41Yedbw7M
[2024-06-17 10:31:08,993]  INFO {cps.tasks.metadata_extract:145} Starting to fetch metadata for URL: https://www.youtube.com/playlist?list=PLybg94GvOJ9ELZEe9s2NXTKr41Yedbw7M
[2024-06-17 10:31:16,062]  INFO {cps.editbooks:385} Received metadata request: ImmutableMultiDict([('current_user_name', 'Admin'), ('shelf_title', 'Calculus')])
[2024-06-17 10:31:16,113]  INFO {cps.editbooks:374} Shelf Calculus created

root@lrn2:~# tail -3900 /var/log/xklb.log | pastebinit -b dpaste.com https://dpaste.com/B8FWRG27Y

root@lrn2:~# systemctl status calibre-web
● calibre-web.service - Calibre-Web
     Loaded: loaded (/etc/systemd/system/calibre-web.service; enabled; preset: enabled)
     Active: active (running) since Sun 2024-06-16 06:47:55 CDT; 1 day 4h ago
   Main PID: 535 (python3)
      Tasks: 12 (limit: 9453)
     Memory: 424.1M (peak: 1.1G)
        CPU: 2h 4min 37.963s
     CGroup: /system.slice/calibre-web.service
             ├─  535 /usr/local/calibre-web-py3/bin/python3 /usr/local/calibre-web-py3/cps.py -p /library/calibre-web/config/app.db
             ├─84680 /bin/bash /usr/local/bin/lb-wrapper tubeadd "https://www.youtube.com/watch?v=UoxqrUY0mzU"
             ├─84691 /bin/bash /usr/local/bin/lb-wrapper tubeadd "https://www.youtube.com/watch?v=UoxqrUY0mzU"
             ├─84692 /bin/bash /usr/local/bin/lb-wrapper tubeadd "https://www.youtube.com/watch?v=UoxqrUY0mzU"
             ├─84693 /bin/bash /usr/local/bin/lb-wrapper tubeadd "https://www.youtube.com/watch?v=UoxqrUY0mzU"
             └─84694 /root/.local/pipx/venvs/xklb/bin/python /usr/local/bin/lb tubeadd /library/calibre-web/xklb-metadata.db "https://www.youtube.com/watch?v=UoxqrUY0mzU" --force -vv

Jun 16 06:47:55 lrn2 systemd[1]: Started calibre-web.service - Calibre-Web.
holta commented 3 weeks ago

Just FYI the "Metadata Fetch" stage has now been running for more than an hour, and its Progress column currently shows 70%.

holta commented 3 weeks ago

The "Metadata Fetch" stage failed after ~86 minutes — is this related to an earlier video stuck in some queue possibly?

Screenshot_20240617-131555

Possibly related:

holta commented 3 weeks ago

Just FYI regardless of the root cause for the (failed) ~86 minute "Metadata Fetch", videos that keep reappearing in "Tasks" view (e.g. https://www.youtube.com/watch?v=n_iRE9alO44) should be handled, to explain what's happening.

FWIW in this one particular case, browsing to https://www.youtube.com/watch?v=n_iRE9alO44 shows "This live stream recording is not available.", "No views" and "Streamed 1 year ago" — error messages that in an ideal world might become more visible via "Tasks" view?

1) @deldesir in the short term, can you suggest a manual hack to clear out problematic videos that are "trapped" and keep reappearing in "Tasks" view? 2) Can we delineate or document what/all kinds of problematic videos keep reappearing in "Tasks" view? (Regardless whether these are merely "visual clutter" — or more serious clogging / blocking of other downloads!) 3) As soon as we can, educators/families will need clear explanations of what went wrong. (With actionable hints/suggestions in "Tasks" view, to shine a bright light on emerging problems in jargon-free language, wherever we can anyway!)

deldesir commented 3 weeks ago

Can I have access to your xklb-metadata? https://www.youtube.com/watch?v=n_iRE9alO44 looks out-of-norm and might not have its error recorded in media.error.

UPDATE: Please ignore the request. I'll try to download it myself.

holta commented 3 weeks ago

suggest a manual hack to clear out problematic videos that are "trapped" and keep reappearing in "Tasks" view?

Just FYI installing the very latest IIAB Calibre-Web onto LRN2 did not help:

Trying to "Download to IIAB" https://www.youtube.com/playlist?list=PLybg94GvOJ9ELZEe9s2NXTKr41Yedbw7M is currently working through a ~100 minute "Metadata Fetch".

root@lrn2:~# tail -700 /var/log/xklb.log | pastebinit -b dpaste.com https://dpaste.com/8C28P5R3V

holta commented 3 weeks ago

@deldesir: at first I suspected Calculus playlist https://www.youtube.com/playlist?list=PLybg94GvOJ9ELZEe9s2NXTKr41Yedbw7M might be pulling in more than its official 35 videos?

As if you watch the running of tail -f /var/log/xklb.log growing, you'll see what appears to be many hundreds of videos being polled (a new one every 5-10 seconds) during "Metadata Fetch" — example:

2024-06-17 18:29:32 - [Debug] yt-dlp initialized 0.2313
2024-06-17 18:29:32 - [Info] [youtube] Extracting URL: https://www.youtube.com/watch?v=45lckolNRm8
2024-06-17 18:29:32 - [Info] [youtube] 45lckolNRm8: Downloading webpage
2024-06-17 18:29:33 - [Info] [youtube] 45lckolNRm8: Downloading ios player API JSON
2024-06-17 18:29:36 - [Debug] Importing playlist-less media https://www.youtube.com/watch?v=45lckolNRm8
2024-06-17 18:29:36 - [Debug] media.playlist_media_add 3.7105
2024-06-17 18:29:36 - [Info]
2024-06-17 18:29:36 - [Debug] ydl.extract_info done 0.0167
2024-06-17 18:29:36 - [Info] lb-wrapper's xklb command (tubeadd) completed successfully.
2024-06-17 18:29:37 - [Info] Using yt-dlp 2024.05.27
2024-06-17 18:29:37 - [Info] Running xklb command: lb tubeadd /library/calibre-web/xklb-metadata.db https://www.youtube.com/watch?v=hUiIkiQGiIM --force -vv
2024-06-17 18:29:38 - [Debug] library v2.8.065
2024-06-17 18:29:38 - [Debug] ['/usr/local/bin/lb', 'tubeadd', '/library/calibre-web/xklb-metadata.db', 'https://www.youtube.com/watch?v=hUiIkiQGiIM', '--force', '-vv']
2024-06-17 18:29:38 - [Debug] {'force': True, 'paths': ['https://www.youtube.com/watch?v=hUiIkiQGiIM']}
2024-06-17 18:29:38 - [Debug] {'noprogress': True, 'skip_download': True, 'lazy_playlist': True, 'extract_flat': True, 'no_check_certificate': True, 'ignore_no_formats_error': True, 'skip_playlist_after_errors': 21, 'playlistend': 20000, 'force': True}

But that theory proved false: as https://www.youtube.com/watch?v=45lckolNRm8 is "PARLONS EN Les études à l’étranger. Expérience des anciens étudiants congolais en Chine" from YouTube channel TOP CONGO FM — it seems these hundreds of videos must somehow be stuck within IIAB Calibre-Web — it keeps trying to reprocess or redownload them somehow...

deldesir commented 3 weeks ago

A quick way to confirm this is to use a clean xklb-metadata.db.

holta commented 3 weeks ago

A quick way to confirm this is to use a clean xklb-metadata.db.

Good idea. Feel free to run...

systemctl stop calibre-web
cd /library/calibre-web
mv xklb-metadata.db xklb-metadata.db.OLD2
systemctl start calibre-web

...and then try again to "Download to IIAB" https://www.youtube.com/playlist?list=PLybg94GvOJ9ELZEe9s2NXTKr41Yedbw7M on LRN2?

holta commented 3 weeks ago

"Minimal Reproducer" URL?

Any Minimal Reproducer URL that "pollutes" xklb-metadata.db, would be incredibly useful for testing — and to help us all understand this better — if we can identify one or several such URLs! :pray:

deldesir commented 3 weeks ago

I confirm "lb tubeadd" is taking longer to process the videos. I don't see any pollution with irrelevant videos.

Update: pollution happening for me with 3-videos playlist https://www.youtube.com/playlist?list=PLJicmE8fK0Eg8cBExYSr9r3tzg9ZQo3gD. Using a clean db is a temporary measure to get around this.