iiab / calibre-web

:books: Web app for browsing, reading and downloading eBooks stored in a Calibre database
GNU General Public License v3.0

@EMG70's YouTube playlist downloading problems 2024-06-10: (1) (This is taking longer than expected) (2) Metadata Fetch: [YouTube video URL] failed: 'NoneType' object cannot be interpreted as an integer [@avni results: (3) failed to download: None (4) it keeps trying to redownload failed videos] #178

Open holta opened 5 months ago

holta commented 5 months ago
Tested on a fresh VM

SUDO IIAB-DIAGNOSTICS: https://dpaste.com/D9HL627NJ

The following were tested with varying results.

https://www.youtube.com/playlist?list=PLfiHW0cuXSvSiVuHqtO4IXMqWSrHou77B ⚠️ 5/6 videos downloaded OK. The failed video's response was "This is taking longer than expected". This failed video, https://www.youtube.com/watch?v=7qiwh4ybglo, was then tried individually and failed with "object cannot be interpreted as integer".

Screenshot from 2024-06-10 20-29-52

https://www.youtube.com/playlist?list=PLfiHW0cuXSvQ0s-oNi2syJNAPTN8gt4oi ❌ playlist of 29 videos failed

https://www.youtube.com/playlist?list=PLfiHW0cuXSvQSZHsA5zpo3JdrY5EJp1yT

https://www.youtube.com/watch?v=iaQV0tehds4

https://www.youtube.com/playlist?list=PLbeRJIjdJ7QEuVxTNfQsPBpUv4UcQdAzJ ❌ short playlist with 3 videos failed.

Originally posted by @EMG70 in https://github.com/iiab/calibre-web/issues/172#issuecomment-2159159800

holta commented 5 months ago

@deldesir

Are these 2 other issues related — also with the 'NoneType' object cannot be interpreted as an integer error?

holta commented 5 months ago

@deldesir

Please see the Python errors between Line 1082 and Line 1184 here:

-rw-r--r-- 1 root root 30337 Jun 10 20:27 /var/log/calibre-web.log
                        ...ITS LAST 100 LINES FOLLOW...

  File "/usr/local/calibre-web-py3/cps/tasks/download.py", line 123, in run
    self.message = f"{self.media_url_link} failed to download: {self.read_error_from_database()}"
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/calibre-web-py3/cps/tasks/download.py", line 139, in read_error_from_database
    error = conn.execute("SELECT error FROM media WHERE webpath = ?", (self.media_url,)).fetchone()[0]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: no such column: error
[2024-06-10 20:12:22,588] DEBUG {cps.services.worker:91} Add Task for user: Admin - Metadata fetch task for https://www.youtube.com/playlist?list=PLfiHW0cuXSvQ0s-oNi2syJNAPTN8gt4oi
[2024-06-10 20:12:22,589]  INFO {cps.tasks.metadata_extract:131} Starting to fetch metadata for URL: https://www.youtube.com/playlist?list=PLfiHW0cuXSvQ0s-oNi2syJNAPTN8gt4oi
[2024-06-10 20:12:25,447]  INFO {cps.editbooks:385} Received metadata request: ImmutableMultiDict([('current_user_name', 'Admin'), ('shelf_title', 'Physics Form 1')])
[2024-06-10 20:12:25,459]  INFO {cps.editbooks:374} Shelf Physics Form 1 created
[2024-06-10 20:13:09,080] ERROR {cps.tasks.metadata_extract:113} An error occurred during the calculation of views per day for https://www.youtube.com/watch?v=DMTQUGYoHHQ: 'NoneType' object cannot be interpreted as an integer
[2024-06-10 20:13:09,081] ERROR {cps.services.worker:202} 'views_per_day'
Traceback (most recent call last):
  File "/usr/local/calibre-web-py3/cps/services/worker.py", line 199, in start
    self.run(*args)
  File "/usr/local/calibre-web-py3/cps/tasks/metadata_extract.py", line 154, in run
    requested_urls = self._sort_and_limit_requested_urls(requested_urls)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/calibre-web-py3/cps/tasks/metadata_extract.py", line 117, in _sort_and_limit_requested_urls
    return dict(sorted(requested_urls.items(), key=lambda item: item[1]["views_per_day"], reverse=True)[:min(MAX_VIDEOS_PER_DOWNLOAD, len(requested_urls))])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/calibre-web-py3/cps/tasks/metadata_extract.py", line 117, in <lambda>
    return dict(sorted(requested_urls.items(), key=lambda item: item[1]["views_per_day"], reverse=True)[:min(MAX_VIDEOS_PER_DOWNLOAD, len(requested_urls))])
                                                                ~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'views_per_day'
[2024-06-10 20:14:02,140] DEBUG {cps.services.worker:91} Add Task for user: Admin - Metadata fetch task for https://www.youtube.com/playlist?list=PLfiHW0cuXSvQSZHsA5zpo3JdrY5EJp1yT
[2024-06-10 20:14:03,096]  INFO {cps.tasks.metadata_extract:131} Starting to fetch metadata for URL: https://www.youtube.com/playlist?list=PLfiHW0cuXSvQSZHsA5zpo3JdrY5EJp1yT
[2024-06-10 20:14:05,944]  INFO {cps.editbooks:385} Received metadata request: ImmutableMultiDict([('current_user_name', 'Admin'), ('shelf_title', 'Maths Form 1')])
[2024-06-10 20:14:05,954]  INFO {cps.editbooks:374} Shelf Maths Form 1 created
[2024-06-10 20:15:56,672] ERROR {cps.tasks.metadata_extract:113} An error occurred during the calculation of views per day for https://www.youtube.com/watch?v=DMTQUGYoHHQ: 'NoneType' object cannot be interpreted as an integer
[2024-06-10 20:15:56,672] ERROR {cps.tasks.metadata_extract:113} An error occurred during the calculation of views per day for https://www.youtube.com/watch?v=IeJjQwdbUnw: 'NoneType' object cannot be interpreted as an integer
[2024-06-10 20:15:56,673] ERROR {cps.services.worker:202} 'views_per_day'
Traceback (most recent call last):
  File "/usr/local/calibre-web-py3/cps/services/worker.py", line 199, in start
    self.run(*args)
  File "/usr/local/calibre-web-py3/cps/tasks/metadata_extract.py", line 154, in run
    requested_urls = self._sort_and_limit_requested_urls(requested_urls)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/calibre-web-py3/cps/tasks/metadata_extract.py", line 117, in _sort_and_limit_requested_urls
    return dict(sorted(requested_urls.items(), key=lambda item: item[1]["views_per_day"], reverse=True)[:min(MAX_VIDEOS_PER_DOWNLOAD, len(requested_urls))])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/calibre-web-py3/cps/tasks/metadata_extract.py", line 117, in <lambda>
    return dict(sorted(requested_urls.items(), key=lambda item: item[1]["views_per_day"], reverse=True)[:min(MAX_VIDEOS_PER_DOWNLOAD, len(requested_urls))])
                                                                ~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'views_per_day'
[2024-06-10 20:20:56,562] DEBUG {cps.services.worker:91} Add Task for user: Admin - Metadata fetch task for https://www.youtube.com/watch?v=iaQV0tehds4
[2024-06-10 20:20:56,740]  INFO {cps.tasks.metadata_extract:131} Starting to fetch metadata for URL: https://www.youtube.com/watch?v=iaQV0tehds4
[2024-06-10 20:21:01,220] DEBUG {cps.services.worker:91} Add Task for user: Admin - Download task for https://www.youtube.com/watch?v=iaQV0tehds4
[2024-06-10 20:21:01,232]  INFO {cps.tasks.download:41} Subprocess args: ['lb-wrapper', 'dl', 'https://www.youtube.com/watch?v=iaQV0tehds4']
[2024-06-10 20:21:12,539]  INFO {cps.editbooks:385} Received metadata request: ImmutableMultiDict([('requested_file', '/library/downloads/calibre-web/Youtube/Foundations For Farming Zimbabwe/Want to learn how to increase your crop yield to 14 tonnes per HaM-oM-<M-^__105.00_[iaQV0tehds4].mp4'), ('current_user_name', 'Admin')])
[2024-06-10 20:21:12,540]  INFO {cps.editbooks:387} Requested file: /library/downloads/calibre-web/Youtube/Foundations For Farming Zimbabwe/Want to learn how to increase your crop yield to 14 tonnes per HaM-oM-<M-^__105.00_[iaQV0tehds4].mp4
[2024-06-10 20:21:12,540]  INFO {cps.editbooks:392} Processing file: <_io.BufferedReader name='/library/downloads/calibre-web/Youtube/Foundations For Farming Zimbabwe/Want to learn how to increase your crop yield to 14 tonnes per HaM-oM-<M-^__105.00_[iaQV0tehds4].mp4'>
[2024-06-10 20:21:12,540] DEBUG {cps.uploader:374} Temporary file: /tmp/calibre_web/c141b4032bcb0d6641fee3a9bf95596e
[2024-06-10 20:21:12,565]  WARN {py.warnings:110} /usr/local/calibre-web-py3/cps/editbooks.py:1535: SAWarning: Object of type <Books> not in session, add operation along 'Authors.books' won't proceed (This warning originated from the Session 'autoflush' process, which was invoked automatically in response to a user-initiated operation.)
  db_element = db_session.query(db_object).filter((func.lower(db_filter).ilike(add_element))).first()

[2024-06-10 20:21:12,585] DEBUG {cps.helper:548} Moving title: /tmp/calibre_web/c141b4032bcb0d6641fee3a9bf95596e to /library/calibre-web/Foundations For Farming Zimbabwe/Want to learn how to increase your crop yield to 14 tonnes per Ha_ (6)/Want to learn how to increase your crop yield to 14 tonnes per Ha_ - Foundations For Farming Zimbabwe
[2024-06-10 20:21:12,612]  INFO {cps.tasks.download:106} Successfully sent the requested file to http://192.168.0.212/books/meta
[2024-06-10 20:21:12,625]  INFO {cps.tasks.download:129} Download task for https://www.youtube.com/watch?v=iaQV0tehds4 completed successfully
[2024-06-10 20:21:46,783] DEBUG {cps.services.worker:91} Add Task for user: Admin - Metadata fetch task for https://www.youtube.com/playlist?list=PLfiHW0cuXSvQ0s-oNi2syJNAPTN8gt4oi
[2024-06-10 20:21:46,784]  INFO {cps.tasks.metadata_extract:131} Starting to fetch metadata for URL: https://www.youtube.com/playlist?list=PLfiHW0cuXSvQ0s-oNi2syJNAPTN8gt4oi
[2024-06-10 20:21:49,236]  INFO {cps.editbooks:385} Received metadata request: ImmutableMultiDict([('current_user_name', 'Admin'), ('shelf_title', 'Physics Form 1')])
[2024-06-10 20:21:49,250]  INFO {cps.editbooks:374} Shelf Physics Form 1 (2) created
[2024-06-10 20:23:41,093] ERROR {cps.tasks.metadata_extract:113} An error occurred during the calculation of views per day for https://www.youtube.com/watch?v=DMTQUGYoHHQ: 'NoneType' object cannot be interpreted as an integer
[2024-06-10 20:23:41,094] ERROR {cps.tasks.metadata_extract:113} An error occurred during the calculation of views per day for https://www.youtube.com/watch?v=IeJjQwdbUnw: 'NoneType' object cannot be interpreted as an integer
[2024-06-10 20:23:41,094] ERROR {cps.services.worker:202} 'views_per_day'
Traceback (most recent call last):
  File "/usr/local/calibre-web-py3/cps/services/worker.py", line 199, in start
    self.run(*args)
  File "/usr/local/calibre-web-py3/cps/tasks/metadata_extract.py", line 154, in run
    requested_urls = self._sort_and_limit_requested_urls(requested_urls)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/calibre-web-py3/cps/tasks/metadata_extract.py", line 117, in _sort_and_limit_requested_urls
    return dict(sorted(requested_urls.items(), key=lambda item: item[1]["views_per_day"], reverse=True)[:min(MAX_VIDEOS_PER_DOWNLOAD, len(requested_urls))])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/calibre-web-py3/cps/tasks/metadata_extract.py", line 117, in <lambda>
    return dict(sorted(requested_urls.items(), key=lambda item: item[1]["views_per_day"], reverse=True)[:min(MAX_VIDEOS_PER_DOWNLOAD, len(requested_urls))])
                                                                ~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'views_per_day'
[2024-06-10 20:25:16,376] DEBUG {cps.services.worker:91} Add Task for user: Admin - Metadata fetch task for https://www.youtube.com/playlist?list=PLbeRJIjdJ7QEuVxTNfQsPBpUv4UcQdAzJ
[2024-06-10 20:25:17,114]  INFO {cps.tasks.metadata_extract:131} Starting to fetch metadata for URL: https://www.youtube.com/playlist?list=PLbeRJIjdJ7QEuVxTNfQsPBpUv4UcQdAzJ
[2024-06-10 20:25:19,790]  INFO {cps.editbooks:385} Received metadata request: ImmutableMultiDict([('current_user_name', 'Admin'), ('shelf_title', 'Word from Brian')])
[2024-06-10 20:25:19,800]  INFO {cps.editbooks:374} Shelf Word from Brian created
[2024-06-10 20:27:23,994] ERROR {cps.tasks.metadata_extract:113} An error occurred during the calculation of views per day for https://www.youtube.com/watch?v=DMTQUGYoHHQ: 'NoneType' object cannot be interpreted as an integer
[2024-06-10 20:27:23,994] ERROR {cps.tasks.metadata_extract:113} An error occurred during the calculation of views per day for https://www.youtube.com/watch?v=IeJjQwdbUnw: 'NoneType' object cannot be interpreted as an integer
[2024-06-10 20:27:23,995] ERROR {cps.services.worker:202} 'views_per_day'
Traceback (most recent call last):
  File "/usr/local/calibre-web-py3/cps/services/worker.py", line 199, in start
    self.run(*args)
  File "/usr/local/calibre-web-py3/cps/tasks/metadata_extract.py", line 154, in run
    requested_urls = self._sort_and_limit_requested_urls(requested_urls)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/calibre-web-py3/cps/tasks/metadata_extract.py", line 117, in _sort_and_limit_requested_urls
    return dict(sorted(requested_urls.items(), key=lambda item: item[1]["views_per_day"], reverse=True)[:min(MAX_VIDEOS_PER_DOWNLOAD, len(requested_urls))])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/calibre-web-py3/cps/tasks/metadata_extract.py", line 117, in <lambda>
    return dict(sorted(requested_urls.items(), key=lambda item: item[1]["views_per_day"], reverse=True)[:min(MAX_VIDEOS_PER_DOWNLOAD, len(requested_urls))])
                                                                ~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'views_per_day'
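
The tracebacks above suggest two linked failures: the views-per-day calculation raises when a metadata field is `None`, the affected entry is then left without a `views_per_day` key, and the later sort raises `KeyError`. The following is only a minimal sketch of a more defensive version, not the project's actual code. The field names `view_count` and `upload_date` (as a `YYYYMMDD` string), the helper names, and the value of `MAX_VIDEOS_PER_DOWNLOAD` are all assumptions made for illustration.

```python
from datetime import date, datetime

MAX_VIDEOS_PER_DOWNLOAD = 10  # hypothetical value; the real constant lives in calibre-web


def views_per_day(view_count, upload_date, today=None):
    """Return views per day, or 0.0 when a metadata field is missing (None)."""
    if view_count is None or upload_date is None:
        return 0.0
    today = today or date.today()
    uploaded = datetime.strptime(upload_date, "%Y%m%d").date()
    days_online = max((today - uploaded).days, 1)  # avoid dividing by zero
    return view_count / days_online


def sort_and_limit(requested_urls, limit=MAX_VIDEOS_PER_DOWNLOAD):
    """Sort by views_per_day (highest first), tolerating entries without the key."""
    ranked = sorted(
        requested_urls.items(),
        key=lambda item: item[1].get("views_per_day", 0.0),
        reverse=True,
    )
    return dict(ranked[:limit])  # slicing already copes with short lists
```

With `.get("views_per_day", 0.0)`, a single video with incomplete metadata would sort last instead of aborting the whole playlist task.
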
holta commented 5 months ago

Are these recent cps/tasks/download.py PRs related?

Is this cps/tasks/metadata_extract.py PR related?

holta commented 5 months ago

Now that both these PRs are merged... 💯

...where do we stand, helping @EMG70 move forward here?⚡

EMG70 commented 5 months ago

> Now that both these PRs are merged... 💯
>
> * PR [Handle restricted/unavailable videos [and list them in "Tasks" view, when downloading (stale!) playlists] #179](https://github.com/iiab/calibre-web/pull/179)
>
> * PR [Build list of URLs simpler [refactor PR #179 for readability / maintainability] #180](https://github.com/iiab/calibre-web/pull/180)
>
> ...where do we stand, helping @EMG70 move forward here?⚡

New VM installed after merging of PRs #179 and #180

SUDO IIAB-DIAGNOSTICS - https://dpaste.com/4U8T9242B

I have attempted to download exactly the same videos that failed on 10/06/24. There is a huge improvement on most of them. Please see the screenshots.

Screenshot 2024-06-11 at 22-57-35 Internet in a Box Tasks

One of the videos gave an unfamiliar, long error message, which seems to be part of the Calibre-Web log.

Screenshot from 2024-06-11 23-04-46

holta commented 5 months ago

@deldesir

Why do 3 videos show failed to download: None ?

Can you investigate, and help improve this error message?

(In @EMG70's big screenshot, just above.)

avni commented 5 months ago

Testing June 11, 2024. YouTube Playlist Part 1

iiab-diagnostics: https://dpaste.com/4RDY4S535

Image

avni commented 5 months ago

Testing June 11, 2024. YouTube Playlist Part 2

[Updated]

iiab-diagnostics: https://dpaste.com/4RDY4S535

Image

avni commented 5 months ago

Testing June 11, 2024. YouTube Playlist Part 3

[Updated]

iiab-diagnostics: https://dpaste.com/GRX3DRTYX

Image

avni commented 5 months ago

Testing June 11, 2024. YouTube Playlist Part 4

iiab-diagnostics: https://dpaste.com/CW6GJTBBZ

Image

holta commented 5 months ago

> makes me think the system is retrying failed videos periodically

⬆️

@deldesir do you agree?

EMG70 commented 5 months ago

> Testing June 11, 2024. YouTube Playlist Part 4
>
> iiab-diagnostics: https://dpaste.com/CW6GJTBBZ
>
>   • Tested the Word from Brian Playlist of 3 videos.
>   • The 3 videos in the playlist were downloaded successfully.
>   • There is 1 failure listed, but that is the failed video from the original playlist I tried on this IIAB instance: https://www.youtube.com/watch?v=7qiwh4ybglo. This makes me think the system is retrying failed videos periodically.
>
> Image

I agree with this observation; the system seems to retry a previously failed video.

deldesir commented 5 months ago

> @deldesir
>
> Why do 3 videos show failed to download: None ?
>
> Can you investigate, and help improve this error message?
>
> (In @EMG70's big screenshot, just above.)

I am pretty sure the "failed to download: None" error refers to an "unavailable fragments" error. In my test, the video took longer than expected but eventually downloaded.

deldesir commented 5 months ago

> makes me think the system is retrying failed videos periodically
>
> ⬆️
>
> @deldesir do you agree?

xklb does retry failed downloads, but Calibre-Web does not. We don't have this implemented yet, but will eventually.

holta commented 5 months ago

> I am pretty sure failed to download: None error refers to "unavailable fragments" error. I could see in my test the video took longer than expected but eventually downloaded.

🧩

Possible background: (for others!)

avni commented 5 months ago

> xklb does retry failed downloads, but Calibre-Web no. We don't have this implemented yet but will eventually.

If you look at the screenshots I posted, the same YouTube file fails multiple times, implying that failed files are being retried automatically. Do you know why that is?

deldesir commented 5 months ago

@avni what happened is that this specific video started to download but stopped due to unavailable fragments. This download process left residual fragments that are reused each time you try to download again. If you remove the video downloads directory and try again, the download will restart from the beginning but will get stuck at some point, leaving an incomplete file again. You wouldn't see any error message because xklb does not report one at this point, hence the "None" retrieved from the database. This is an issue I was trying to address in #157, because each failed video must be accompanied by an error message explaining what happened.
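
To illustrate how the "None" could be replaced by something more informative, here is a minimal sketch only, not the fix being developed in #157. It reuses the query visible in the log earlier in this thread (`SELECT error FROM media WHERE webpath = ?`); the standalone function signature, the `db_path` parameter, and the fallback wording are assumptions made for illustration.

```python
import sqlite3
from contextlib import closing

# Hypothetical fallback text shown when xklb recorded no error for the URL.
FALLBACK = "no error reported (download may be incomplete, e.g. unavailable fragments)"


def read_error_from_database(db_path, media_url):
    """Return a human-readable failure reason for media_url."""
    try:
        with closing(sqlite3.connect(db_path)) as conn:
            row = conn.execute(
                "SELECT error FROM media WHERE webpath = ?", (media_url,)
            ).fetchone()
    except sqlite3.OperationalError as exc:
        # Covers cases such as "no such column: error" seen in the log.
        return f"could not read error from database: {exc}"
    if row is None or row[0] is None:
        # No matching row, or a row whose error column is NULL.
        return FALLBACK
    return row[0]
```

Under these assumptions, the Tasks view would show the fallback text instead of "failed to download: None", and a missing `error` column would no longer raise inside the download task.
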

avni commented 5 months ago

@deldesir fascinating. I don't know the inner workings of the system but am curious to understand more. For example, is there a downloads file for each video you download, or just one for all downloads? If the latter, I wonder if you can just remove those fragments from the downloads directory for any files that fail. Very cool that you are diving deep into this in #157. Thank you for the thorough explanation. 🙏

deldesir commented 5 months ago

It's the latter. I'll resume work on it ASAP.
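
The cleanup idea discussed above could look roughly like the sketch below. It is a hedged illustration, not the planned implementation. It assumes that file names contain the video ID in square brackets (as in the `[iaQV0tehds4].mp4` path in the log earlier in this thread) and that leftover partial files carry `.part` or `.ytdl` in their names, which the thread itself does not confirm. The function name and marker list are hypothetical.

```python
from pathlib import Path

# Assumption: leftover partial files contain one of these markers in their name.
PARTIAL_MARKERS = (".part", ".ytdl")


def remove_partial_files(downloads_dir, video_id):
    """Delete leftover partial files for one video; return the removed paths."""
    removed = []
    # Materialize the listing first so deletions do not disturb the traversal.
    for path in list(Path(downloads_dir).rglob("*")):
        name = path.name
        if not path.is_file() or f"[{video_id}]" not in name:
            continue
        if any(marker in name for marker in PARTIAL_MARKERS):
            path.unlink()
            removed.append(path)
    return removed
```

Completed files such as `..._[iaQV0tehds4].mp4` would be left alone, because their names contain none of the assumed markers.
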

holta commented 5 months ago

> It's the latter. I'll resume work on it ASAP.

Awesome.

And advance apologies to everyone: bug fixes are urgent, yes, but they take time to be architected properly ✊