iiab / calibre-web

:books: Web app for browsing, reading and downloading eBooks stored in a Calibre database
GNU General Public License v3.0

Thumbnail of ["Downloaded to IIAB"] video is cut off & many files fail to be copied to /library/calibre-web [intermittent failures?] #77

Open avni opened 11 months ago

avni commented 11 months ago

Describe the bug/problem I downloaded a YouTube video through the IIAB [Calibre-Web] interface. The video [downloaded to IIAB] and plays successfully locally but the [thumbnail] is cut off. See attached image.

To Reproduce Steps to reproduce the behavior:

  1. Go to http://box/books/?data=root&sort_param=stored
  2. Click on 'Download to IIAB'
  3. Download video: https://www.youtube.com/watch?v=ehSIhj2-qPM
  4. Go to: http://box/books/?data=root&sort_param=stored and in the Discover (Random Books) section, you will see the video linked with the thumbnail cut off. Screenshot attached.

Logfile http://sprunge.us/XhS2Y3

Expected behavior For step 4 above, I expect the full thumbnail to show.

Screenshots

Screenshot 2023-12-23 at 2 48 07 PM


holta commented 11 months ago

@EMG70 and @deldesir can you reproduce this "cover/thumbnail/poster cut off" failure?

Related:

avni commented 11 months ago

Here is the full image for reference

Screenshot 2023-12-23 at 2 51 52 PM

holta commented 11 months ago

Install instructions extensively clarified (e.g. common pitfalls, macOS clarifications, etc) thanks to @avni's help:

https://github.com/iiab/calibre-web/wiki#wrench-installation

EMG70 commented 11 months ago

@EMG70 and @deldesir can you reproduce this "cover/thumbnail/poster cut off" failure?

Related:

* PR [Make covers square #2](https://github.com/iiab/calibre-web/pull/2)

* PR [Fix missing cover [for MP4; not just WebM] #59](https://github.com/iiab/calibre-web/pull/59)

* PR [Find thumbnail [a.k.a. cover/poster. For MP4; not just WebM] #65](https://github.com/iiab/calibre-web/pull/65)

* PR [Find webp file [thumbnail/cover/poster for MP4; not just WebM ?] #69](https://github.com/iiab/calibre-web/pull/69)

* PR [Support PNG, WEBP, GIF, JPG covers/thumbnails #72](https://github.com/iiab/calibre-web/pull/72)

I have not been able to reproduce @avni's poster cut-off.

pastebinit -b sprunge.us /var/log/calibre-web.log http://sprunge.us/vgBCpW?en

pastebinit -b sprunge.us /var/log/xklb.log http://sprunge.us/Z3heBF?en

See screenshots; I followed exactly the same steps as @avni:

Screenshot from 2023-12-24 00-59-01 Screenshot from 2023-12-24 01-00-13

holta commented 11 months ago

@avni others have not been able to reproduce your failure:

multipass launch 22.04 -m 2G -d 20G --cloud-init omg.yml

Possibly related intermittent Vimeo glitch yesterday:

holta commented 11 months ago

Just FYI here's a 2nd VM (24.04 fresh install I just created) that was unable to reproduce this #77 "thumbnail cut off" glitch. Mobile View here:

Screenshot_20231223-212850

iiab-diagnostics: http://sprunge.us/q65mos

avni commented 11 months ago

When I download the same URL, it doesn't show up on the main books page; a different URL (https://www.youtube.com/watch?v=1tBczesm0Ys) also doesn't. I'll try on a new IIAB instance.

NB: The thumbnail you uploaded to this issue looks stretched.

Screenshot 2023-12-24 at 8 03 42 PM Screenshot 2023-12-24 at 8 03 51 PM

avni commented 11 months ago

For debugging.

root@box:~# tree /library/downloads/calibre-web/
/library/downloads/calibre-web/
├── Youtube
│   ├── Himalayan_Trust
│   │   ├── Bringing_technological_literacy_to_Solukhumbu_with_Himalayan_Trust_Nepal_and_Edutech_506.00_[ehSIhj2-qPM].mp4
│   │   ├── Bringing_technological_literacy_to_Solukhumbu_with_Himalayan_Trust_Nepal_and_Edutech_510.00_[ehSIhj2-qPM].mp4
│   │   └── Bringing_technological_literacy_to_Solukhumbu_with_Himalayan_Trust_Nepal_and_Edutech_510.00_[ehSIhj2-qPM].webp
│   └── World_Bank
│       ├── Haiti_-_Education_pour_tous_1.48k_[1tBczesm0Ys].jpg
│       └── Haiti_-_Education_pour_tous_1.48k_[1tBczesm0Ys].mp4
├── survey.db
├── survey.db.2023-12-24_19:53:53_EST
├── survey.db.2023-12-24_19:54:27_EST
├── survey.db.2023-12-24_19:55:05_EST
└── survey.db.2023-12-24_19:59:34_EST
3 directories, 10 files

root@box:~# tree /library/calibre-web/
/library/calibre-web/
├── Himalayan Trust
│   └── Bringing technological literacy to Solukhumbu with Himalayan Trust Nepal and Edutech (1)
│       ├── Bringing technological literacy to Solukhu - Himalayan Trust.mp4
│       └── cover.jpg
├── config
│   └── app.db
├── metadata.db
└── metadata_db_prefs_backup.json
3 directories, 5 files

holta commented 11 months ago

Screenshot_20231224-213526~2

@deldesir what's happening above?

Any idea why 2 different videos were downloaded from the exact same YouTube URL?

@avni can you run the following and paste in output?

apt install mediainfo
cd /library/downloads/calibre-web/Youtube/Himalayan_Trust
mediainfo *506*mp4
mediainfo *510*mp4

avni commented 11 months ago

Any idea why 2 different videos were downloaded from the exact same YouTube URL?

This may be because I selected 1080p under Video Quality for one try just as a test to see if it would make a difference.

avni commented 11 months ago

root@box:/library/downloads/calibre-web/Youtube/Himalayan_Trust# mediainfo *510*mp4
General
Complete name                            : Bringing_technological_literacy_to_Solukhumbu_with_Himalayan_Trust_Nepal_and_Edutech_510.00_[ehSIhj2-qPM].mp4
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/iso2/avc1/mp41)
File size                                : 24.7 MiB
Duration                                 : 5 min 37 s
Overall bit rate                         : 614 kb/s
Movie name                               : Bringing technological literacy to Solukhumbu with Himalayan Trust Nepal and Edutech
Performer                                : Himalayan Trust
Description                              : The Himalayan Trust has joined Edutech Nepal and the Himalayan Trust Nepal in forming 7 new computer labs in Solukhumbu. Himalayan Trust has put forward $25,000 NZD (50% of the budget) towards these 7 labs and is looking forward to working on other projects throughout Solukhumbu alongside Edutech Nepal and HTN.  /  / Computer literacy is an essential part of any education in 2022 and is becoming a core part of fulfilling the Himalayan Trusts mission of bringing quality education to all of Solukhumbu.
Recorded date                            : 20220323
Writing application                      : Lavf58.76.100
Comment                                  : https://www.youtube.com/watch?v=ehSIhj2-qPM
LongDescription                          : The Himalayan Trust has joined Edutech Nepal and the Himalayan Trust Nepal in forming 7 new computer labs in Solukhumbu. Himalayan Trust has put forward $25,000 NZD (50% of the budget) towards these 7 labs and is looking forward to working on other projects throughout Solukhumbu alongside Edutech Nepal and HTN.  /  / Computer literacy is an essential part of any education in 2022 and is becoming a core part of fulfilling the Himalayan Trusts mission of bringing quality education to all of Solukhumbu.

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : Main@L3
Format settings                          : CABAC / 3 Ref Frames
Format settings, CABAC                   : Yes
Format settings, Reference frames        : 3 frames
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 5 min 37 s
Bit rate                                 : 480 kb/s
Width                                    : 640 pixels
Height                                   : 360 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Constant
Frame rate                               : 25.000 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.083
Stream size                              : 19.3 MiB (78%)
Title                                    : ISO Media file produced by Google Inc. Created on: 03/22/2022.
Writing library                          : x264 core 155 r2901 7d0ff22
Color range                              : Limited
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709
Codec configuration box                  : avcC

Audio
ID                                       : 2
Format                                   : AAC LC
Format/Info                              : Advanced Audio Codec Low Complexity
Codec ID                                 : mp4a-40-2
Duration                                 : 5 min 37 s
Bit rate mode                            : Constant
Bit rate                                 : 128 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 44.1 kHz
Frame rate                               : 43.066 FPS (1024 SPF)
Compression mode                         : Lossy
Stream size                              : 5.16 MiB (21%)
Title                                    : ISO Media file produced by Google Inc. Created on: 03/22/2022.
Language                                 : English
Default                                  : Yes
Alternate group                          : 1

avni commented 11 months ago

New results from a fresh (2nd) VM — video and thumbnail completely fail to be copied over to /library/calibre-web so far:

root@box:~# tree /library/downloads/calibre-web/
/library/downloads/calibre-web/
├── Youtube
│   └── Himalayan_Trust
│       ├── Bringing_technological_literacy_to_Solukhumbu_with_Himalayan_Trust_Nepal_and_Edutech_510.00_[ehSIhj2-qPM].mp4
│       └── Bringing_technological_literacy_to_Solukhumbu_with_Himalayan_Trust_Nepal_and_Edutech_510.00_[ehSIhj2-qPM].webp
└── survey.db

2 directories, 3 files
root@box:~# tree /library/calibre-web/
/library/calibre-web/
├── config
│   └── app.db
├── metadata.db
└── metadata_db_prefs_backup.json
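For anyone comparing the two trees by hand, the check can be sketched in Python. This is only a rough diagnostic, not how Calibre-Web or xklb work internally: it assumes a successful "Download to IIAB" leaves a byte-identical copy of the video somewhere under the library directory, which this thread does not confirm. The function name `uncopied_videos` and the `VIDEO_SUFFIXES` set are made up for illustration.

```python
import filecmp
from pathlib import Path

# Assumption: only these video containers matter (MP4 and WebM are the ones named in this thread).
VIDEO_SUFFIXES = {".mp4", ".webm"}


def uncopied_videos(downloads_dir, library_dir):
    """Return downloaded videos that have no byte-identical copy under library_dir."""
    library_files = [p for p in Path(library_dir).rglob("*") if p.is_file()]
    missing = []
    for video in sorted(Path(downloads_dir).rglob("*")):
        if not video.is_file() or video.suffix.lower() not in VIDEO_SUFFIXES:
            continue
        size = video.stat().st_size
        # Compare sizes first (cheap), then contents byte for byte.
        copied = any(
            candidate.stat().st_size == size
            and filecmp.cmp(video, candidate, shallow=False)
            for candidate in library_files
        )
        if not copied:
            missing.append(video)
    return missing
```

Called as `uncopied_videos("/library/downloads/calibre-web", "/library/calibre-web")`, it would list each downloaded video with no identical file in the library. If the copy step rewrites the file in any way, every video will be reported, so treat the result as a hint only.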

root@box:~# pastebinit -b sprunge.us /var/log/calibre-web.log http://sprunge.us/dmc04e

root@box:~# pastebinit -b sprunge.us /var/log/xklb.log http://sprunge.us/wRr8Yf

root@box:~# iiab-diagnostics ... http://sprunge.us/rgEOEU

avni commented 11 months ago

On the original server:

root@box:~# cd /library/downloads/calibre-web/Youtube/Himalayan_Trust
root@box:/library/downloads/calibre-web/Youtube/Himalayan_Trust# mediainfo *506*mp4
General
Complete name                            : Bringing_technological_literacy_to_Solukhumbu_with_Himalayan_Trust_Nepal_and_Edutech_506.00_[ehSIhj2-qPM].mp4
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/iso2/avc1/mp41)
File size                                : 24.7 MiB
Duration                                 : 5 min 37 s
Overall bit rate                         : 614 kb/s
Movie name                               : Bringing technological literacy to Solukhumbu with Himalayan Trust Nepal and Edutech
Performer                                : Himalayan Trust
Description                              : The Himalayan Trust has joined Edutech Nepal and the Himalayan Trust Nepal in forming 7 new computer labs in Solukhumbu. Himalayan Trust has put forward $25,000 NZD (50% of the budget) towards these 7 labs and is looking forward to working on other projects throughout Solukhumbu alongside Edutech Nepal and HTN.  /  / Computer literacy is an essential part of any education in 2022 and is becoming a core part of fulfilling the Himalayan Trusts mission of bringing quality education to all of Solukhumbu.
Recorded date                            : 20220323
Writing application                      : Lavf58.76.100
Comment                                  : https://www.youtube.com/watch?v=ehSIhj2-qPM
LongDescription                          : The Himalayan Trust has joined Edutech Nepal and the Himalayan Trust Nepal in forming 7 new computer labs in Solukhumbu. Himalayan Trust has put forward $25,000 NZD (50% of the budget) towards these 7 labs and is looking forward to working on other projects throughout Solukhumbu alongside Edutech Nepal and HTN.  /  / Computer literacy is an essential part of any education in 2022 and is becoming a core part of fulfilling the Himalayan Trusts mission of bringing quality education to all of Solukhumbu.

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : Main@L3
Format settings                          : CABAC / 3 Ref Frames
Format settings, CABAC                   : Yes
Format settings, Reference frames        : 3 frames
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 5 min 37 s
Bit rate                                 : 480 kb/s
Width                                    : 640 pixels
Height                                   : 360 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Constant
Frame rate                               : 25.000 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.083
Stream size                              : 19.3 MiB (78%)
Title                                    : ISO Media file produced by Google Inc. Created on: 03/22/2022.
Writing library                          : x264 core 155 r2901 7d0ff22
Color range                              : Limited
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709
Codec configuration box                  : avcC

Audio
ID                                       : 2
Format                                   : AAC LC
Format/Info                              : Advanced Audio Codec Low Complexity
Codec ID                                 : mp4a-40-2
Duration                                 : 5 min 37 s
Bit rate mode                            : Constant
Bit rate                                 : 128 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 44.1 kHz
Frame rate                               : 43.066 FPS (1024 SPF)
Compression mode                         : Lossy
Stream size                              : 5.16 MiB (21%)
Title                                    : ISO Media file produced by Google Inc. Created on: 03/22/2022.
Language                                 : English
Default                                  : Yes
Alternate group                          : 1
root@box:/library/downloads/calibre-web/Youtube/Himalayan_Trust# mediainfo *510*mp4
General
Complete name                            : Bringing_technological_literacy_to_Solukhumbu_with_Himalayan_Trust_Nepal_and_Edutech_510.00_[ehSIhj2-qPM].mp4
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/iso2/avc1/mp41)
File size                                : 24.7 MiB
Duration                                 : 5 min 37 s
Overall bit rate                         : 614 kb/s
Movie name                               : Bringing technological literacy to Solukhumbu with Himalayan Trust Nepal and Edutech
Performer                                : Himalayan Trust
Description                              : The Himalayan Trust has joined Edutech Nepal and the Himalayan Trust Nepal in forming 7 new computer labs in Solukhumbu. Himalayan Trust has put forward $25,000 NZD (50% of the budget) towards these 7 labs and is looking forward to working on other projects throughout Solukhumbu alongside Edutech Nepal and HTN.  /  / Computer literacy is an essential part of any education in 2022 and is becoming a core part of fulfilling the Himalayan Trusts mission of bringing quality education to all of Solukhumbu.
Recorded date                            : 20220323
Writing application                      : Lavf58.76.100
Comment                                  : https://www.youtube.com/watch?v=ehSIhj2-qPM
LongDescription                          : The Himalayan Trust has joined Edutech Nepal and the Himalayan Trust Nepal in forming 7 new computer labs in Solukhumbu. Himalayan Trust has put forward $25,000 NZD (50% of the budget) towards these 7 labs and is looking forward to working on other projects throughout Solukhumbu alongside Edutech Nepal and HTN.  /  / Computer literacy is an essential part of any education in 2022 and is becoming a core part of fulfilling the Himalayan Trusts mission of bringing quality education to all of Solukhumbu.

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : Main@L3
Format settings                          : CABAC / 3 Ref Frames
Format settings, CABAC                   : Yes
Format settings, Reference frames        : 3 frames
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 5 min 37 s
Bit rate                                 : 480 kb/s
Width                                    : 640 pixels
Height                                   : 360 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Constant
Frame rate                               : 25.000 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.083
Stream size                              : 19.3 MiB (78%)
Title                                    : ISO Media file produced by Google Inc. Created on: 03/22/2022.
Writing library                          : x264 core 155 r2901 7d0ff22
Color range                              : Limited
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709
Codec configuration box                  : avcC

Audio
ID                                       : 2
Format                                   : AAC LC
Format/Info                              : Advanced Audio Codec Low Complexity
Codec ID                                 : mp4a-40-2
Duration                                 : 5 min 37 s
Bit rate mode                            : Constant
Bit rate                                 : 128 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 44.1 kHz
Frame rate                               : 43.066 FPS (1024 SPF)
Compression mode                         : Lossy
Stream size                              : 5.16 MiB (21%)
Title                                    : ISO Media file produced by Google Inc. Created on: 03/22/2022.
Language                                 : English
Default                                  : Yes
Alternate group                          : 1

holta commented 11 months ago

Screenshot_20231224-213526~2

@deldesir clarify what 506 and 510 mean please, and/or why these two numbers appeared?

@avni is it possibly too late to ask you to verify that the two MP4 files are nearly identical — by sharing with us the output below?

cd /library/downloads/calibre-web/Youtube/Himalayan_Trust
ls -l
sha256sum *

EMG70 commented 11 months ago

I have just managed to download this video (https://www.youtube.com/watch?v=1tBczesm0Ys) OK at the first attempt as 480p, 720p and 1080p. I am using a VM created on 23/12/23. Do I need to upgrade just so we have the same setup as @avni?

Screenshot from 2023-12-25 18-24-49 Screenshot from 2023-12-25 18-25-48

holta commented 11 months ago

Do I need to upgrade just so we have same as @avni ?

Please upgrade yes, just to confirm 💯% !

Instructions to upgrade any VM:

https://github.com/iiab/calibre-web/wiki#upgrading

EMG70 commented 11 months ago

All still downloading OK after upgrading the VM, tested at all 3 resolutions.

Screenshot from 2023-12-25 18-42-07

holta commented 11 months ago

Huge thanks @EMG70 for confirming:

Intermittent periods where "Download to IIAB" fails dramatically (thumbnail, copying of video files, etc) have become common in recent days — with 3 different people experiencing such — and we still don't understand why... 🤔

deldesir commented 11 months ago

@deldesir clarify what 506 and 510 mean please, and/or why these two numbers appeared?

The numbers in the video titles, such as "510.00" and "506.00," are coming from the video metadata obtained during the download process. Specifically, these numbers are part of the filename template specified in the "outtmpl" option when initializing the YoutubeDL object in https://github.com/chapmanjacobd/library/blob/525dc6b06db8648e5ec7ac2843be464a85797e4c/xklb/tube_backend.py#L202-206. Breaking down the relevant part of the code:

"outtmpl": {
    "default": str(
        Path(f"{consts.SUB_TEMP_DIR}/%(uploader,uploader_id)s/%(title).200B_[%(id).60B].%(ext)s"),
    ),
},

In this template:

These placeholders are filled in with actual values during the download, resulting in filenames like "Bringing_technological_literacy_to_Solukhumbu_with_Himalayan_Trust_Nepal_and_Edutech_510.00_[ehSIhj2-qPM].mp4" and "Bringing_technological_literacy_to_Solukhumbu_with_Himalayan_Trust_Nepal_and_Edutech_506.00_[ehSIhj2-qPM].mp4."

The dynamic nature of these numbers suggests that they are not fixed but rather generated or retrieved during the download process.
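To make the pieces of such a filename easier to compare across downloads, here is a minimal Python sketch that splits one apart. The regular expression is inferred from the filenames pasted in this issue (`<title>_<number>_[<video id>].<ext>`); it is not taken from xklb or yt-dlp, and `split_download_name` is a hypothetical helper.

```python
import re

# Assumed pattern, based only on the filenames shown in this thread:
#   <title>_<number>_[<video id>].<ext>
# The number may carry a suffix such as "k" (e.g. "1.48k").
FILENAME_RE = re.compile(
    r"^(?P<title>.+)_(?P<number>\d+(?:\.\d+)?[kKmM]?)_\[(?P<video_id>[^\]]+)\]\.(?P<ext>\w+)$"
)


def split_download_name(filename):
    """Split a downloaded filename into title, number, video_id and ext."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized filename: {filename!r}")
    return match.groupdict()
```

For the two Himalayan Trust files this yields the same `title` and `video_id` and differs only in `number` ("506.00" versus "510.00"), which narrows the question to where that one value comes from.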

I am trying to understand the specific values, inspecting the info object obtained during the download, which holds the video metadata. This can be obtained as a json file for this specific video with:

yt-dlp --write-info-json https://www.youtube.com/watch?v=ehSIhj2-qPM
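Once that JSON file exists, one way to hunt for the source of a number like 510 is to list every top-level numeric field that equals it. The sketch below assumes only that the info file is a JSON object; the helper name `fields_equal_to` and the example path are hypothetical, and the actual info filename depends on yt-dlp's output template.

```python
import json
from pathlib import Path


def fields_equal_to(info_json_path, value):
    """List top-level keys of an info JSON whose numeric value equals `value`."""
    info = json.loads(Path(info_json_path).read_text(encoding="utf-8"))
    return sorted(
        key
        for key, field in info.items()
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(field, (int, float))
        and not isinstance(field, bool)
        and field == value
    )
```

For example, `fields_equal_to("video.info.json", 510)` returns the names of any metadata fields currently set to 510. Values that change between downloads would need to be checked soon after the download they came from.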
holta commented 11 months ago

I am trying to understand the specific values, inspecting the info object obtained during the download, which holds the video metadata.

Thank you: this is a critical piece to the puzzle of DIY scraping for all people in all countries.

No matter if "Download to IIAB" button is used — or if "Upload" button is used — for endangered/indigenous/grassroots content 🌱

deldesir commented 11 months ago

UPDATE: The dynamic nature of the numbers 506 and 510 reflects the actual approximate file size of the videos appended to the video title per the ~"format-sort"~ format selection at runtime. To reproduce, download another video with a shorter name.

holta commented 11 months ago

dynamic nature of the numbers 506 and 510 reflects the actual approximate file size of the videos

Are you saying size of video files is non-deterministic (regularly changing) for the very same video — downloaded in the very same way?

To reproduce, download another video with a shorter name.

How is the length of the video name relevant?

(Or are you just saying video filenames are easier to read when video name/title is shorter?)

holta commented 11 months ago

dynamic nature of the numbers 506 and 510 reflects the actual approximate file size of the videos

Are you saying size of video files is non-deterministic (regularly changing) for the very same video — downloaded in the very same way?

QUICK CORRECTION: 506 and 510 were actually the "current view counts" as explained by xklb's developer:

avni commented 11 months ago

Checksum reveals the files are indeed the same:

root@box:/library/downloads/calibre-web/Youtube/Himalayan_Trust# sha256sum *.mp4
3bca0bdcf53910afd7ca205d31a72fd817857b631055c16526ac4c2e033952ec  Bringing_technological_literacy_to_Solukhumbu_with_Himalayan_Trust_Nepal_and_Edutech_506.00_[ehSIhj2-qPM].mp4
3bca0bdcf53910afd7ca205d31a72fd817857b631055c16526ac4c2e033952ec  Bringing_technological_literacy_to_Solukhumbu_with_Himalayan_Trust_Nepal_and_Edutech_510.00_[ehSIhj2-qPM].mp4
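The same check can be scripted for a whole folder. This is a small sketch using Python's standard `hashlib`; the helper names `sha256_of` and `duplicate_groups` are made up for illustration, and the `*.mp4` default simply mirrors the command above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_groups(directory, pattern="*.mp4"):
    """Group files in `directory` by digest; return only groups with 2+ identical files."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            groups[sha256_of(path)].append(path.name)
    return [names for names in groups.values() if len(names) > 1]
```

Run against the Himalayan_Trust folder above, `duplicate_groups` should report the 506.00 and 510.00 files as one group, since their checksums match.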
deldesir commented 11 months ago

Thanks to you @avni and @EMG70 for your successive contributions to debugging this naming problem and to @holta for contributing to a better understanding of it. Now that we have the xklb developer's explanation, I can concentrate on a solid resolution to these intermittent failures.