pystardust / ytfzf

A posix script to find and watch youtube videos from the terminal. (Without API)
GNU General Public License v3.0

[BUG]: Thumbnails not downloaded, curl error 404, when scraping subscriptions from a file #669

Open kevenwyld opened 1 year ago

kevenwyld commented 1 year ago

Describe the bug

When running ytfzf -t -f -c SI, thumbnails are not downloaded and curl prints a series of 404 errors to stderr:

] > ytfzf --thumbnail-log=log.txt -t -f -c SI
Scraping subscriptions with instance: https://invidious.esmailelbob.xyz
DL% UL%  Dled  Uled  Xfers  Live Total     Current  Left    Speed
--  --  1150k     0    18     0  --:--:--  0:00:05 --:--:--  212k
Fetching thumbnails...
DL% UL%  Dled  Uled  Xfers  Live Total     Current  Left    Speed
--  --      0     0    36    36  --:--:--  0:00:03 --:--:--     0      curl: (22) The requested URL returned error: 404
--  --      0     0    36    35  --:--:--  0:00:06 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
--  --      0     0    36    32  --:--:--  0:00:07 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
--  --      0     0    36    17  --:--:--  0:00:09 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
--  --      0     0    36    15  --:--:--  0:00:11 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
--  --      0     0    36    11  --:--:--  0:00:11 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
--  --      0     0    36     6  --:--:--  0:00:12 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
--  --      0     0    36     4  --:--:--  0:00:13 --:--:--     0      curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
  0 --      0     0    36     0  --:--:--  0:00:14 --:--:--     0

To Reproduce

Run ytfzf -t -f -c SI with the following subscriptions file:

https://www.youtube.com/channel/UC4PIO2pZaFKzI97uumFTNSg/videos # OficineRobotica
https://www.youtube.com/channel/UCeKpbMimEGgLM_0tnghfoVw/videos # Clough42
https://www.youtube.com/channel/UC7pokUsRb6q2B0FOzSqQLlw/videos # Adventures in creation
https://www.youtube.com/channel/UCw3UZn1tcVe7pH3R6C3Gcng/videos # Abom79
https://www.youtube.com/channel/UCCkSr3M8GXbS4txqPY7OMxQ/videos # Edge Precision
https://www.youtube.com/channel/UC7Jf7t6BL4e74O53dL6arSw/videos # Blondihacks
https://www.youtube.com/channel/UCY8gSLTqvs38bR9X061jFWw/videos # Stefan Gotteswinter
https://www.youtube.com/channel/UC-CubOaooNwC-3RBKUoAOQQ/videos # Joko Engineeringhelp
https://www.youtube.com/channel/UC7aAyIrjeH2RKciAXzdOaJA/videos # Artisan Makes
https://www.youtube.com/channel/UCKLIIdKEpjAnn8E76KP7sQg/videos # mrpete222
https://www.youtube.com/channel/UCyjwQ6oz4cqqtEcWGboSU3g/videos # Keith Rucker - VintageMachinery.org
https://www.youtube.com/channel/UChIs72whgZI9w6d6FhwGGHA/videos # Gamers Nexus
https://www.youtube.com/channel/UCVI8Mfisni3GaobL1e2JOIQ/videos # Inheritance Machining
https://www.youtube.com/channel/UC2wdo5vU7bPBNzyC2nnwmNQ/videos # Cutting Edge Engineering Australia
https://www.youtube.com/channel/UCworsKCR-Sx6R6-BnIjS2MA/videos # Clickspring
https://www.youtube.com/channel/UC9UjDtkpr2I-5G51vMJZvnA/videos # ClickspringClips
https://www.youtube.com/channel/UCiDJtJKMICpb9B1qf7qjEOA/videos # Adam Savage’s Tested
https://www.youtube.com/channel/UCB0wPMJJ2FKqdB-gx7YVsDg/videos # Matty’s Workshop

Expected behavior

Thumbnails similar to those displayed when using the invidious-channel feature

Screenshots

[Screenshot: Screenshot_2023-04-04_10-51-34]

Information

Additional context

I did some testing using bash -x to get debug output. Here's download log output from a working invidious-channel scrape:

+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/i6WIRWdGUPg/hqdefault.jpg i6WIRWdGUPg
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/J2zZhThFurg/hqdefault.jpg J2zZhThFurg
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/B4u8MpH9db8/hqdefault.jpg B4u8MpH9db8
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/nIDQzpBLqFo/hqdefault.jpg nIDQzpBLqFo
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/0LbDxvvA8Ww/hqdefault.jpg 0LbDxvvA8Ww
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/kQOF9cB7Gjw/hqdefault.jpg kQOF9cB7Gjw
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/https:__www.youtube.com_channel_UCB0wPMJJ2FKqdB-gx7YVsDg_videos # Matty’s Workshop-1102122/thumbnails/%s.jpg"\n' https://iv.melmac.space/vi/RlHteM78lDo/hqdefault.jpg RlHteM78lDo

and here's one from a not working -cSI scrape against my subscriptions file:

+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/SCRAPE-SI-1100859/thumbnails/%s.jpg"\n' https://invidious.baczek.me/vi/eCDW3Xm_voE/high.jpg eCDW3Xm_voE
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/SCRAPE-SI-1100859/thumbnails/%s.jpg"\n' https://invidious.baczek.me/vi/x6LUpi6W3YA/high.jpg x6LUpi6W3YA
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/SCRAPE-SI-1100859/thumbnails/%s.jpg"\n' https://invidious.baczek.me/vi/jX9jzSfVrUA/high.jpg jX9jzSfVrUA
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/SCRAPE-SI-1100859/thumbnails/%s.jpg"\n' https://invidious.baczek.me/vi/1qtg1z5V1ss/high.jpg 1qtg1z5V1ss
+ for line in "$@"
+ printf 'url="%s"\noutput="/tmp/ytfzf-1000/SCRAPE-SI-1100859/thumbnails/%s.jpg"\n' https://invidious.baczek.me/vi/AXdFQga0i88/high.jpg AXdFQga0i88
+ curl -fLZ -K /tmp/ytfzf-1000/SCRAPE-SI-1100859/tmp/curl_config
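For readers unfamiliar with the trace above: each printf emits a url=/output= pair in curl's config-file syntax, and curl then reads the whole file with -K. The flags are -f (fail on HTTP errors, which is why each 404 surfaces as "curl: (22)"), -L (follow redirects) and -Z (run transfers in parallel). Below is a minimal sketch of that pattern. The function name write_curl_config and the /tmp/thumbs directory are made up for illustration and are not taken from ytfzf.

```shell
#!/bin/sh
# Sketch only: mimics the url=/output= pairs seen in the trace above.
# write_curl_config and /tmp/thumbs are hypothetical, not ytfzf names.

write_curl_config () {
    # $1 = instance base URL, $2 = quality name, $3 = thumbnail directory
    # Video ids are read from stdin, one per line.
    while IFS= read -r id; do
        [ -n "$id" ] || continue
        printf 'url="%s/vi/%s/%s.jpg"\noutput="%s/%s.jpg"\n' \
            "$1" "$id" "$2" "$3" "$id"
    done
}

# Build a config for two of the video ids from the trace (no network access).
config=$(printf '%s\n' eCDW3Xm_voE x6LUpi6W3YA |
    write_curl_config "https://invidious.baczek.me" "hqdefault" "/tmp/thumbs")
printf '%s\n' "$config"

# To actually download, save the output to a file and run, as in the trace:
#   curl -fLZ -K /path/to/curl_config
```

Changing the second argument from hqdefault to high reproduces the URL shape seen in the failing scrape.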

I tried downloading the image from both URLs. https://invidious.baczek.me/vi/1qtg1z5V1ss/hqdefault.jpg contains an image, while https://invidious.baczek.me/vi/1qtg1z5V1ss/high.jpg does not, though I can't figure out why the two types of scrapes request different quality images. This may be unrelated, since neither URL returns a 404. I hope this is helpful.

This is the only place thumbnails are broken for me. They work with all other searches and scrapes.

Thanks!

Euro20179 commented 1 year ago

You could try using --thumbnail-quality=hqdefault, however using high works for me.

kevenwyld commented 1 year ago

Could it be that this was a misunderstanding of the supported thumbnail types in invidious? There is no high url, but the name of the high thumbnail is hqdefault here: https://github.com/iv-org/invidious/blob/6837e4292829ee0891c73108096b806b63ab1506/src/invidious/videos.cr#L425

I've tried every instance I can find, and none of them return anything for https://<instance>/vi/AKZRuNZDkGU/high.jpg, but they all return an image for https://<instance>/vi/AKZRuNZDkGU/hqdefault.jpg.
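A comparison like this can be scripted. The sketch below is one possible way to do it. thumb_url and check_thumb are hypothetical helper names, not ytfzf functions, and the instance in the loop is just the one from the debug log above. Only the URL listing runs as written. check_thumb needs network access, so it is defined but not called.

```shell
#!/bin/sh
# Sketch: thumb_url and check_thumb are hypothetical helpers, not part of ytfzf.

thumb_url () {
    # $1 = instance base URL, $2 = video id, $3 = quality name
    printf '%s/vi/%s/%s.jpg\n' "$1" "$2" "$3"
}

check_thumb () {
    # Print the final HTTP status code for a URL, discarding the body.
    # Requires network access when called.
    curl -sL -o /dev/null -w '%{http_code}\n' "$1"
}

# List the two URLs being compared; swap in any instance you want to test.
for quality in high hqdefault; do
    thumb_url "https://invidious.baczek.me" "AKZRuNZDkGU" "$quality"
done

# Example call (not run here):
#   check_thumb "$(thumb_url https://invidious.baczek.me AKZRuNZDkGU hqdefault)"
```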

This makes me think the default quality should be hqdefault instead of high. But I'll gladly admit that I don't have a complete understanding of this codebase and could be completely wrong =] .

Also I can reproduce this very consistently with ytfzf --thumbnail-quality=high -t -f -c SI

Euro20179 commented 1 year ago

tbh, high works for me 99% of the time. If this becomes a bigger issue I'll change the default to hqdefault. In the meantime, I'd suggest adding thumbnail_quality=hqdefault to your config file.

edit: I'm kinda dumb, I didn't realize this only really affects subscriptions for some reason, and when scraping SI this bug appears a lot more often for me.
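As a sketch of the config workaround suggested above: the config file is assumed here to be a shell file at ${XDG_CONFIG_HOME:-$HOME/.config}/ytfzf/conf.sh that ytfzf sources at startup. Check the ytfzf documentation for the exact path on your setup.

```shell
# Assumed location: ${XDG_CONFIG_HOME:-$HOME/.config}/ytfzf/conf.sh
# Use the Invidious thumbnail name directly instead of the "high" alias.
thumbnail_quality="hqdefault"
```

The same value can be passed for a single run with --thumbnail-quality=hqdefault, as mentioned earlier in the thread.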

kevenwyld commented 1 year ago

I think it's because scrape_SI... or maybe scrape_subscriptions doesn't call _get_invidious_thumb_quality_name but the other functions like scrape_invidious_playlist do?

You are converting high to hqdefault in that function but without it the thumbnail_quality variable is just high which is what's being passed to invidious in the url as far as I can tell.

_get_invidious_thumb_quality_name () {
    case "$thumbnail_quality" in
        high) thumbnail_quality="hqdefault" ;;
        medium) thumbnail_quality="mqdefault" ;;
        start) thumbnail_quality="1" ;;
        middle) thumbnail_quality="2" ;;
        end) thumbnail_quality="3" ;;
    esac
}

PS. I have no idea how you stay organized in a 3565 line long file.... And also sorry if I'm way off here.

EDIT: I tried adding _get_invidious_thumb_quality_name to the scrape_SI function and it seems to have fixed it. Though not sure if that's the best solution.
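To illustrate the effect described above, here is a stand-alone sketch, not the actual ytfzf source. The mapping function is copied from the snippet quoted in this comment. build_thumb_url is a hypothetical helper that only imitates how the quality name ends up in the thumbnail URL.

```shell
#!/bin/sh
# Stand-alone sketch; build_thumb_url is hypothetical, not a ytfzf function.

thumbnail_quality="high"   # ytfzf-style quality name before any mapping

# Mapping as quoted in the comment above.
_get_invidious_thumb_quality_name () {
    case "$thumbnail_quality" in
        high) thumbnail_quality="hqdefault" ;;
        medium) thumbnail_quality="mqdefault" ;;
        start) thumbnail_quality="1" ;;
        middle) thumbnail_quality="2" ;;
        end) thumbnail_quality="3" ;;
    esac
}

build_thumb_url () {
    # $1 = instance base URL, $2 = video id
    printf '%s/vi/%s/%s.jpg\n' "$1" "$2" "$thumbnail_quality"
}

# Without the mapping call, the raw name "high" ends up in the URL.
before=$(build_thumb_url "https://invidious.baczek.me" "1qtg1z5V1ss")

# With the mapping call, the Invidious name "hqdefault" is used instead.
_get_invidious_thumb_quality_name
after=$(build_thumb_url "https://invidious.baczek.me" "1qtg1z5V1ss")

printf 'without mapping: %s\nwith mapping:    %s\n' "$before" "$after"
```

Names that are not listed in the case statement, such as hqdefault itself, pass through unchanged, so calling the function more than once is harmless.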

Euro20179 commented 1 year ago

> I have no idea how you stay organized in a 3565 line long file

It's hard lol.

> I think it's because scrape_SI... or maybe scrape_subscriptions doesn't call _get_invidious_thumb_quality_name

I think you're right. I will add this patch when I get home. I believe adding the function call is the best solution, but it might be better if it gets called automatically somewhere.

Euro20179 commented 1 year ago

This should now be fixed in the development branch.

kevenwyld commented 1 year ago

Thanks! Just tested and it's working great now.