miniflux / v2

Minimalist and opinionated feed reader
https://miniflux.app
Apache License 2.0

Youtube not showing correct watch time #2807

Open sarmong opened 3 months ago

sarmong commented 3 months ago

Instead, every video shows 1 minute read

Apparently there have been some changes to YouTube, because Invidious is also experiencing problems.

fguillot commented 3 months ago

YouTube has not rolled out this change globally yet. This is working fine for me at the moment. We will have to wait to see what the actual change on YouTube is.

fin444 commented 2 months ago

Miniflux looks for a <meta> element on the page with itemprop="duration", but this is missing in some rollouts of their interface. This seems to correspond to the new login prompt, though I'm not entirely sure because I don't have a Google account. With a bit of casual searching through the page contents, I've been unable to find a replacement data source.

agrmohit commented 1 week ago

YouTube has started showing the duration in ISO 8601 duration notation, which Miniflux is not able to parse. The format is also documented here: https://tc39.es/proposal-temporal/docs/duration.html

As an example the following video https://www.youtube.com/watch?v=E55uSCO5D2w contains this element <meta itemprop="duration" content="PT13M56S">

fin444 commented 1 week ago

The existing behavior is to parse ISO 8601 from a <meta itemprop="duration"> tag, unless I am mistaken.

https://github.com/miniflux/v2/blob/6eb1f25a53ea00b5a77b9d53dad67f6237c4a36b/internal/reader/processor/youtube.go#L53-L61

agrmohit commented 1 week ago

I missed the regex part. That makes it even weirder that it fails to parse them correctly most of the time for me.

fin444 commented 1 week ago

It's due to Youtube's ongoing campaign against scraping/downloading/using the site without an account. Parsing will fail if it flags your IP, as the meta tag will be missing.
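To make the failure mode concrete, here is a rough sketch of an extractor that reports a missing tag as a distinct error instead of treating it as a parse failure. The names extractDurationAttr and errDurationTagMissing are hypothetical, and a regular expression over raw HTML is used only to keep the example dependency-free; a real HTML parser would be more robust to attribute order and quoting.

```go
package main

import (
	"errors"
	"regexp"
)

// errDurationTagMissing signals that the page had no duration meta tag at all,
// which is what a flagged IP would see according to the comment above.
var errDurationTagMissing = errors.New("duration meta tag not found")

// durationMeta assumes the attribute order shown earlier in this thread:
// <meta itemprop="duration" content="...">
var durationMeta = regexp.MustCompile(`<meta\s+itemprop="duration"\s+content="([^"]*)"`)

// extractDurationAttr is a hypothetical helper that returns the raw content
// attribute, or errDurationTagMissing when the tag is absent.
func extractDurationAttr(html string) (string, error) {
	m := durationMeta.FindStringSubmatch(html)
	if m == nil {
		return "", errDurationTagMissing
	}
	return m[1], nil
}
```

A caller that checks for errDurationTagMissing can tell "YouTube withheld the tag" apart from "the value was malformed" and handle the two cases differently.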

sarmong commented 3 days ago

Yes, I downloaded the HTML page as YouTube returns it to my VPS, and on top of the video there is this (which doesn't happen if I download the HTML from my PC):

image

Any ways to work around it?

fin444 commented 3 days ago

> Any ways to work around it?

There are a lot of people who would like to know the answer to that question.

sarmong commented 3 days ago

Looking at the yt-dlp and Invidious issues about the same problem, it seems the only way around it is to sign in somehow and use tokens. I doubt that YouTube will unblock datacenter IPs.

One workaround I thought of for Miniflux is to download the video HTML on the frontend, scrape it for the duration, and send the result to the backend. I assume this would go very much against the app architecture, so I doubt the maintainers would approve such a PR.

telnet23 commented 3 days ago

I am encountering the same issue where the meta element is missing from the YouTube website, so I opened PR #2951 to optionally fetch the watch time from the YouTube API instead of the website.
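The PR itself is not reproduced here. As a rough illustration of the idea only, the sketch below assumes the YouTube Data API v3 videos endpoint with part=contentDetails, whose response carries the duration as an ISO 8601 string under items[].contentDetails.duration. The helper names buildVideosURL and durationFromResponse are hypothetical, and the HTTP request is left out so the example stays self-contained.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/url"
)

// videosResponse models only the fields of the API response this sketch needs.
type videosResponse struct {
	Items []struct {
		ContentDetails struct {
			Duration string `json:"duration"` // ISO 8601, e.g. "PT13M56S"
		} `json:"contentDetails"`
	} `json:"items"`
}

// buildVideosURL is a hypothetical helper; apiKey is a user-supplied API key.
func buildVideosURL(videoID, apiKey string) string {
	q := url.Values{}
	q.Set("part", "contentDetails")
	q.Set("id", videoID)
	q.Set("key", apiKey)
	return "https://www.googleapis.com/youtube/v3/videos?" + q.Encode()
}

// durationFromResponse extracts the raw duration string from a response body.
func durationFromResponse(body []byte) (string, error) {
	var resp videosResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	if len(resp.Items) == 0 {
		return "", errors.New("video not found in API response")
	}
	return resp.Items[0].ContentDetails.Duration, nil
}
```

The returned string is in the same ISO 8601 notation as the meta tag, so whatever parsing is applied to the scraped value could in principle be reused. The trade-off is that this route needs an API key, which fits the "optionally" in the comment above.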