miniflux / v2

Minimalist and opinionated feed reader
https://miniflux.app
Apache License 2.0
6.44k stars 702 forks source link

Improve YouTube page feed detection #2694

Closed ztec closed 2 weeks ago

ztec commented 2 weeks ago

In order to be more resilient to YouTube URLs variation and to address this feature_request: https://github.com/miniflux/v2/issues/2628 I've reworked a bit the way the YouTube feed extraction is done.

I've kept all the FindSubscriptionsFromYouTube* in order to keep all the existing unit tests as-is ensuring little to no regressions. By doing so, I had to call twice youtubeURLIDExtractor. Small performance penalty for peace of mind in my opinion.

youtubeURLIDExtractor is made in a way only one kind of page can be detected at a time. This mean I can solve the "video in a playlist" feature_request by prioritizing the playlist ID over the Video ID

Also, by using url.Parse() to get ids, it's safer to url mangle and variation. The most common variation being the t=42 parameters that start the playback at a given position. Previously, this kind of url would not be detected as "YouTube URL".

I deliberately ignored the url parsing error to keep previous behavior (skip the YouTube analysis and follow with the other analysis)

I also tried to keep debug logs the same as before as much as I could.

I manually tested all the YouTube cases (video,channel,playlist) and they all work as expected except for the video. But this one does not work either on main. The meta html tag that was searched for does not seem to exist anymore.

fix: #2628

Do you follow the guidelines?