kieraneglin / pinchflat

Your next YouTube media manager
GNU Affero General Public License v3.0
904 stars 18 forks source link

Nebula almost works, it only needs a small change to video URLs #381

Closed mgabor3141 closed 2 weeks ago

mgabor3141 commented 4 weeks ago

I was originally going to comment this under #177 but an issue is better for discussion.

Nebula almost works, it just has an issue retrieving metadata fields from videos. I have done some investigation and the issue seems to be the following. (And also very simple to fix if I'm not mistaken.)

Problem

The Nebula extractor uses the #__youtubedl_smuggle URL parameter to tell the video download step that it doesn't need to fetch the metadata, because we already have it from the playlist metadata step. Code

This parameter gets saved into the Pinchflat video database entry with the rest of the URL:

image

This means that when Pinchflat starts the job to download this individual URL we skip fetching any metadata:

image

Note that the title field is wrong, all dates are null, many fields are missing which would normally be present, etc. This impacts the output filenames, and changing the output format doesn't help either. This data is missing at this point in the process.

image

I have tried to nail down that this is in fact the root cause by running yt-dlp with the -j parameter to see the resulting json fields. I ran it on both the playlist URL, the video URL, as well as the video URL with the #__youtubedl_smuggle parameter, and I got the incorrect metadata only in the last case. I have also confirmed through the --verbose flag that when yt-dlp downloads the playlist it also fetches the individual videos with this smuggle parameter, but since it's in the process of downloading the playlist it's not a problem then.

[download] Downloading item 6 of 6
[nebula:video] Extracting URL: https://nebula.tv/videos/getaway-trailer/#__youtubedl_smuggle=%7B%22id%22%3A+%22video_episode%3A163cc414-0d92-4753-a7d9-b6a3981a6033%22%7D

Possible solution

Get rid of this smuggle parameter.

One option is to strip it here, but preferably it should be stripped when the video URL is inserted into the database.

The easy option is to just throw away the # and everything after it, but ideally only this parameter would be removed as there could be cases where other # params are important. Not with YouTube though as far as I know.

Possible solution where an upstream fix is needed

yt-dlp has a a field in the output metadata that's documented as follows:

  • webpage_url (string): A URL to the video webpage which, if given to yt-dlp, should yield the same result again

In an ideal scenario this would already contain the video URL with the smuggle parameter removed. Unfortunately this is not the case, so yt-dlp fails to deliver on this "same result" promise.

image


In any case I'd be happy to contribute a fix, but I've never done Elixir development before. I do have some experience with functional programming, so I might be able to put together a simple change, but I would definitely like some direction first.

Would a change like this be desired/acceptable in this project as it's not needed for YouTube functionality?

kieraneglin commented 4 weeks ago

Hey there! Thanks for the report (:

This seems like a relatively harmless change so I'm open to adding it! I have some setup docs here and I'd appreciate it if you gave it a shot to see if it works (since I don't have a Nebula account).

I suspect the change itself would be pretty easy. I think you'd just have to modify the response_to_struct method in lib/pinchflat/yt_dlp/media.ex to remove the anchor - there are already a few examples of parsing the response in that method in case that helps. Untested, but I suspect all you'd have to do is something like this:

old_uri = URI.parse(...)
new_uri = %URI{ old_uri | fragment: nil }

URI.to_string(new_uri)

Let me know if that works out!

kieraneglin commented 4 weeks ago

Also, I'm happy to help with the Elixir side of things at any time! Feel free to ask questions. I love Elixir, but I get that it's not a super common language to know (:

kieraneglin commented 2 weeks ago

I'll leave this for someone who's ready to take it on! In the meantime, I'm going to close this issue 🤙