Closed mgabor3141 closed 2 weeks ago
Hey there! Thanks for the report (:
This seems like a relatively harmless change so I'm open to adding it! I have some setup docs here and I'd appreciate it if you gave it a shot to see if it works (since I don't have a Nebula account).
I suspect the change itself would be pretty easy. I think you'd just have to modify the response_to_struct
method in lib/pinchflat/yt_dlp/media.ex
to remove the anchor - there are already a few examples of parsing the response in that method in case that helps. Untested, but I suspect all you'd have to do is something like this:
old_uri = URI.parse(...)
new_uri = %URI{ old_uri | fragment: nil }
URI.to_string(new_uri)
Let me know if that works out!
Also, I'm happy to help with the Elixir side of things at any time! Feel free to ask questions. I love Elixir, but I get that it's not a super common language to know (:
I'll leave this for someone who's ready to take it on! In the meantime, I'm going to close this issue 🤙
I was originally going to comment this under #177 but an issue is better for discussion.
Nebula almost works, it just has an issue retrieving metadata fields from videos. I have done some investigation and the issue seems to be the following. (And also very simple to fix if I'm not mistaken.)
Problem
The Nebula extractor uses the
#__youtubedl_smuggle
URL parameter to tell the video download step that it doesn't need to fetch the metadata, because we already have it from the playlist metadata step. CodeThis parameter gets saved into the Pinchflat video database entry with the rest of the URL:
This means that when Pinchflat starts the job to download this individual URL we skip fetching any metadata:
Note that the title field is wrong, all dates are null, many fields are missing which would normally be present, etc. This impacts the output filenames, and changing the output format doesn't help either. This data is missing at this point in the process.
I have tried to nail down that this is in fact the root cause by running
yt-dlp
with the-j
parameter to see the resulting json fields. I ran it on both the playlist URL, the video URL, as well as the video URL with the#__youtubedl_smuggle
parameter, and I got the incorrect metadata only in the last case. I have also confirmed through the--verbose
flag that whenyt-dlp
downloads the playlist it also fetches the individual videos with this smuggle parameter, but since it's in the process of downloading the playlist it's not a problem then.Possible solution
Get rid of this smuggle parameter.
One option is to strip it here, but preferably it should be stripped when the video URL is inserted into the database.
The easy option is to just throw away the
#
and everything after it, but ideally only this parameter would be removed as there could be cases where other#
params are important. Not with YouTube though as far as I know.Possible solution where an upstream fix is needed
yt-dlp
has a a field in the output metadata that's documented as follows:In an ideal scenario this would already contain the video URL with the smuggle parameter removed. Unfortunately this is not the case, so
yt-dlp
fails to deliver on this "same result" promise.In any case I'd be happy to contribute a fix, but I've never done Elixir development before. I do have some experience with functional programming, so I might be able to put together a simple change, but I would definitely like some direction first.
Would a change like this be desired/acceptable in this project as it's not needed for YouTube functionality?