Open dirkf opened 3 years ago
I face the exact same problem: I'm unable to exclude a format that has a = in its ID. My current workaround is to exclude the prefix with !^=, but this is far from robust.
Feel free to apply the one-line-of-code (2 comment lines) patch from the linked PR and try it out, if your yt-dl installation allows (i.e., it is not a single executable bundle).
Thanks for the quick response, I'll try it out soon <3
Checklist
Verbose log
Show the available formats (snipped):
Now fail to exclude one:
Description
In the format selection code in YoutubeDL.py, the parameter on the right-hand side of a format selection string comparison (eg hls in -f best[format_id!*=hls]) is matched with this regex: (?P<value>[a-zA-Z0-9._-]+). However the format_id value is sanitised with this: re.sub(r'[\s,/+\[\]()]', '_', format_id).

If the site provides a format ID that contains a character that is not alphanumeric or in [._-] and is not sanitised, a selection expression that specifically excludes the ID ([format_id!=problem_id]) causes an exception ValueError('Invalid filter specification...'). For instance, the character '=' causes this, as shown: see the -F results in this log for another site that produces such format IDs. Among other potentially problematic printable characters are "#$%&'*:;<>?@^{|}~£ as well as ` and non-ASCII alphabetic characters.

Either the string value should be matched with something like (?P<value>[^\s,/+\[\]()]+) or the format_id value should be sanitised with re.sub(r'[^a-zA-Z0-9._-]', '_', format_id). At least, the = character should be added to the string value match.

In principle this problem might affect other string-valued format selection fields (ext, acodec, vcodec, container, protocol, language) but these are unlikely to contain problematic characters.

Still isn't labelled as a bug.
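For anyone who wants to see the mismatch from the description in isolation, here is a minimal Python 3 sketch. Only the three regular expressions are taken from the issue text; the helper names and the format ID 'hls-audio=128000' are made up for illustration, and this is not how YoutubeDL.py is actually structured.

```python
import re

# Value pattern quoted in the issue for the right-hand side of a string filter.
CURRENT_VALUE_RE = re.compile(r'(?P<value>[a-zA-Z0-9._-]+)')
# Wider value pattern suggested in the issue (anything the sanitiser leaves alone).
PROPOSED_VALUE_RE = re.compile(r'(?P<value>[^\s,/+\[\]()]+)')


def sanitise_current(format_id):
    """Sanitisation quoted in the issue: only a few characters are replaced."""
    return re.sub(r'[\s,/+\[\]()]', '_', format_id)


def sanitise_strict(format_id):
    """Alternative suggested in the issue: replace everything the value regex rejects."""
    return re.sub(r'[^a-zA-Z0-9._-]', '_', format_id)


def value_is_accepted(pattern, value):
    """Hypothetical helper: True if the whole value satisfies the pattern."""
    return pattern.fullmatch(value) is not None


# Hypothetical format ID containing '=', which the current sanitiser leaves untouched.
raw_id = 'hls-audio=128000'
kept_id = sanitise_current(raw_id)

print(kept_id)                                         # ID as it would be listed
print(value_is_accepted(CURRENT_VALUE_RE, kept_id))    # rejected by the current value regex
print(value_is_accepted(PROPOSED_VALUE_RE, kept_id))   # accepted by the wider value regex
print(value_is_accepted(CURRENT_VALUE_RE, sanitise_strict(raw_id)))  # accepted after strict sanitising
```

Either suggested change closes the gap: widening the value pattern keeps the site's IDs as they are, while the stricter sanitiser changes the IDs that users see in -F output.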