Open jollino opened 7 years ago
This has been annoying me for quite some time as well. Facebook users (i.e. content providers) give their videos actual titles as the first line of the caption, and youtube-dl extracts the text below that line instead.
Another random example: https://www.facebook.com/markberubemusic/videos/vb.350276931674911/315046592680474/?type=2&theater
The caption says:
This week in Switzerland...
13.12.18 - Basel - Kaserne / 15.12.18 - Bern - Dachstock * opening for Sophie Hunger
youtube-dl extracts:
13.12.18 - Basel - Kaserne / 15.12.18 - Bern - Dachstock * opening for Sophie...
whereas I would read the title as:
This week in Switzerland...
I can give tons of other examples.
I found this "proper" title in Facebook's current HTML code within this construct:
<title id="pageTitle">... | Facebook</title>
I'll post a pull request adding this regex to youtube-dl's title extraction. The resulting titles match my expectations much better. It does change almost every title (including those in all the tests). It also changes group videos' titles from "[group name] has 481 members" to "[group name] Public Group". I don't see this side effect as all that bad, though.
If you don't want to wait for the pull request, here's the regex:
r'(?s)<title id="pageTitle"[^>]*>([^<]*)(?: \| Facebook)</title>'
or the code:
if not video_title:
    video_title = self._html_search_regex(
        r'(?s)<title id="pageTitle"[^>]*>([^<]*)(?: \| Facebook)</title>',
        webpage, 'title', default=None)
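To try the regex outside of youtube-dl, a minimal standalone sketch could look like the one below. It uses only Python's `re` and `html` modules; `extract_page_title` and both sample pages are made up for illustration, and `html.unescape` plus `strip()` only approximate whatever cleanup `_html_search_regex` does.

```python
import html
import re

# Regex quoted verbatim from the comment above.
PAGE_TITLE_RE = r'(?s)<title id="pageTitle"[^>]*>([^<]*)(?: \| Facebook)</title>'


def extract_page_title(webpage):
    """Return the page title without the ' | Facebook' suffix, or None.

    Hypothetical helper: a rough stand-in for the extractor code above.
    """
    match = re.search(PAGE_TITLE_RE, webpage)
    if match is None:
        return None
    # Decode entities such as &amp; and trim surrounding whitespace.
    return html.unescape(match.group(1)).strip()


# Made-up sample pages, not real Facebook markup.
SAMPLE_PAGE = (
    '<html><head><title id="pageTitle">'
    'This week in Switzerland... | Facebook'
    '</title></head></html>'
)
NO_SUFFIX_PAGE = '<title id="pageTitle">Some video</title>'

print(extract_page_title(SAMPLE_PAGE))
print(extract_page_title(NO_SUFFIX_PAGE))
```

Note that the ` | Facebook` suffix group is not optional in this regex, so a page title without that suffix does not match at all and the helper returns `None`.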
Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2017.09.02.
Description of your issue, suggested solution and other information
It looks like youtube-dl is unable to get the title of a video, and always uses the description as a title. It's actually somewhat rare for videos to have an actual title, but it would be useful for youtube-dl to use it when it is available.
For reference I'm using https://www.facebook.com/cclarinascita/videos/vl.1382797855382682/415442948635545/ but it seems to happen consistently.
The problem, I believe, is that the actual title is only shown if the video is opened from a video list page, not if the url is accessed directly, so it's just hard to spot. Most videos, moreover, don't even have one; the first part of the description is effectively used as the title.
Compare, for instance, opening the aforementioned video url directly versus opening it from https://www.facebook.com/cclarinascita/videos/ (1st video of the 3rd playlist from the top, named "Camera Café - I stagione - Ep. 0-49").
At least on my end, the direct url just opens a simple video page showing no title, whereas opening it from the list shows a different interface with a bigger video on the left, and a right sidebar with the title in black, the description, and more videos from the same playlist ("Up next").
I ran the --write-info-json option, and it really looks like the title is not found at all:
I dug a bit into the html of https://www.facebook.com/cclarinascita/videos/vl.1382797855382682/415442948635545 (which, again, does not show the title if it's accessed directly) and discovered that the title is actually injected into the page, but it's only found inside two commented-out links to the video itself, in two different parts of the page:
<a class="_2za_" href="https://www.facebook.com/cclarinascita/videos"><span class="_50f7">Camera Café - I stagione - Ep. 0 - Sketch Promo</span></a>
and
<a data-onclick="[["TahoeController","openFromVideoLinkHelper",{"__elem":1},"unknown"]]" class="async_saving _400z _2-40 _5pcq" href="/cclarinascita/videos/415442948635545/" aria-label="Camera Café - I stagione - Ep. 0 - Sketch Promo" data-video-channel-id="387437888106301:415442948635545" data-channel-caller="channel_view_from_unknown" ajaxify="#" rel="async" target="">
I'm not sure how reliable this would be given that it's commented out (would a parser even be able to access it, by default?), but I suppose it's the only silver lining here.
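On the parser question, here is a rough sketch under stated assumptions. `SAMPLE` is a shortened, made-up stand-in for the page with the second link wrapped in an HTML comment, and `title_from_aria_label` and `AnchorCollector` are hypothetical names. It also assumes `href` comes before `aria-label`, as in the snippet above. A regex search over the raw text does not care about comments, while Python's standard `html.parser` hands the commented-out markup to `handle_comment` as plain text and never reports the `<a>` tag.

```python
import re
from html.parser import HTMLParser

# Made-up, shortened stand-in for the real page: the link is commented out.
SAMPLE = (
    '<div><!-- <a class="_5pcq" href="/cclarinascita/videos/415442948635545/" '
    'aria-label="Camera Café - I stagione - Ep. 0 - Sketch Promo" rel="async"> '
    '--></div>'
)


def title_from_aria_label(webpage, video_id):
    """Hypothetical helper: find the aria-label of the link to this video.

    Works on the raw text, so it also sees markup inside HTML comments.
    Assumes href appears before aria-label within the tag.
    """
    pattern = (
        r'<a\b[^>]*\bhref="[^"]*/videos/%s/?"[^>]*\baria-label="([^"]+)"'
        % re.escape(video_id)
    )
    match = re.search(pattern, webpage)
    return match.group(1) if match else None


class AnchorCollector(HTMLParser):
    """Collects aria-labels of real <a> tags and the text of comments."""

    def __init__(self):
        super().__init__()
        self.labels = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.labels.append(dict(attrs).get('aria-label'))

    def handle_comment(self, data):
        self.comments.append(data)


parser = AnchorCollector()
parser.feed(SAMPLE)
parser.close()

print(title_from_aria_label(SAMPLE, '415442948635545'))  # found via regex
print(parser.labels)         # empty: the parser never sees the <a> tag
print(len(parser.comments))  # the commented-out markup arrives as one comment
```

So a regex-based approach should be able to reach the commented-out title, whereas a DOM-style parser would need the comment text to be parsed a second time. Whether Facebook keeps emitting these links is a separate question this sketch cannot answer.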
Also, it appears that changing the ?type= parameter in the query string has absolutely no effect; the same goes for removing the playlist part of the url (the intermediate /vl.1382797855382682/ in this case).