Closed danielatlarge closed 5 years ago
@danielatlarge
At https://www.bbc.com/bitesize/articles/zqghtyc there exists only one video clip (my method gets trickier when multiple clips reside in the same web page). Search the page source for "pid": ; its second instance is found inside a JSON block like the one below:
"headers":{"content-type":"application\/json"}},"body":{"type":"video-block","id":"zkf8mfr","title":"","caption":"","pid":"p03q2xx3","transcript":"","video":{"duration":"PT41S","holdingImage":"https:\/\/ichef.bbci.co.uk\/images\/ic\/$recipe\/p03q2xwb.jpg","mediaType":"video","title":"How to use the suffix -ly","vpid":"p05f425d"}
The value there, p03q2xx3, IS THE PID OF THE VIDEO YOU NEED!
You can view clip details at https://www.bbc.co.uk/programmes/p03q2xx3.json
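The manual page-source search can also be sketched in Python. This is only a rough sketch, not something youtube-dl does: the regular expression assumes the PID sits in a "video-block" JSON object shaped like the one quoted above and that a PID is eight or more lowercase letters and digits, and the names `find_clip_pids` and `programmes_url` are made up for illustration.

```python
import re

# Assumption: the clip PID follows a "video-block" marker, as in the quoted JSON,
# and consists of 8 or more lowercase letters/digits (e.g. p03q2xx3).
# The leading quote in "pid" keeps the pattern from matching "vpid".
VIDEO_BLOCK_PID = re.compile(
    r'"type":"video-block".*?"pid":"([0-9a-z]{8,})"', re.S
)

PROGRAMMES_BASE = "https://www.bbc.co.uk/programmes/"


def find_clip_pids(page_source):
    """Return every PID found after a "video-block" marker, in page order."""
    return VIDEO_BLOCK_PID.findall(page_source)


def programmes_url(pid):
    """Build the programmes URL that is later handed to youtube-dl."""
    return PROGRAMMES_BASE + pid


# Shortened copy of the JSON fragment quoted above.
sample = ('"body":{"type":"video-block","id":"zkf8mfr","title":"","caption":"",'
          '"pid":"p03q2xx3","transcript":"","video":{"duration":"PT41S",'
          '"mediaType":"video","vpid":"p05f425d"}')

pids = find_clip_pids(sample)
print(pids)
print(programmes_url(pids[0]))
```

On a real page you would pass the downloaded HTML instead of `sample`; pages with several clips should simply yield several PIDs.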
The PID plugs into the programmes URL template, i.e. https://www.bbc.co.uk/programmes/p03q2xx3 ; this video clip is geo-fenced, accessible only from whitelisted UK IPs. I am located overseas, so:
youtube-dl -F https://www.bbc.co.uk/programmes/p03q2xx3
=>
[bbc.co.uk] p03q2xx3: Downloading video page
[bbc.co.uk] p03q2xx3: Downloading playlist JSON
[bbc.co.uk] p05f425d: Downloading media selection XML
[bbc.co.uk] p05f425d: Downloading media selection XML
ERROR: bbc.co.uk returned error: geolocation
... but with a whitelisted UK HTTP proxy, things are much better 😜 :
youtube-dl --proxy="http://proxyhost:proxyport" --console-title --hls-prefer-native -c --no-part -f "stream-uk-iptv_streaming_concrete_combined_sd_mf_akamai_uk_hls-1836" "https://www.bbc.co.uk/programmes/p03q2xx3" -o "How to use the suffix -ly[p03q2xx3].mp4" --write-sub --convert-subs=srt --embed-subs --write-thumbnail --embed-thumbnail --add-metadata
=>
[bbc.co.uk] p03q2xx3: Downloading video page
[bbc.co.uk] p03q2xx3: Downloading playlist JSON
[bbc.co.uk] p05f425d: Downloading media selection XML
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading MPD manifest
[bbc.co.uk] p05f425d: Downloading MPD manifest
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading MPD manifest
[bbc.co.uk] p05f425d: Downloading MPD manifest
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading captions
[bbc.co.uk] p05f425d: Downloading captions
[bbc.co.uk] p05f425d: Downloading captions
[info] Writing video subtitles to: How to use the suffix -ly[p03q2xx3].en.ttml
[bbc.co.uk] p05f425d: Downloading thumbnail ...
[bbc.co.uk] p05f425d: Writing thumbnail to: How to use the suffix -ly[p03q2xx3].jpg
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 6
[download] Destination: How to use the suffix -ly[p03q2xx3].mp4
[download] 100% of 8.17MiB in 00:15
[ffmpeg] Fixing malformed AAC bitstream in "How to use the suffix -ly[p03q2xx3].mp4"
[ffmpeg] Adding metadata to 'How to use the suffix -ly[p03q2xx3].mp4'
[ffmpeg] Converting subtitles
WARNING: You have requested to convert dfxp (TTML) subtitles into another format, which results in style information loss
Deleting original file How to use the suffix -ly[p03q2xx3].en.ttml (pass -k to keep)
[ffmpeg] Embedding subtitles in 'How to use the suffix -ly[p03q2xx3].mp4'
Deleting original file How to use the suffix -ly[p03q2xx3].en.srt (pass -k to keep)
[atomicparsley] Adding thumbnail to "How to use the suffix -ly[p03q2xx3].mp4"
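If you fetch clips regularly, the long command line above can be assembled from Python. The following is a minimal sketch: the helper name `build_youtube_dl_args` is invented for illustration, the proxy address is the same placeholder used in the command above, and only flags that already appear in that command are used.

```python
import shlex


def build_youtube_dl_args(pid, proxy, output_name, format_id=None):
    """Assemble a youtube-dl argument list for one geo-fenced clip."""
    args = ["youtube-dl", "--proxy", proxy, "--hls-prefer-native"]
    if format_id:
        # Optional: pin one of the format ids listed by `youtube-dl -F`.
        args += ["-f", format_id]
    args += [
        "-o", output_name,
        "--write-sub", "--convert-subs", "srt", "--embed-subs",
        "--write-thumbnail", "--embed-thumbnail", "--add-metadata",
        "https://www.bbc.co.uk/programmes/" + pid,
    ]
    return args


args = build_youtube_dl_args(
    pid="p03q2xx3",
    proxy="http://proxyhost:proxyport",  # placeholder, not a real proxy
    output_name="How to use the suffix -ly[p03q2xx3].mp4",
)

# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(a) for a in args))

# To actually run it (needs youtube-dl installed and a working UK proxy):
# import subprocess; subprocess.run(args, check=True)
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces or brackets.
<test>
assert args[0] == "youtube-dl"
assert args[-1] == "https://www.bbc.co.uk/programmes/p03q2xx3"
assert args[args.index("--proxy") + 1] == "http://proxyhost:proxyport"
assert "-f" not in args
assert build_youtube_dl_args("p03q2xx3", "http://h:1", "o.mp4", "best")[4:6] == ["-f", "best"]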
Teach well! 😃
Wow. Thanks a mil, Vangelis66! I had very little hope that someone would write. So glad you did. Now that I've got the video downloaded, I hope my students will be the better for it too!
I've got a VPN that I use but would much rather use an HTTP proxy since it's cumbersome to find a server that isn't blocked. Will look for a whitelisted HTTP proxy instead of having to manually connect to random servers all the time.
Teaching is brutal. I hope you're not in the profession. Thanks again, mate! ;-)
Thanks a mil, Vangelis66! I had very little hope that someone would write. So glad you did
Err, you're quite welcome! I had received much help from strangers back in the day (mid 2000s) when I was clueless, but at a time when things on the internet were more civil and altruistic; so I still like to give back to others... Sadly, now everything's monetised and everyone likes to keep things for themselves (not without good reason, in some cases...).
I've got a vpn that I use but would much rather use an http proxy since it's cumbersome to find a server that isn't blocked. Will look for a whitelisted http proxy instead of having to manually connect to random servers all the time.
The Beeb have been relentless over at least the past two years at hunting down and blocking all commercial and free geo-location circumvention methods 😞 ; free and paid-for UK proxies are in the same boat as VPNs and SmartDNS services, i.e. being constantly blacklisted...
Teaching is brutal. I hope you're not in the profession.
... Sort of, but in the past; I did private Chemistry tutoring for Uni students from my late 20s to my mid-30s, so for adults, not toddlers...
Returning on topic, ideally a BBC bitesize plugin could be created that would web scrape clip PIDs and then use the bbc plugin's logic to fetch to disk, but the devs are swamped with so many support requests that I won't hold my breath for such a plugin anytime soon... In all honesty, I think you had better close this issue...
@danielatlarge @Vangelis66 here is some Python code that fetches the corresponding PIDs and then runs youtube-dl on each of them.
Might be helpful for someone else.
import subprocess

import requests

# Bitesize pages to scan for clips
sites = [
    'https://www.bbc.co.uk/bitesize/guides/zws8h39/video'
]
marker = '"chapterData"'
point_of_pid = 22  # offset from the start of the marker to the PID
end_of_pid = 8     # length of the PID
base = "https://www.bbc.co.uk/programmes/"

for url in sites:
    r = requests.get(url)
    page_source = r.text.split('\n')
    print("\nURL:", url)
    print("--------------------------------------")
    for row in page_source:
        if marker in row:
            entry = row.find(marker)
            print(entry)
            print('----')
            begin = entry + point_of_pid
            end = begin + end_of_pid
            pid = row[begin:end] + ".json"
            print(pid)
            # Separate name so the page URL of the outer loop is not overwritten
            clip_url = base + pid
            print(" Downloading from " + clip_url)
            # Argument list instead of a shell string, so the URL is never shell-parsed
            subprocess.run(["youtube-dl", clip_url])
    print("--------------------------------------")
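The script depends on the PID starting exactly 22 characters after the start of the marker and being 8 characters long. To make that slicing easier to check offline, it can be pulled into a small function. This is only a sketch: `pid_after_marker` is a made-up name, and the sample row is invented so that the offsets line up; the real page layout may differ or change.

```python
def pid_after_marker(row, marker='"chapterData"', skip=22, length=8):
    """Return `length` characters starting `skip` characters after the
    start of `marker`, or None when the marker is not in the row."""
    start = row.find(marker)
    if start == -1:
        return None
    begin = start + skip
    return row[begin:begin + length]


# Hypothetical row, built so that the PID begins 22 characters after the
# first quote of the marker; it is NOT copied from a real Bitesize page.
row = 'x,"chapterData":{"pid":"p0abc123","title":"t"}'

print(pid_after_marker(row))
print(pid_after_marker("a row without the marker"))
```

If the offsets ever stop matching, the function returns the wrong eight characters rather than failing, so it is worth printing the result before handing it to youtube-dl.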
Hello, sorry to post on an old issue. Following @Vangelis66's instructions, I was able to download the video from the example.
But I am not having any luck with the following site: https://www.bbc.co.uk/bitesize/topics/z6882hv/articles/zfm84xs
Can anyone offer advice as to how I can download the "How to spot mammals" video?
Thank you
@chonnymon
... The BBC Bitesize page structure has changed considerably over the 4 1/2 yrs (and more 😉 ) since I posted my "guide" above ...
If you want to go the Page Source route like I did in my original guide, the logic remains similar: retrieve the pid (or vpid) string of the video clip you wish to fetch to disk; then use that string inside the mentioned www.bbc.co.uk/programmes API and pass the resulting URI to yt-dl (or a similar tool) ...
In your browser of choice, load the Page Source of https://www.bbc.co.uk/bitesize/topics/z6882hv/articles/zfm84xs
The page contains two video clips; search in the source for vpid and you'll end up with (L24):
\"vpid\":\"p04zr5yp\",
...
\"vpid\":\"p02p6ndv\",
The first value is for the clip "Watch: How to spot mammals", the second for the clip "Watch: Different kinds of mammals". Technically, you need the PID string for each clip (not the versionPID=vpid one), which the Beeb no longer provide in plain text 😡 , but:
https://www.bbc.co.uk/programmes/p04zr5yp => https://www.bbc.co.uk/programmes/p04zr5yh
-> vpid=p04zr5yp => pid=p04zr5yh
The JSON for the version also names the parent clip's pid string:
{
  "version": {
    "canonical": 1,
    "pid": "p04zr5yp",
    "duration": 60,
    "parent": {
      "programme": {
        "type": "clip",
        "pid": "p04zr5yh",
        "title": "Bitesize Primary KS1 Mammals"
      }
    },
    "types": [
      "Original version"
    ],
    "contributors": [ ],
    "segment_events": [ ],
    "broadcasts": [ ],
    "availabilities": [ ]
  }
}
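Both lookups above, finding the vpid values in the page source and reading the parent pid out of the version JSON, can be sketched in Python. Treat this as a rough illustration: the function names are invented, the vpid pattern assumes the backslash-escaped form shown above (and also accepts the unescaped one), and `parent_pid` assumes JSON shaped exactly like the block above.

```python
import json
import re

# Matches "vpid":"..." as well as the escaped form \"vpid\":\"...\"
VPID = re.compile(r'\\?"vpid\\?":\\?"([0-9a-z]{8,})')


def find_vpids(page_source):
    """Return the distinct vpid values in the order they first appear."""
    seen = []
    for vpid in VPID.findall(page_source):
        if vpid not in seen:
            seen.append(vpid)
    return seen


def parent_pid(version_json):
    """Read the parent clip's pid from version JSON shaped like the block above."""
    data = json.loads(version_json)
    return data["version"]["parent"]["programme"]["pid"]


# The two escaped vpid entries quoted from the page source above.
page = r'\"vpid\":\"p04zr5yp\", ... \"vpid\":\"p02p6ndv\",'

# Trimmed copy of the version JSON above.
version_json = """
{"version": {"pid": "p04zr5yp",
             "parent": {"programme": {"type": "clip",
                                      "pid": "p04zr5yh",
                                      "title": "Bitesize Primary KS1 Mammals"}}}}
"""

print(find_vpids(page))
print("https://www.bbc.co.uk/programmes/" + parent_pid(version_json))
```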
Finally 😄 :
PS: The URI kindly posted by 3052 😉 above uses the mediaselector/6 API and it can be URL-sniffed by your browser when you reload the webpage and start playback of the clip in question. This same (or a similar) URI is used internally by yt-dl when you use the global method I described above, but it can't be used as-is by yt-dl: if it wasn't obvious to you, you have to load it in your browser and, from the JSON output, select one M3U8 or MPD manifest and feed that to yt-dl. The downside is that you lose the metadata and meaningful filename the pid method affords:
yt-dl -f best "https://www.bbc.co.uk/programmes/p04zr5yh" --add-metadata =>
[bbc.co.uk] p04zr5yh: Downloading video page
[bbc.co.uk] p04zr5yh: Extracting from playlist JSON
[bbc.co.uk] p04zr5yh: Downloading playlist JSON
[bbc.co.uk] p04zr5yp: Downloading media selection JSON
[bbc.co.uk] p04zr5yp: Downloading m3u8 information
[bbc.co.uk] p04zr5yp: Downloading m3u8 information
[bbc.co.uk] p04zr5yp: Downloading MPD manifest
[bbc.co.uk] p04zr5yp: Downloading MPD manifest
[bbc.co.uk] p04zr5yp: Downloading m3u8 information
[bbc.co.uk] p04zr5yp: Downloading m3u8 information
[bbc.co.uk] p04zr5yp: Downloading MPD manifest
[bbc.co.uk] p04zr5yp: Downloading MPD manifest
[bbc.co.uk] p04zr5yp: Downloading media selection JSON
[bbc.co.uk] p04zr5yp: Downloading m3u8 information
[bbc.co.uk] p04zr5yp: Downloading m3u8 information
[bbc.co.uk] p04zr5yp: Downloading m3u8 information
[bbc.co.uk] p04zr5yp: Downloading m3u8 information
[bbc.co.uk] p04zr5yp: Downloading MPD manifest
[bbc.co.uk] p04zr5yp: Downloading MPD manifest
[bbc.co.uk] p04zr5yp: Downloading MPD manifest
[bbc.co.uk] p04zr5yp: Downloading MPD manifest
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 8
[download] Destination: Teach, Bitesize Primary KS1 Mammals-p04zr5yp.mp4
[download] 100% of 7.91MiB in 00:09
[ffmpeg] Fixing malformed AAC bitstream in "Teach, Bitesize Primary KS1 Mammals-p04zr5yp.mp4"
[ffmpeg] Adding metadata to 'Teach, Bitesize Primary KS1 Mammals-p04zr5yp.mp4'
@dirkf :
While researching this, I discovered that thumbnail downloading/embedding appears broken for this pid:
yt-dl --skip-download "https://www.bbc.co.uk/programmes/p04zr5yh" --write-thumbnail =>
....
[bbc.co.uk] p04zr5yp: Downloading thumbnail ...
WARNING: Unable to download thumbnail "http://ichef.bbci.co.uk/images/ic/$recipe/p04zsb2m.jpg": HTTP Error 403: Forbidden
Any insight, please? 😃 According to my browser, $recipe=widthxn, e.g.:
https://ichef.bbci.co.uk/images/ic/896xn/p04zsb2m.jpg
My most sincere and genuine apologies, I wasn't aware I did something wrong (on the contrary, I tried to be friendly) - my post will be edited accordingly and your name won't ever be mentioned again - sorry ...
FWIW: English is NOT my mother tongue, so I had to search for "doxing"; I can assure you and all others I had no "malicious intent" whatsoever (that's what Google implies in their explanation of the term) - this is all an unfortunate misunderstanding; my only motive is to help others, not to get into trouble or cause trouble for others - it has always been that way...
I expect that the extractor needs to know that $recipe should be replaced, and also that there is a version queued up for testing that might do so.
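Until the extractor handles it, the placeholder can be filled in by hand. A minimal sketch, assuming the `896xn` value observed in the browser above is acceptable; `fill_recipe` is a made-up helper, not part of youtube-dl, and other recipe values are untested here.

```python
def fill_recipe(thumbnail_url, recipe="896xn"):
    """Replace the literal $recipe placeholder with a concrete size recipe."""
    return thumbnail_url.replace("$recipe", recipe)


# Thumbnail URL taken from the warning above.
template = "http://ichef.bbci.co.uk/images/ic/$recipe/p04zsb2m.jpg"
print(fill_recipe(template))
```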
I'm trying to download videos from BBC Bitesize. An example URL would be https://www.bbc.com/bitesize/articles/zqghtyc
An actual PID for a video would be... p05f425d
I have tried youtube-dl -v with "https://www.bbc.com/bitesize/articles/zqghtyc" and https://www.bbc.co.uk/programmes/p05f425d and https://www.bbc.co.uk/programmes/zqghtyc
None of which work. Would love to get some help! Thank you so much.
youtube-dl -v "https://www.bbc.com/bitesize/articles/zqghtyc"
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://www.bbc.com/bitesize/articles/zqghtyc']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.01.30.1
[debug] Python version 2.7.15rc1 (CPython) - Linux-4.15.0-45-generic-x86_64-with-Ubuntu-18.04-bionic
[debug] exe versions: ffmpeg 3.4.4, ffprobe 3.4.4
[debug] Proxy map: {}
[bbc] zqghtyc: Downloading webpage
ERROR: no suitable InfoExtractor for URL https://www.bbc.co.uk/programmes/None
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/bin/youtube-dl/__main__.py", line 19, in <module>
    youtube_dl.main()
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 472, in main
    _real_main(argv)
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 462, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2005, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 804, in extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 865, in process_ie_result
    extra_info=extra_info)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 827, in extract_info
    self.report_error('no suitable InfoExtractor for URL %s' % url)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 621, in report_error
    self.trouble(error_message, tb)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 583, in trouble
    tb_data = traceback.format_list(traceback.extract_stack())
youtube-dl -v "https://www.bbc.co.uk/programmes/p05f47t4"
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://www.bbc.co.uk/programmes/p05f47t4']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.01.30.1
[debug] Python version 2.7.15rc1 (CPython) - Linux-4.15.0-45-generic-x86_64-with-Ubuntu-18.04-bionic
[debug] exe versions: ffmpeg 3.4.4, ffprobe 3.4.4
[debug] Proxy map: {}
[bbc.co.uk] p05f47t4: Downloading video page
[bbc.co.uk] p05f47t4: Downloading playlist JSON
[bbc.co.uk] p05f47t4: Downloading legacy playlist XML
ERROR: Unable to download XML: HTTP Error 404: Not Found (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 605, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2215, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)