kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License

youtube_dl doesn't seem to do anything #411

Open hmijail opened 3 years ago

hmijail commented 3 years ago

I can successfully use get_posts() by providing cookies and a URL. If I don't add the youtube_dl parameter, the result contains a video URL that works with the youtube-dl tool to download the video.

However, if I add youtube_dl=True to get_posts(), the result contains video: None.

Python 3.9.6, facebook-scraper 0.2.45, youtube-dl 2021.06.06, macOS 11.5.1

neon-ninja commented 3 years ago

For me, if I run:

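from facebook_scraper import get_posts, set_cookies

set_cookies("cookies.txt")
# Same post, first without and then with youtube-dl extraction
print(next(get_posts(post_urls=[4329977513753335], youtube_dl=False))["video"])
print(next(get_posts(post_urls=[4329977513753335], youtube_dl=True))["video"])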

I get these two links: https://video.fakl8-1.fna.fbcdn.net/v/t42.1790-2/220865328_829133994658684_6023360137011821048_n.mp4?_nc_cat=106&ccb=1-3&_nc_sid=985c63&efg=eyJybHIiOjMzNCwicmxhIjo1MTIsInZlbmNvZGVfdGFnIjoic3ZlX3NkIn0%3D&_nc_ohc=z8bGVveLdfsAX8yGbco&_nc_rml=0&_nc_ht=video.fakl8-1.fna&oh=8a0ed270e84920959c98f6fa793a3e24&oe=61024B40

https://scontent.fakl8-1.fna.fbcdn.net/v/t66.36240-6/119936451_336328168028891_2849851319156943934_n.mp4?_nc_cat=102&ccb=1-3&_nc_sid=985c63&efg=eyJ2ZW5jb2RlX3RhZyI6Im9lcF9oZCJ9&_nc_ohc=OwHKLG6xaOMAX9tFkCq&_nc_ht=scontent.fakl8-1.fna&oh=c1b22794eb0bf88821b8fb58b0aa76d5&oe=6107DCBB

The version extracted with youtube_dl is higher resolution.
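One way to compare the two links without downloading them: the efg query parameter in both URLs appears to be base64-encoded JSON carrying an encoding tag. That is an observation from decoding the two links above, not documented Facebook behaviour, so treat the sketch below as a heuristic. The helper name decode_efg and the example.invalid host are made up for illustration; only the efg values are copied from the links above.

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit


def decode_efg(url):
    """Return the decoded `efg` query parameter of a URL as a dict, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("efg")
    if not values:
        return None
    token = values[0]
    # Restore any base64 padding that was stripped from the token.
    token += "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(token))


# Placeholder host and path; only the `efg` values come from the two links above.
WITHOUT_YOUTUBE_DL = (
    "https://example.invalid/video.mp4"
    "?efg=eyJybHIiOjMzNCwicmxhIjo1MTIsInZlbmNvZGVfdGFnIjoic3ZlX3NkIn0%3D"
)
WITH_YOUTUBE_DL = (
    "https://example.invalid/video.mp4"
    "?efg=eyJ2ZW5jb2RlX3RhZyI6Im9lcF9oZCJ9"
)

print(decode_efg(WITHOUT_YOUTUBE_DL))  # {'rlr': 334, 'rla': 512, 'vencode_tag': 'sve_sd'}
print(decode_efg(WITH_YOUTUBE_DL))     # {'vencode_tag': 'oep_hd'}
```

The tags end in _sd and _hd respectively, which matches the resolution difference between the two links.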

Python 3.8.5, facebook-scraper master, youtube-dl==2021.6.6, Ubuntu 20.04.2 LTS

If you add options={"youtube_dl_verbose": True} do you get any debug output from youtube-dl? I get [facebook] 4329977513753335: Downloading webpage. Does the command youtube-dl -f best -g 'https://facebook.com/story.php?story_fbid=4329977513753335&id=119240841493711' give you the above HQ video link?

hmijail commented 3 years ago

If I follow your example, I get the same results as you. I do get the debug output, and the youtube-dl command gives me the HQ video link as you said.

But if I change the post ID to the (private) post I want to download, the youtube_dl=True case only prints None.

Also, to double check: youtube_dl only is supposed to change the resulting URL, and then I'm supposed to download that URL myself, right?

hmijail commented 3 years ago

I just tried running youtube-dl -f best -g ... by itself (feeding it its own copy of the cookies, etc). It spits out a URL, but it is the same low-quality one that facebook_scraper gets with youtube_dl=False.

FWIW, I can confirm that the video in FB is available in higher quality.

Anyway, at this point to me this is rather a curiosity - facebook_scraper is great already. Thank you lots!

hmijail commented 3 years ago

(Sorry, I guess I should be more precise: the URLs generated by the two are not exactly the same, since some characters in the URL parameters differ. But the result is the same low video resolution in both cases.)

neon-ninja commented 3 years ago

I note video_data_element (selector [data-sigil="inlineVideo"]) includes a dashManifest key, which contains an XML document describing all of the available quality formats. Does your private post have this XML document too? In the example I gave, it's

'<?xml version="1.0"?>\n'
 '<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" minBufferTime="PT1.500S" '
 'type="static" mediaPresentationDuration="PT0H1M0.736S" '
 'maxSegmentDuration="PT0H0M5.005S" '
 'profiles="urn:mpeg:dash:profile:isoff-on-demand:2011,http://dashif.org/guidelines/dash264" '
 'FBTagsetUsed="r2_avc_gen1avc"><Period duration="PT0H1M0.736S"><AdaptationSet '
 'segmentAlignment="true" maxWidth="1920" maxHeight="1080" '
 'maxFrameRate="11988/400" par="16:9" lang="eng" subsegmentAlignment="true" '
 'subsegmentStartsWithSAP="1"><Representation id="219891836558209v" '
 'mimeType="video/mp4" codecs="avc1.4D401E" width="640" height="360" '
 'frameRate="11988/400" sar="1:1" startWithSAP="1" bandwidth="120258" '
 'FBEncodingTag="dash_r2_avc_gen1avc_q50_frag_2_video" FBDefaultQuality="1" '
 'FBQualityClass="sd" '
 'FBQualityLabel="270p"><BaseURL>https://video.fhlz2-1.fna.fbcdn.net/v/t39.25447-2/222628662_325434325934023_1877805574929705920_n.mp4?_nc_cat=109&ccb=1-3&_nc_sid=5aebc0&efg=eyJ2ZW5jb2RlX3RhZyI6ImRhc2hfcjJfYXZjX2dlbjFhdmNfcTUwX2ZyYWdfMl92aWRlbyJ9&_nc_ohc=CTrb9wINNlAAX-59mHZ&_nc_rml=0&_nc_ht=video.fhlz2-1.fna&oh=76fe26dd0460f77e3ad9ed8841adaf59&oe=61079963</BaseURL><SegmentBase '
 'indexRangeExact="true" indexRange="969-1156" '
 'FBFirstSegmentRange="1157-105199"><Initialization '
 'range="0-968"/></SegmentBase></Representation><Representation '
 'id="842183516734859v" mimeType="video/mp4" codecs="avc1.4D401E" width="512" '
 'height="288" frameRate="11988/400" sar="1:1" startWithSAP="1" '
 'bandwidth="61584" FBEncodingTag="dash_r2_avc_gen1avc_q30_frag_2_video" '
 'FBQualityClass="sd" '
 'FBQualityLabel="180p"><BaseURL>https://video.fhlz2-1.fna.fbcdn.net/v/t39.25447-2/221857095_2954702901465040_4923298767754158415_n.mp4?_nc_cat=106&ccb=1-3&_nc_sid=5aebc0&efg=eyJ2ZW5jb2RlX3RhZyI6ImRhc2hfcjJfYXZjX2dlbjFhdmNfcTMwX2ZyYWdfMl92aWRlbyJ9&_nc_ohc=X2ueErKawNkAX8hqRLF&tn=M_pw8qegwC4qCLuh&_nc_rml=0&_nc_ht=video.fhlz2-1.fna&oh=ca1efb946ef64ed858d7fa961de6b21a&oe=61087F24</BaseURL><SegmentBase '
 'indexRangeExact="true" indexRange="968-1155" '
 'FBFirstSegmentRange="1156-55040"><Initialization '
 'range="0-967"/></SegmentBase></Representation><Representation '
 'id="4406189916110818v" mimeType="video/mp4" codecs="avc1.4D401E" width="512" '
 'height="288" frameRate="11988/400" sar="1:1" startWithSAP="1" '
 'bandwidth="89106" FBEncodingTag="dash_r2_avc_gen1avc_q40_frag_2_video" '
 'FBQualityClass="sd" '
 'FBQualityLabel="240p"><BaseURL>https://video.fhlz2-1.fna.fbcdn.net/v/t39.25447-2/221552561_561004131586292_6242449440960302869_n.mp4?_nc_cat=108&ccb=1-3&_nc_sid=5aebc0&efg=eyJ2ZW5jb2RlX3RhZyI6ImRhc2hfcjJfYXZjX2dlbjFhdmNfcTQwX2ZyYWdfMl92aWRlbyJ9&_nc_ohc=jlcLBXVIvRAAX97b14n&_nc_rml=0&_nc_ht=video.fhlz2-1.fna&oh=05417ea531c17aa5257f39b56a734d1a&oe=6108BCAB</BaseURL><SegmentBase '
 'indexRangeExact="true" indexRange="968-1155" '
 'FBFirstSegmentRange="1156-80379"><Initialization '
 'range="0-967"/></SegmentBase></Representation><Representation '
 'id="1981178932045350v" mimeType="video/mp4" codecs="avc1.4D401E" width="640" '
 'height="360" frameRate="11988/400" sar="1:1" startWithSAP="1" '
 'bandwidth="195860" FBEncodingTag="dash_r2_avc_gen1avc_q60_frag_2_video" '
 'FBQualityClass="sd" '
 'FBQualityLabel="360p"><BaseURL>https://video.fhlz2-1.fna.fbcdn.net/v/t39.25447-2/222526024_4141815189268569_4837538362494135501_n.mp4?_nc_cat=111&ccb=1-3&_nc_sid=5aebc0&efg=eyJ2ZW5jb2RlX3RhZyI6ImRhc2hfcjJfYXZjX2dlbjFhdmNfcTYwX2ZyYWdfMl92aWRlbyJ9&_nc_ohc=L6zToFE1bbUAX88kwDw&_nc_rml=0&_nc_ht=video.fhlz2-1.fna&oh=a0783309ab6d0928e4e03257c5056174&oe=6108CFED</BaseURL><SegmentBase '
 'indexRangeExact="true" indexRange="968-1155" '
 'FBFirstSegmentRange="1156-170157"><Initialization '
 'range="0-967"/></SegmentBase></Representation><Representation '
 'id="341551637462543v" mimeType="video/mp4" codecs="avc1.4D401F" width="960" '
 'height="540" frameRate="11988/400" sar="1:1" startWithSAP="1" '
 'bandwidth="306370" FBEncodingTag="dash_r2_avc_gen1avc_q70_frag_2_video" '
 'FBQualityClass="sd" '
 'FBQualityLabel="540p"><BaseURL>https://video.fhlz2-1.fna.fbcdn.net/v/t39.25447-2/224777129_3830178177088046_7625893569852366994_n.mp4?_nc_cat=101&ccb=1-3&_nc_sid=5aebc0&efg=eyJ2ZW5jb2RlX3RhZyI6ImRhc2hfcjJfYXZjX2dlbjFhdmNfcTcwX2ZyYWdfMl92aWRlbyJ9&_nc_ohc=LugsDjJ6lhgAX_8-9Bg&tn=M_pw8qegwC4qCLuh&_nc_rml=0&_nc_ht=video.fhlz2-1.fna&oh=5f7383f6df7be3f541195284df7c64fd&oe=6108D238</BaseURL><SegmentBase '
 'indexRangeExact="true" indexRange="968-1155" '
 'FBFirstSegmentRange="1156-247335"><Initialization '
 'range="0-967"/></SegmentBase></Representation><Representation '
 'id="614284586201286v" mimeType="video/mp4" codecs="avc1.640028" width="1920" '
 'height="1080" frameRate="11988/400" sar="1:1" startWithSAP="1" '
 'bandwidth="391561" FBEncodingTag="dash_r2_avc_gen1avc_q90_frag_2_video" '
 'FBQualityClass="hd" '
 'FBQualityLabel="960p"><BaseURL>https://video.fhlz2-1.fna.fbcdn.net/v/t39.25447-2/222628263_113233264268194_1764941791678126007_n.mp4?_nc_cat=105&ccb=1-3&_nc_sid=5aebc0&efg=eyJ2ZW5jb2RlX3RhZyI6ImRhc2hfcjJfYXZjX2dlbjFhdmNfcTkwX2ZyYWdfMl92aWRlbyJ9&_nc_ohc=T_yDutjG1doAX-NtGA2&_nc_rml=0&_nc_ht=video.fhlz2-1.fna&oh=b304599757ca409b687fe9d903452f97&oe=610886E5</BaseURL><SegmentBase '
 'indexRangeExact="true" indexRange="975-1162" '
 'FBFirstSegmentRange="1163-304216"><Initialization '
 'range="0-974"/></SegmentBase></Representation><Representation '
 'id="5810164892357929v" mimeType="video/mp4" codecs="avc1.4D401F" '
 'width="1280" height="720" frameRate="11988/400" sar="1:1" startWithSAP="1" '
 'bandwidth="503796" FBEncodingTag="dash_r2_avc_gen1avc_q80_frag_2_video" '
 'FBQualityClass="hd" '
 'FBQualityLabel="1080p"><BaseURL>https://video.fhlz2-1.fna.fbcdn.net/v/t39.25447-2/222795885_412605693390767_4703796632599816540_n.mp4?_nc_cat=109&ccb=1-3&_nc_sid=5aebc0&efg=eyJ2ZW5jb2RlX3RhZyI6ImRhc2hfcjJfYXZjX2dlbjFhdmNfcTgwX2ZyYWdfMl92aWRlbyJ9&_nc_ohc=13--N1IHtyMAX_pDZ1H&_nc_rml=0&_nc_ht=video.fhlz2-1.fna&oh=b083b3966003ccc2be7f231ec551ef88&oe=6108A822</BaseURL><SegmentBase '
 'indexRangeExact="true" indexRange="967-1154" '
 'FBFirstSegmentRange="1155-381945"><Initialization '
 'range="0-966"/></SegmentBase></Representation></AdaptationSet><AdaptationSet '
 'segmentAlignment="true" lang="eng" subsegmentAlignment="true" '
 'subsegmentStartsWithSAP="1"><Representation id="890518144878768a" '
 'mimeType="audio/mp4" codecs="mp4a.40.5" audioSamplingRate="48000" '
 'startWithSAP="1" bandwidth="65485" '
 'FBEncodingTag="dash_audio_aacp_64_frag_2_audio" '
 'FBDefaultQuality="1"><AudioChannelConfiguration '
 'schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" '
 'value="2"/><BaseURL>https://video.fhlz2-1.fna.fbcdn.net/v/t42.1790-2/221823085_890518151545434_1383734250360024819_n.mp4?_nc_cat=106&ccb=1-3&_nc_sid=5aebc0&efg=eyJ2ZW5jb2RlX3RhZyI6ImRhc2hfYXVkaW9fYWFjcF82NF9mcmFnXzJfYXVkaW8ifQ%3D%3D&_nc_ohc=nQiOCWA5s20AX-7T-8F&tn=M_pw8qegwC4qCLuh&_nc_rml=0&_nc_ht=video.fhlz2-1.fna&oh=c1908f6331a2c1e288c61da0202ed910&oe=6103A396</BaseURL><SegmentBase '
 'indexRangeExact="true" indexRange="931-1334" '
 'FBFirstSegmentRange="1335-17879"><Initialization '
 'range="0-930"/></SegmentBase></Representation></AdaptationSet></Period></MPD>\n'

The FBQualityLabel="1080p" representation in the manifest above (the last video Representation, together with its BaseURL) seems of interest.
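For anyone who wants to pick a format out of such a manifest programmatically, here is a minimal sketch using only the standard library. It assumes the dashManifest string has already been extracted and is well-formed XML (in particular, that any & in a BaseURL is escaped as &amp;). The function names and the sample manifest with example.invalid URLs are made up for illustration. It ranks by width × height rather than by FBQualityLabel, because in the manifest above the 1920×1080 representation is labelled "960p" while the 1280×720 one is labelled "1080p".

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <MPD> root element of the manifest.
MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Cut-down, hypothetical manifest with the same shape as the one above.
SAMPLE_MANIFEST = """<?xml version="1.0"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet>
      <Representation mimeType="video/mp4" width="640" height="360" FBQualityLabel="360p">
        <BaseURL>https://example.invalid/360.mp4</BaseURL>
      </Representation>
      <Representation mimeType="video/mp4" width="1920" height="1080" FBQualityLabel="960p">
        <BaseURL>https://example.invalid/1080.mp4</BaseURL>
      </Representation>
      <Representation mimeType="video/mp4" width="1280" height="720" FBQualityLabel="1080p">
        <BaseURL>https://example.invalid/720.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
    <AdaptationSet>
      <Representation mimeType="audio/mp4" bandwidth="65485">
        <BaseURL>https://example.invalid/audio.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>"""


def list_video_representations(manifest_xml):
    """Return one dict per video Representation found in a DASH manifest."""
    root = ET.fromstring(manifest_xml)
    representations = []
    for rep in root.iterfind(".//mpd:Representation", MPD_NS):
        # Skip audio tracks; only video representations carry width/height.
        if not rep.get("mimeType", "").startswith("video/"):
            continue
        representations.append({
            "label": rep.get("FBQualityLabel"),
            "width": int(rep.get("width", 0)),
            "height": int(rep.get("height", 0)),
            "url": rep.findtext("mpd:BaseURL", namespaces=MPD_NS),
        })
    return representations


def pick_largest(representations):
    """Pick the representation with the most pixels, ignoring the label."""
    return max(representations, key=lambda r: r["width"] * r["height"])


best = pick_largest(list_video_representations(SAMPLE_MANIFEST))
print(best["width"], best["height"], best["url"])
```

Note that these are video-only streams; the audio lives in a separate AdaptationSet, so a BaseURL taken from here may play without sound unless it is muxed with the audio track.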

hmijail commented 3 years ago

Sorry, somehow I missed this last message. I know very little about JavaScript in the browser, so I don't really understand your question. In any case, I tried searching the HTML for video_data_element, inlineVideo, dashManifest and data-sigil using the Web dev tools in Firefox. I didn't find any of them. Also, the Network tab in those tools doesn't show any XML file being downloaded.

However, I did find the string FBQualityLabel a number of times with different qualities, followed by a BaseURL that yielded a video of that quality.
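That manual search can be roughly automated with a regular expression over the saved page source. This is only a sketch: it assumes the text has already been unescaped to the point where FBQualityLabel="..." is directly followed by a <BaseURL> element, as in the manifest quoted earlier, which may not hold for the raw page. The function name and the sample text with example.invalid URLs are made up.

```python
import html
import re

# A quality label immediately followed by the BaseURL of that representation.
LABEL_AND_URL = re.compile(
    r'FBQualityLabel="([^"]+)"\s*>\s*<BaseURL>([^<]+)</BaseURL>'
)


def find_labelled_urls(page_source):
    """Map each FBQualityLabel found in the text to the BaseURL that follows it."""
    return {
        label: html.unescape(url)  # turn &amp; back into & inside the URL
        for label, url in LABEL_AND_URL.findall(page_source)
    }


# Hypothetical snippet standing in for a saved copy of the page source.
SAMPLE_SOURCE = (
    'FBQualityClass="sd" FBQualityLabel="360p"><BaseURL>'
    "https://example.invalid/a.mp4?x=1&amp;y=2</BaseURL><SegmentBase/>"
    'FBQualityClass="hd" FBQualityLabel="720p"><BaseURL>'
    "https://example.invalid/b.mp4</BaseURL><SegmentBase/>"
)

for label, url in find_labelled_urls(SAMPLE_SOURCE).items():
    print(label, url)
```

A URL that appears without any FBQualityLabel, like the extra one mentioned in the next comment, would not be matched by this pattern.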

hmijail commented 3 years ago

Interestingly, the highest FBQualityLabel is 720p. There is yet another URL a bit further into the code, but it has no FBQualityLabel. FWIW, this is a live video post.

neon-ninja commented 3 years ago

Realised I forgot to reply to this:

"Also, to double check: youtube_dl only is supposed to change the resulting URL, and then I'm supposed to download that URL myself, right?"

Yes, the scraper only handles link extraction; downloading is up to you. You could use the requests library or wget for this task.
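As a rough illustration of the download step, here is a minimal sketch that uses the standard library's urllib instead of requests or wget, so it runs without extra packages. The function name download and the output filename are made up. It assumes the extracted URL can be fetched without extra cookies or headers, which may not be true for private posts.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, dest):
    """Stream the resource at `url` into the file `dest` and return its path."""
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, dest.open("wb") as out:
        # Copy in chunks so a large video is never held in memory at once.
        shutil.copyfileobj(response, out)
    return dest


# Typical use, with `post` being one of the dicts yielded by get_posts():
# if post["video"]:
#     download(post["video"], "video.mp4")
```

These fbcdn links carry expiry-style parameters (oh, oe), so it is probably best to download soon after extracting the URL.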