yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

Instagram reels fail to download #11151

Closed marc-weber1 closed 1 month ago

marc-weber1 commented 1 month ago

Region

Canada

Provide a description that is worded well enough to be understood

I thought it was a rate limit, but I tried from multiple different IPs and every attempt failed, while the same reels all worked in the Firefox browser.

example reel (loud): https://www.instagram.com/reel/DAgxVRCsDgA

Provide verbose output that clearly demonstrates the problem

Complete Verbose Output

[debug] Command-line config: ['-vU', 'https://www.instagram.com/reel/DAgxVRCsDgA']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version nightly@2024.10.01.232843 from yt-dlp/yt-dlp-nightly-builds [e59c82a74] (zip)
[debug] Python 3.12.3 (CPython x86_64 64bit) - Linux-6.8.0-41-generic-x86_64-with-glibc2.39 (OpenSSL 3.0.13 30 Jan 2024, glibc 2.39)
[debug] exe versions: none
[debug] Optional libraries: certifi-2023.11.17, requests-2.31.0, sqlite3-3.45.1, urllib3-2.0.7
[debug] Proxy map: {}
[debug] Request Handlers: urllib
[debug] Loaded 1838 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp-nightly-builds/releases/latest
Latest version: nightly@2024.10.01.232843 from yt-dlp/yt-dlp-nightly-builds
yt-dlp is up to date (nightly@2024.10.01.232843 from yt-dlp/yt-dlp-nightly-builds)
[Instagram] Extracting URL: https://www.instagram.com/reel/DAgxVRCsDgA
[Instagram] DAgxVRCsDgA: Setting up session
[Instagram] DAgxVRCsDgA: Downloading JSON metadata
WARNING: [Instagram] DAgxVRCsDgA: General metadata extraction failed (some metadata might be missing).
[Instagram] DAgxVRCsDgA: Downloading webpage
WARNING: [Instagram] unable to extract shared data; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
WARNING: [Instagram] Main webpage is locked behind the login page. Retrying with embed webpage (some metadata might be missing).
[Instagram] DAgxVRCsDgA: Downloading embed webpage
WARNING: [Instagram] unable to extract additional data; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
ERROR: [Instagram] DAgxVRCsDgA: Requested content is not available, rate-limit reached or login required. Use --cookies, --cookies-from-browser, --username and --password, --netrc-cmd, or --netrc (instagram) to provide account credentials
  File "/home/facade/AUTO-YTDLP-BOT/./yt-dlp/yt_dlp/extractor/common.py", line 741, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/facade/AUTO-YTDLP-BOT/./yt-dlp/yt_dlp/extractor/instagram.py", line 460, in _real_extract
    self.raise_login_required('Requested content is not available, rate-limit reached or login required')
  File "/home/facade/AUTO-YTDLP-BOT/./yt-dlp/yt_dlp/extractor/common.py", line 1257, in raise_login_required
    raise ExtractorError(msg, expected=True)

bashonly commented 1 month ago

I can repro. Seems like the decrepit extractor has finally been fully broken.

adanvdo commented 1 month ago

Started running into this issue today as well, using version 2024.9.27.0.

adanvdo commented 1 month ago

In an effort to download a reel manually (https://www.instagram.com/foocey/reel/DAaER-1Oriq/), I dug into the HTTP requests on the page and found this:

A GET request to https://www.instagram.com/api/v1/media/3466101691097200810/info/ returned a JSON response (fields omitted for relevance):

{
  items: [
    code: "DAaER-1Oriq",
    pk: "3466101691097200810",
    video_dash_manifest: "<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 DASH-MPD.xsd" profiles="urn:mpeg:dash:profile:isoff-on-demand:2011" minBufferTime="PT2S" type="static" mediaPresentationDuration="PT9.473742S" FBManifestIdentifier="FgAYEGlnX2Rhc2hfYmFzZWxpbmUZNs64peCcmqQDxviG3se/pQT09Oy6m9KvBCIYGGRhc2hfbG5faGVhYWNfdmJyM19hdWRpbwA="><Period id="0" duration="PT9.473742S"><AdaptationSet id="0" contentType="video" frameRate="15360/512" subsegmentAlignment="true" par="9:16" FBUnifiedUploadResolutionMos="360:75.5"><SupplementalProperty schemeIdUri="urn:mpeg:mpegB:cicp:TransferCharacteristics" value="6"/><Representation id="924040302997031vd" bandwidth="578362" codecs="avc1.64001f" mimeType="video/mp4" sar="1:1" FBEncodingTag="dash_baseline_1_v1" FBContentLength="684396" FBPlaybackResolutionMos="0:100,360:94.7,480:91.2,720:86.6,1080:81.9" FBPlaybackResolutionMosConfidenceLevel="high" FBPlaybackResolutionCsvqm="0:100,360:98.19,480:96.8,720:95.4,1080:93.9" FBAbrPolicyTags="" width="720" height="1280" FBDefaultQuality="1" FBQualityClass="hd" FBQualityLabel="720p"><BaseURL>https://scontent-atl3-2.cdninstagram.com/o1/v/t16/f1/m86/83498DA848AB9E46281A9A432E450DA8_video_dashinit.mp4?efg=eyJ2aWRlb19pZCI6bnVsbCwidmVuY29kZV90YWciOiJpZy14cHZkcy5jbGlwcy5jMi1DMy5kYXNoX2Jhc2VsaW5lXzFfdjEifQ&amp;_nc_ht=scontent-atl3-2.cdninstagram.com&amp;_nc_cat=105&amp;ccb=9-4&amp;oh=00_AYB3GPMDTmF7vd_5mr5lbWMxlULgw_hfc1kCVGW7hpTjLw&amp;oe=6700A82D&amp;_nc_sid=f1f4f2</BaseURL><SegmentBase indexRange="892-947" timescale="15360" FBMinimumPrefetchRange="948-32377" FBFirstSegmentRange="948-431024" FBFirstSegmentDuration="5000" FBSecondSegmentRange="431025-684395" FBPrefetchSegmentRange="948-431024" FBPrefetchSegmentDuration="5000"><Initialization range="0-891"/></SegmentBase></Representation><Representation id="1230666434714938v" bandwidth="170919" codecs="avc1.4d001e" mimeType="video/mp4" sar="1:1" 
FBEncodingTag="dash_baseline_3_v1" FBContentLength="202255" FBPlaybackResolutionMos="0:100,360:71.4,480:64.7,720:57.8,1080:54.2" FBPlaybackResolutionMosConfidenceLevel="high" FBPlaybackResolutionCsvqm="0:100,360:86.1,480:79.8,720:73.2,1080:69.1" FBAbrPolicyTags="" width="360" height="640" FBQualityClass="sd" FBQualityLabel="360p"><BaseURL>https://scontent-atl3-2.cdninstagram.com/o1/v/t16/f1/m86/38454FFAF1B63022840A87BDC1DD5681_video_dashinit.mp4?efg=eyJ2aWRlb19pZCI6bnVsbCwidmVuY29kZV90YWciOiJpZy14cHZkcy5jbGlwcy5jMi1DMy5kYXNoX2Jhc2VsaW5lXzNfdjEifQ&amp;_nc_ht=scontent-atl3-2.cdninstagram.com&amp;_nc_cat=103&amp;ccb=9-4&amp;oh=00_AYAtWSQKMsJnx5RfpaD4UBwHQVp32cyvCnpxSL4OaWUVZA&amp;oe=6700A53B&amp;_nc_sid=f1f4f2</BaseURL><SegmentBase indexRange="887-942" timescale="15360" FBMinimumPrefetchRange="943-14044" FBFirstSegmentRange="943-126717" FBFirstSegmentDuration="5000" FBSecondSegmentRange="126718-202254" FBPrefetchSegmentRange="943-126717" FBPrefetchSegmentDuration="5000"><Initialization range="0-886"/></SegmentBase></Representation></AdaptationSet><AdaptationSet id="1" contentType="audio" subsegmentStartsWithSAP="1" subsegmentAlignment="true"><Representation id="1208355727138339ad" bandwidth="76469" codecs="mp4a.40.5" mimeType="audio/mp4" FBAvgBitrate="76469" audioSamplingRate="44100" FBEncodingTag="dash_ln_heaac_vbr3_audio" FBContentLength="91471" FBPaqMos="83.80" FBAbrPolicyTags="" FBDefaultQuality="1"><AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" 
value="2"/><BaseURL>https://scontent-atl3-1.cdninstagram.com/v/t50.33967-16/461311387_502055819321624_5847132663744181196_n.mp4?_nc_cat=109&amp;ccb=1-7&amp;_nc_sid=9a5d50&amp;efg=eyJ2ZW5jb2RlX3RhZyI6ImlnLXhwdmRzLmNsaXBzLmMyLUMzLmRhc2hfbG5faGVhYWNfdmJyM19hdWRpbyIsInZpZGVvX2lkIjpudWxsfQ%3D%3D&amp;_nc_ohc=yR_MuIZaayoQ7kNvgGD-DTc&amp;_nc_ht=scontent-atl3-1.cdninstagram.com&amp;_nc_gid=AL6eZp4GBz9V1ZoIf08mSWd&amp;oh=00_AYA75_hrxi1U_XeE4DoLsSLXsSNnN7i2eNFcYb9FnlUAbA&amp;oe=67049FE6</BaseURL><SegmentBase indexRange="824-915" timescale="44100" FBMinimumPrefetchRange="916-1259" FBFirstSegmentRange="916-21518" FBFirstSegmentDuration="2021" FBSecondSegmentRange="21519-40307" FBPrefetchSegmentRange="916-40307" FBPrefetchSegmentDuration="4017"><Initialization range="0-823"/></SegmentBase></Representation></AdaptationSet></Period></MPD>"
  ]
}

The format URLs can be extracted from this manifest and HTML-decoded. I was able to download the video and audio from those URLs.

I don't know if this helps, but I hope so!
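
A minimal Python sketch of the extraction step described above, assuming the video_dash_manifest string has already been pulled out of the JSON response. The sample manifest and the cdn.example URLs are made up for illustration, and extract_base_urls is a hypothetical helper, not part of yt-dlp. The XML parser decodes the &amp; entities on its own, so no separate HTML-decoding step is needed here.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the <MPD> element in the manifest above.
DASH_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}


def extract_base_urls(manifest_xml: str) -> list:
    """Return one dict per <Representation> with its MIME type, bandwidth and URL."""
    # Parse as UTF-8 bytes so the declared encoding matches the data.
    root = ET.fromstring(manifest_xml.encode("utf-8"))
    results = []
    for rep in root.iterfind(".//mpd:Representation", DASH_NS):
        base_url = rep.findtext("mpd:BaseURL", default=None, namespaces=DASH_NS)
        if base_url:
            results.append({
                "mime_type": rep.get("mimeType"),
                "bandwidth": int(rep.get("bandwidth", "0")),
                "url": base_url.strip(),  # entities such as &amp; are already decoded
            })
    return results


# Made-up, heavily shortened manifest with the same shape as the real one.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"><Period>'
    '<AdaptationSet contentType="video">'
    '<Representation bandwidth="578362" mimeType="video/mp4">'
    '<BaseURL>https://cdn.example/video.mp4?a=1&amp;b=2</BaseURL>'
    '</Representation></AdaptationSet>'
    '<AdaptationSet contentType="audio">'
    '<Representation bandwidth="76469" mimeType="audio/mp4">'
    '<BaseURL>https://cdn.example/audio.mp4?a=1&amp;b=2</BaseURL>'
    '</Representation></AdaptationSet>'
    '</Period></MPD>'
)

for fmt in extract_base_urls(SAMPLE):
    print(fmt["mime_type"], fmt["bandwidth"], fmt["url"])
```

Video and audio come back as separate entries, so they would still have to be downloaded separately and merged.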

bashonly commented 1 month ago

@adanvdo That is how the extractor currently works when you pass logged-in cookies (so at least the extractor is not broken when logged-in?). Does it work w/o cookies? IIRC it used to work w/o cookies a few times and then you'd be blocked for 24+ hours

(Perennial warning that passing logged-in cookies to yt-dlp for this site can get your account permanently banned)

adanvdo commented 1 month ago

@bashonly When I am logged out, there is no request to api/v1/media/3466101691097200810/info/. Instead, there is an XHR "query" POST request.

That returns JSON in this format:

{
  data: {
    xdt_shortcode_media: {
      id: "3466101691097200810",
      shortcode: "DAaER-1Oriq",
      video_url: "https://scontent-atl3-2.cdninstagram.com/o1/v/t16/f1/m86/83498DA848AB9E46281A9A432E450DA8_video_dashinit.mp4?stp=dst-mp4&efg=eyJxZV9ncm91cHMiOiJbXCJpZ193ZWJfZGVsaXZlcnlfdnRzX290ZlwiXSIsInZlbmNvZGVfdGFnIjoidnRzX3ZvZF91cmxnZW4uY2xpcHMuYzIuNzIwLmJhc2VsaW5lIn0&_nc_cat=105&vs=924040302997031_1936272661&_nc_vs=HBksFQIYUmlnX3hwdl9yZWVsc19wZXJtYW5lbnRfc3JfcHJvZC84MzQ5OERBODQ4QUI5RTQ2MjgxQTlBNDMyRTQ1MERBOF92aWRlb19kYXNoaW5pdC5tcDQVAALIAQAVAhg6cGFzc3Rocm91Z2hfZXZlcnN0b3JlL0dKc05meHNZb2NUNm5jZ0JBTXd6RHFFdE1DVlJicV9FQUFBRhUCAsgBACgAGAAbABUAACbk7YSS3MWcQBUCKAJDMywXQCLul41P3zsYEmRhc2hfYmFzZWxpbmVfMV92MREAdf4HAA%3D%3D&_nc_rid=ff6ff092d5&ccb=9-4&oh=00_AYCW3JUQJZ6vXrCQzBl1nCEKLzGQCtd7eMaBu2-GcC1nYQ&oe=6700A82D&_nc_sid=d885a2",
      dash_info: {
        video_dash_manifest: "<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 DASH-MPD.xsd" profiles="urn:mpeg:dash:profile:isoff-on-demand:2011" minBufferTime="PT2S" type="static" mediaPresentationDuration="PT9.473742S" FBManifestIdentifier="FgAYEGlnX2Rhc2hfYmFzZWxpbmUZNs64peCcmqQDxviG3se/pQT09Oy6m9KvBCIYGGRhc2hfbG5faGVhYWNfdmJyM19hdWRpbwA="><Period id="0" duration="PT9.473742S"><AdaptationSet id="0" contentType="video" frameRate="15360/512" subsegmentAlignment="true" par="9:16" FBUnifiedUploadResolutionMos="360:75.5"><SupplementalProperty schemeIdUri="urn:mpeg:mpegB:cicp:TransferCharacteristics" value="6"/><Representation id="924040302997031vd" bandwidth="578362" codecs="avc1.64001f" mimeType="video/mp4" sar="1:1" FBEncodingTag="dash_baseline_1_v1" FBContentLength="684396" FBPlaybackResolutionMos="0:100,360:94.7,480:91.2,720:86.6,1080:81.9" FBPlaybackResolutionMosConfidenceLevel="high" FBPlaybackResolutionCsvqm="0:100,360:98.19,480:96.8,720:95.4,1080:93.9" FBAbrPolicyTags="" width="720" height="1280" FBDefaultQuality="1" FBQualityClass="hd" FBQualityLabel="720p"><BaseURL>https://scontent-atl3-2.cdninstagram.com/o1/v/t16/f1/m86/83498DA848AB9E46281A9A432E450DA8_video_dashinit.mp4?efg=eyJ2aWRlb19pZCI6bnVsbCwidmVuY29kZV90YWciOiJpZy14cHZkcy5jbGlwcy5jMi1DMy5kYXNoX2Jhc2VsaW5lXzFfdjEifQ&amp;_nc_ht=scontent-atl3-2.cdninstagram.com&amp;_nc_cat=105&amp;ccb=9-4&amp;oh=00_AYB3GPMDTmF7vd_5mr5lbWMxlULgw_hfc1kCVGW7hpTjLw&amp;oe=6700A82D&amp;_nc_sid=f1f4f2</BaseURL><SegmentBase indexRange="892-947" timescale="15360" FBMinimumPrefetchRange="948-32377" FBFirstSegmentRange="948-431024" FBFirstSegmentDuration="5000" FBSecondSegmentRange="431025-684395" FBPrefetchSegmentRange="948-431024" FBPrefetchSegmentDuration="5000"><Initialization range="0-891"/></SegmentBase></Representation><Representation id="1230666434714938v" bandwidth="170919" codecs="avc1.4d001e" mimeType="video/mp4" sar="1:1" 
FBEncodingTag="dash_baseline_3_v1" FBContentLength="202255" FBPlaybackResolutionMos="0:100,360:71.4,480:64.7,720:57.8,1080:54.2" FBPlaybackResolutionMosConfidenceLevel="high" FBPlaybackResolutionCsvqm="0:100,360:86.1,480:79.8,720:73.2,1080:69.1" FBAbrPolicyTags="" width="360" height="640" FBQualityClass="sd" FBQualityLabel="360p"><BaseURL>https://scontent-atl3-2.cdninstagram.com/o1/v/t16/f1/m86/38454FFAF1B63022840A87BDC1DD5681_video_dashinit.mp4?efg=eyJ2aWRlb19pZCI6bnVsbCwidmVuY29kZV90YWciOiJpZy14cHZkcy5jbGlwcy5jMi1DMy5kYXNoX2Jhc2VsaW5lXzNfdjEifQ&amp;_nc_ht=scontent-atl3-2.cdninstagram.com&amp;_nc_cat=103&amp;ccb=9-4&amp;oh=00_AYAtWSQKMsJnx5RfpaD4UBwHQVp32cyvCnpxSL4OaWUVZA&amp;oe=6700A53B&amp;_nc_sid=f1f4f2</BaseURL><SegmentBase indexRange="887-942" timescale="15360" FBMinimumPrefetchRange="943-14044" FBFirstSegmentRange="943-126717" FBFirstSegmentDuration="5000" FBSecondSegmentRange="126718-202254" FBPrefetchSegmentRange="943-126717" FBPrefetchSegmentDuration="5000"><Initialization range="0-886"/></SegmentBase></Representation></AdaptationSet><AdaptationSet id="1" contentType="audio" subsegmentStartsWithSAP="1" subsegmentAlignment="true"><Representation id="1208355727138339ad" bandwidth="76469" codecs="mp4a.40.5" mimeType="audio/mp4" FBAvgBitrate="76469" audioSamplingRate="44100" FBEncodingTag="dash_ln_heaac_vbr3_audio" FBContentLength="91471" FBPaqMos="83.80" FBAbrPolicyTags="" FBDefaultQuality="1"><AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" 
value="2"/><BaseURL>https://scontent-atl3-1.cdninstagram.com/v/t50.33967-16/461311387_502055819321624_5847132663744181196_n.mp4?_nc_cat=109&amp;ccb=1-7&amp;_nc_sid=9a5d50&amp;efg=eyJ2ZW5jb2RlX3RhZyI6ImlnLXhwdmRzLmNsaXBzLmMyLUMzLmRhc2hfbG5faGVhYWNfdmJyM19hdWRpbyIsInZpZGVvX2lkIjpudWxsfQ%3D%3D&amp;_nc_ohc=yR_MuIZaayoQ7kNvgGD-DTc&amp;_nc_ht=scontent-atl3-1.cdninstagram.com&amp;_nc_gid=A9OFFypqKwVyNI_wmS055Nw&amp;oh=00_AYAqsOfU7JlCi04cMgqMHK_6Jj7JNhFvapGlnF5TpShqIg&amp;oe=67049FE6</BaseURL><SegmentBase indexRange="824-915" timescale="44100" FBMinimumPrefetchRange="916-1259" FBFirstSegmentRange="916-21518" FBFirstSegmentDuration="2021" FBSecondSegmentRange="21519-40307" FBPrefetchSegmentRange="916-40307" FBPrefetchSegmentDuration="4017"><Initialization range="0-823"/></SegmentBase></Representation></AdaptationSet></Period></MPD>
"
      }
    }
  }
}

I can use the video_url value with yt-dlp just fine.
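
As a rough sketch, the video_url can be pulled out of a saved copy of that response with a few lines of Python. The response text below is a made-up, shortened stand-in for the real JSON, and pick_video_url is a hypothetical helper name.

```python
import json
from typing import Optional


def pick_video_url(payload: dict) -> Optional[str]:
    """Walk data -> xdt_shortcode_media -> video_url, tolerating missing keys."""
    node = payload
    for key in ("data", "xdt_shortcode_media", "video_url"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node if isinstance(node, str) else None


# Made-up response with the same nesting as the logged-out query response.
response_text = (
    '{"data": {"xdt_shortcode_media": {"shortcode": "DAaER-1Oriq", '
    '"video_url": "https://cdn.example/video.mp4"}}}'
)

print(pick_video_url(json.loads(response_text)))
```

The helper returns None instead of raising when a level is missing, which seems like the safer behavior if the response is a login or rate-limit error rather than media data.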

marc-weber1 commented 1 month ago

Looks like a fetch command like this works for getting the url (try it yourself):

fetch("https://www.instagram.com/graphql/query", {
    "credentials": "include",
    "headers": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:130.0) Gecko/20100101 Firefox/130.0",
        "Accept": "*/*",
        "Accept-Language": "en-CA,en-US;q=0.7,en;q=0.3",
        "Content-Type": "application/x-www-form-urlencoded",
        "X-FB-Friendly-Name": "PolarisPostActionLoadPostQueryQuery",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin"
    },
    "referrer": "https://www.instagram.com/reel/DAO49r3SsBR/",
    "body": "variables=%7B%22shortcode%22%3A%22DAO49r3SsBR%22%2C%22fetch_tagged_user_count%22%3Anull%2C%22hoisted_comment_id%22%3Anull%2C%22hoisted_reply_id%22%3Anull%7D&server_timestamps=true&doc_id=8845758582119845",
    "method": "POST",
    "mode": "cors"
}).then(resp => resp.json())
.then(resp => console.log(resp.data.xdt_shortcode_media.video_url));

The variables param in the body is just a URL-encoded version of {"shortcode":"DAO49r3SsBR","fetch_tagged_user_count":null,"hoisted_comment_id":null,"hoisted_reply_id":null}, so this returns a URL directly from the shortcode, which is cool.

The only question now is how to get the document ID 8845758582119845. It seems to be the same for me and a friend, and it does not seem to change, but I'm not sure if it's a good idea to hardcode it 🥴
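
To make the body encoding described above concrete, here is a small Python sketch that decodes the body from the fetch call and rebuilds it from a shortcode. build_body is a hypothetical helper, and the doc_id default is simply the value observed in this thread, which may change.

```python
import json
from urllib.parse import parse_qs, urlencode

# Body string copied from the fetch call above.
BODY = (
    "variables=%7B%22shortcode%22%3A%22DAO49r3SsBR%22%2C%22fetch_tagged_user_count"
    "%22%3Anull%2C%22hoisted_comment_id%22%3Anull%2C%22hoisted_reply_id%22%3Anull%7D"
    "&server_timestamps=true&doc_id=8845758582119845"
)

# Decode: split the form fields, then parse the JSON inside "variables".
fields = {name: values[0] for name, values in parse_qs(BODY).items()}
variables = json.loads(fields["variables"])
print(variables["shortcode"], fields["doc_id"])


def build_body(shortcode: str, doc_id: str = "8845758582119845") -> str:
    """Rebuild the same form-encoded body for an arbitrary shortcode."""
    payload = {
        "shortcode": shortcode,
        "fetch_tagged_user_count": None,
        "hoisted_comment_id": None,
        "hoisted_reply_id": None,
    }
    return urlencode({
        # Compact separators reproduce the whitespace-free JSON the browser sends.
        "variables": json.dumps(payload, separators=(",", ":")),
        "server_timestamps": "true",
        "doc_id": doc_id,
    })


print(build_body("DAO49r3SsBR") == BODY)
```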

tetra-fox commented 1 month ago

I think it may be okay to hardcode the doc_id. It seems that is the same approach currently used by the extractor: https://github.com/yt-dlp/yt-dlp/blob/e59c82a74cda5139eb3928c75b0bd45484dbe7f0/yt_dlp/extractor/instagram.py#L438

Meta seems to have moved Instagram over to their Relay client for making GraphQL queries. There is a set of doc_ids which correspond to what are essentially GQL query 'presets', saved on the server side, to reduce the amount of data the client needs to send. This also prevents arbitrary queries from working. 8845758582119845 seems to be the doc_id of interest, as it provides the video URL at data.xdt_shortcode_media.video_url in the JSON response.

The bare minimum for making a successful request is as follows, updating the shortcode as needed:

curl --request POST \
  --url https://www.instagram.com/graphql/query \
  --data 'variables={"shortcode":"DAJYHpwCjMP"}' \
  --data doc_id=8845758582119845

See Persisted Queries in the Relay documentation.
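
For anyone scripting this, a rough Python equivalent of the curl call might look like the sketch below. It uses only the standard library; build_request and fetch_video_url are hypothetical names, the doc_id is the value reported in this thread and may stop working at any time, and the request is still subject to the rate limiting and login walls discussed earlier.

```python
import json
import urllib.parse
import urllib.request

GRAPHQL_URL = "https://www.instagram.com/graphql/query"
DOC_ID = "8845758582119845"  # value reported in this thread; may change without notice


def build_request(shortcode: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = urllib.parse.urlencode({
        "variables": json.dumps({"shortcode": shortcode}),
        "doc_id": DOC_ID,
    }).encode("ascii")
    # Supplying data= gives a form-encoded POST body, roughly like curl --data.
    return urllib.request.Request(GRAPHQL_URL, data=body, method="POST")


def fetch_video_url(shortcode: str, timeout: float = 10.0):
    """Send the request and return data.xdt_shortcode_media.video_url, or None."""
    with urllib.request.urlopen(build_request(shortcode), timeout=timeout) as resp:
        payload = json.load(resp)
    media = (payload.get("data") or {}).get("xdt_shortcode_media") or {}
    return media.get("video_url")


if __name__ == "__main__":
    # Only inspect the request here; calling fetch_video_url() hits the network.
    req = build_request("DAJYHpwCjMP")
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

Splitting the request construction from the network call keeps the encoding logic easy to check offline.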

nikalasmd commented 1 month ago

Any update on this case?

nikalasmd commented 1 month ago

@bashonly When will it be fixed?

adanvdo commented 1 month ago

This is a community-maintained project. All the devs who work on this have lives and don't owe us anything. Just be patient.

In the meantime, use your browser's web tools to get the JSON responses that contain the video_url, and use that URL with yt-dlp.

soulfulkhani commented 1 month ago

Hey, can you please explain how to do this? I have no idea. Do we open the Instagram video, then open the dev tools, and after that go to the Network tab?

bigleanator commented 1 month ago

Quick one-liner to find the URLs, based on @tetra-fox's comment:

curl --request POST \
   --url https://www.instagram.com/graphql/query \
   --data 'variables={"shortcode":"CHANGE THIS TO THE VIDEO ID"}' \
   --data doc_id=8845758582119845 | jq -r '.data.xdt_shortcode_media.video_url'

adanvdo commented 1 month ago

Open the dev tools, switch to the Network tab, and then refresh the reel page. Go back to the dev tools and look through the Network tab results for the entry named "query". Click its Response tab to view the JSON response; the video_url is in there.

soulfulkhani commented 1 month ago

thank you so much