oorestisime / gatsby-source-instagram

Create nodes from instagram posts hashtags and profiles
https://gatsby-src-instagram.netlify.com/
MIT License

Public scraping methods fail due to login screen on Instagram on production builds #24

Closed wjx0820 closed 1 year ago

wjx0820 commented 5 years ago

I used this plugin in my demo and it did not work: it says it could not fetch Instagram posts and that no Gatsby nodes were generated (I didn't use any token and just want public scraping for posts). So I cloned your repo, cd'd into /example, ran yarn install and then 'npm run develop'. The same problem seems to happen there. Am I missing something? Thanks!

oorestisime commented 4 years ago

@sjelfull Please let me know about anything confusing or inconsistent. I'll try to help you generate the token and update the docs accordingly :) Same for anybody else!

sjelfull commented 4 years ago

Let me see if I can boil it down - right now i'm so far down the hole that i'm not sure where I went wrong.

I'll add notes inline in the steps:

  1. You need to have a Facebook page (I know... :/)
  2. Go to your site settings -> Instagram -> Login into your Instagram account
  3. Create an app. [Note: As far as I can tell, after this step we need to add Instagram below the heading "Add a Product" on the FB apps dashboard.]
  4. Go to the Graph API Explorer
    1. Select your App from the top right dropdown menu. [Note: It's unclear what you mean by the top right dropdown menu here. If you select the app under My apps, which is the only relevant link I can see in the top right corner, you get brought out of the Graph explorer and to the dashboard.]
    2. Select "Get User Access Token" from dropdown (right of access token field) and select needed permissions (manage_pages, pages_show_list, instagram_basic). [Note: Sounds like the UI has changed a bit here. The previous step, where you select the app, is now below the Get Access Token button, in the field labeled Facebook App.]
    3. Click "Generate Access Token". [Note: Which permissions do we need to ask for to be able to access the right Instagram account? What fields/path do we use in the explorer?]
    4. Copy user access token
  5. Access Token Debugger:
    1. Paste copied token and press "Debug"
    2. Press "Extend Access Token" and copy the generated long-lived user access token
  6. Graph API Explorer:
    1. Paste copied token into the "Access Token" field
    2. Make a GET request with "PAGE_ID?fields=access_token"
    3. Find the permanent page access token in the response (node "access_token")
  7. Access Token Debugger:
    1. Paste the permanent token and press "Debug"
    2. "Expires" should be "Never"
    3. Copy the access token
  8. Graph API Explorer:
    1. Make a GET request with "PAGE_ID?fields=instagram_business_account" to get your Business ID. [Note: The business ID does not show up here. I only get the id of the page and nothing else.]

oorestisime commented 4 years ago

For the permissions you will need to add pages_manage_ads, pages_manage_metadata, pages_read_engagement, pages_read_user_content, pages_show_list and instagram_basic (basically the ones listed in step 2, without manage_pages but with the ones it got replaced with). Hope this gets you unblocked.

I'll try tomorrow to go through all the steps and leave an update here. Thanks, this was very helpful!

hvitis commented 4 years ago

I'm having the same issue with the updated plugin; locally it works fine.

Can I help in any way?

oorestisime commented 4 years ago

Unfortunately the public scraping method doesn't work on build infrastructure. I'm not sure how they decide when to show the login screen: it is either IP based or some kind of rate limit. In any case I think we can't bypass it (but I'm open to suggestions).

You can test generating the access token, and in case you find more inconsistencies in the docs you might want to open a PR to update them :)

LarsBehrenberg commented 4 years ago

I started using this approach for public scraping instead. I wonder whether the way images are being fetched with that approach could also be implemented in gatsby-source-instagram?

tinoguti commented 4 years ago

This approach is working for me so far. Thanks. Although I wonder for how long it will work because it seems like it depends on a query id that might change in the future.

oorestisime commented 4 years ago

Yeah, I'm not sure I understand what the query id is. Also, have you tried this request from a build service such as Netlify?

will-t-harris commented 4 years ago

I'm also not sure what that query id is, but I just built that solution with netlify and it works.

I also refactored it into a hook, which I'm considering throwing on npm. At least I was before I realized it was using that query_id. I'd be shocked if that didn't change in the future.

https://gist.github.com/will-t-harris/4b8671315fb77d541b6ad34276acfdb6

will-t-harris commented 4 years ago

It seems that query_id has been the same since 2017...perhaps this approach is more reliable than it looks at first glance.

https://stackoverflow.com/a/47243409/4552841
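For readers who have not seen the query_id workaround, here is a minimal sketch of the kind of URL it relies on. Everything in it is an assumption rather than something confirmed in this thread: the endpoint path, the `variables` parameter with `id`, `first` and `after`, and the helper name `buildPublicPostsUrl` are illustrative, and `QUERY_ID` is a placeholder that has to be replaced with the value from the linked gist or answer. The endpoint is undocumented, so it may stop working at any time.

```javascript
// Placeholder only: substitute the query_id used in the linked gist/answer.
const QUERY_ID = "YOUR_QUERY_ID";

// Build the public GraphQL URL for a numeric Instagram user id.
// `first` is the number of posts requested per page (hypothetical default).
function buildPublicPostsUrl(instagramId, first = 12) {
  const variables = JSON.stringify({
    id: String(instagramId), // numeric Instagram id, not the username
    first,
    after: null, // no pagination cursor on the first request
  });
  const params = new URLSearchParams({ query_id: QUERY_ID, variables });
  return `https://www.instagram.com/graphql/query/?${params.toString()}`;
}
```

A build script would fetch this URL and read the posts from the JSON response. The sketch stops at URL construction because the response shape is not described anywhere in this thread.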

tinoguti commented 4 years ago

I was able to do a build on Amplify with that query id approach. It's working fine for now. I did try the whole Facebook develop access token process with no luck. It's quite tedious and requires time to set up.

oorestisime commented 4 years ago

OK great, thanks folks. I'll add this in 0.7.3 to test, unless somebody wants to PR this :D

oorestisime commented 4 years ago

I am trying this now. i ll keep you posted

oorestisime commented 4 years ago

I just published 0.8.0-beta.0 with the change. My Netlify build passed that step but failed because hashtag scraping still doesn't work. I would like some more feedback from you people though.

In that version public scraping of hashtags and profile information is not working. If any of you knows of a way to scrape hashtags publicly then I am all ears to get it to work.

Public scraping for public posts is supposed to work though.

In order to use this you need to pass an Instagram id instead of a username. I used https://codeofaninja.com/tools/find-instagram-user-id to get an id. (You pass it in the username config; I want to make sure this works before I jump on API changes.)

Another change is that on the instagram node the username is also going to be the Instagram id you provided. Unfortunately Instagram doesn't pass in the username when querying like that.

I managed to fetch 50 posts instead of 12, so this is good news.

will-t-harris commented 4 years ago

Thanks for trying that out so quickly @oorestisime.

That version seems to work for me, locally and on my build in netlify.

dmcreis commented 4 years ago

Thanks a lot for this @oorestisime. I will have a go at the new update. When I run npm outdated, the latest version that appears available is 0.7.2. Any idea why?

oorestisime commented 4 years ago

I think because i published a beta one. can you manually replace the version in package.json and then npm install?

abepuentes commented 4 years ago

Thanks @oorestisime, I've been testing 0.8.0-beta.0 and it works on develop and on Netlify. One comment though: searching for this issue I found the following in the Instagram API docs. The ID you get from https://codeofaninja.com/tools/find-instagram-user-id is what Facebook calls the "Legacy ID", and it will be deprecated on September 30th, 2020. On the new graph.instagram.com API this field is called ig_id; more info here: https://developers.facebook.com/docs/instagram-basic-display-api/. I tried the new user "ID" field and it doesn't work with the "query_id" method.

oorestisime commented 4 years ago

Yes indeed, these are two different ids and I couldn't get it to work with the new id either. That probably means this method will work only until then :/ unless there's a way to specify ig_id in the query. I'll test this out tomorrow and get back here.

Still maybe a nice trade-off for a few months giving us time to find new solutions :)

oorestisime commented 4 years ago

I went through the steps again. here's what i found and let me know if this works for you and if it is clearer:

  1. You need to have a Facebook page (I know... :/)
  2. Go to your site settings -> Instagram -> Login into your Instagram account
  3. Create an app
  4. Go to the Graph API Explorer
    1. Make sure you are using v7 as api version
    2. Select your facebook app
    3. Click "Generate Access Token"
    4. Add the following permissions (pages_manage_ads, pages_manage_metadata, pages_read_engagement, pages_read_user_content, pages_show_list, instagram_basic)
    5. Make a GET request at me/accounts
    6. copy the access_token in the response (we call this temporary_token)
    7. click on the id to change the explorer url and append ?fields=instagram_business_account&access_token={access-token}
    8. save your instagram_business_account.id, this is your instagram_id
  5. Access Token Debugger:
    1. Paste your temporary_token and press "Debug"
    2. You should see this token now expires in 3 months
    3. Press "Extend Access Token" and press the new debug that appears next to the token
    4. You should see this token now never expires
    5. Copy this new token (we will call this access_token)

With these two pieces of information you can now use the plugin as:

{
  resolve: `gatsby-source-instagram`,
  options: {
    username: username,
    access_token: access_token,
    instagram_id: instagram_id,
  },
},
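The two Graph API requests in steps 4.5 and 4.7 can also be written out as plain URLs, which may help when the Explorer UI differs from the description. The sketch below is illustrative only: the helper names (`graphUrl`, `pagesUrl`, `businessAccountUrl`, `extractInstagramId`) are hypothetical, and the base URL assumes API version v7.0 as in step 4.1.

```javascript
// Assumed base URL for Graph API v7 (step 4.1).
const GRAPH_BASE = "https://graph.facebook.com/v7.0";

// Join a Graph API path and query parameters into a full URL.
function graphUrl(path, params) {
  const query = new URLSearchParams(params).toString();
  return `${GRAPH_BASE}/${path}?${query}`;
}

// Step 4.5: list the pages you manage; each entry carries an id and an access_token.
function pagesUrl(userToken) {
  return graphUrl("me/accounts", { access_token: userToken });
}

// Step 4.7: ask a page for its linked Instagram business account.
function businessAccountUrl(pageId, pageToken) {
  return graphUrl(pageId, {
    fields: "instagram_business_account",
    access_token: pageToken,
  });
}

// Step 4.8: pull the instagram_id out of the step 4.7 response.
// Returns null when the field is absent, i.e. when only the page id comes back.
function extractInstagramId(pageResponse) {
  const account = pageResponse.instagram_business_account;
  return account ? account.id : null;
}
```

If `extractInstagramId` returns null, the response contained only the page id, which matches the symptom reported earlier in the thread.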

homearanya commented 4 years ago

Hi @oorestisime,

Thanks for your efforts. I followed your steps above and it worked great for 2 of the 3 Instagram accounts I manage. However, on the 3rd one I get the following error:

Could not download file, error is failed to process https://scontent.cdninstagram.com/v/t51.2885-15/20394100_1101681946599058_7213100928830799872_n.jpg?_nc_cat=106&_nc_sid=8ae9d6&_nc_eui2=AeHWKYmdMLbZjaOXZuGs-eS4D0Tz10hy7CAPRPPXSHLsIDftwoCaG0zGDF4xQrR4-Yw&_nc_ohc=f4ppFsxMW14AX94zd7i&_nc_ht=scontent.cdninstagram.com&oh=bdf4d988d590b0ce1268b3ff9c5d8c09&oe=5F0272A5

TimeoutError: Timeout awaiting 'request' for 30000ms

for every single post of that account and none of the posts are accessible by graphql.

Any idea why this is happening?

Thanks in advance

oorestisime commented 4 years ago

Sorry no idea :(

dbertella commented 4 years ago

Too much pain to go through fb registration and ig apis, Beta version worked for me. Hopefully we will find a solution to make it permanent. Also likes are not supported with the new scraping apparently, my build was failing because I included them in my query.

jpmarra commented 4 years ago

0.8.0-beta.0 has worked for me both locally and through Netlify. I'm going through the steps to authenticate now for a more long term fix.

zamson commented 4 years ago

Regarding step 1: do you mean a business account or my personal page? I'm trying to set up an FB app for a client but the Instagram capabilities do not show up. The Instagram account is linked to their business account. Phew... I wish public scraping was working.

zamson commented 4 years ago

0.8.0-beta.0 has worked for me both locally and through Netlify. I'm going through the steps to authenticate now for a more long term fix.

Worked how, with public scraping?

MilosJo commented 4 years ago

Following these steps, my Gatsby site is finally working and fetching Instagram posts. The only thing I didn't manage to do is extend the 3-month token expiration. I can confirm that these steps are legit.

Thanks @oorestisime

oorestisime commented 4 years ago

Ok folks, i ll wait another couple of days for some more feedback and then release 1.0 with the list of breaking changes and updated doc section! thank you all for your patience :)

jpmarra commented 4 years ago

0.8.0-beta.0 has worked for me both locally and through Netlify. I'm going through the steps to authenticate now for a more long term fix.

Worked how, with public scraping?

Sorry for the incomplete response. the beta release allowed for public scraping both locally and through Netlify.

I ended up creating an auth token by following the steps listed by @oorestisime, however I could only extend the access token three months, as stated by @MilosJo.

NeversSync commented 4 years ago

Having the same issue. I just attempted to go through the Graph API setup (tedious, and I'm still not confident I did it correctly) and am getting this error both on development and on deployment to Netlify.

Netlify build error:

[Screenshot: Screen Shot 2020-06-08 at 6 37 38 PM]

Here is my package, config and query:

<StaticQuery
  query={graphql`
    query myQuery {
      allInstaNode {
        edges {
          node {
            localFile {
              childImageSharp {
                fluid(maxHeight: 500, maxWidth: 500, quality: 90) {
                  ...GatsbyImageSharpFluid_withWebp
                }
              }
            }
          }
        }
      }
    }
  `}

[Screenshot: Screen Shot 2020-06-08 at 6 39 24 PM]

Config:

{
  resolve: `gatsby-source-instagram`,
  options: {
    username: `steadyhandtea`,
    access_token: '{access token}',
    instagram_id: '{business id}',
  },
},

Thanks for all the work on this! Hope it can all be resolved.

hzburki commented 4 years ago

The solution provided in this comment worked for public posts. Netlify build succeeded but I had to remove likes from the query.

oorestisime commented 4 years ago

https://github.com/oorestisime/gatsby-source-instagram/pull/158 fixed the likes issue. i ll release in 2 days

yone920 commented 4 years ago

Any good news on this problem?

mattortiz commented 4 years ago

Hi @oorestisime ,

I used the instructions provided above. As with prior comments, the instructions are difficult to follow and I'm not sure if I'm doing it correctly as a result. I did in the end get the instagram-id, temp-access-token and extended access-token (but only good through august 12, for some reason).

Still getting the following error (locally):
11:9 error Cannot query field "allInstaNode" on type "Query" graphql/template-strings

My gatsby.config settings with input from @NeversSync above:

resolve: `gatsby-source-instagram`,
options: {
  username: "envyforge",
  access_token: access-token, // extended access token from instructions
  instagram_id: instagram-id, // instagram id from instructions
},

package.json setting: "gatsby-source-instagram": "0.8.0-beta.0",

Just providing input. I look forward to the fix and instructions.

Take care,

Matt

oorestisime commented 4 years ago

Can anyone try getting a permanent token using this https://github.com/Bnjis/Facebook-permanent-token-generator ? if this works then probably i could do this inside the plugin directly and spare people some manual steps.

Another area of research someone could help me with is whether there's a way to automate getting even a short-lived token. The plugin doesn't really need a permanent token if we manage to automate the process of getting a short-lived one, because it will get a new token on each build.
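As a starting point for the automation idea, the manual "Extend Access Token" button corresponds to a documented Graph API token exchange request. The sketch below only builds that request URL. It is not what the plugin does, the helper name `longLivedTokenUrl` is hypothetical, and the v7.0 version segment is assumed from the steps earlier in the thread. Note that the exchange needs the app secret, so it should only ever run at build time, never in the browser.

```javascript
// Build the URL that exchanges a short-lived token for a long-lived one.
// appId and appSecret come from the Facebook app dashboard.
function longLivedTokenUrl({ appId, appSecret, shortLivedToken }) {
  const params = new URLSearchParams({
    grant_type: "fb_exchange_token",
    client_id: appId,
    client_secret: appSecret, // keep this out of client-side code
    fb_exchange_token: shortLivedToken,
  });
  return `https://graph.facebook.com/v7.0/oauth/access_token?${params.toString()}`;
}
```

A GET request to this URL returns JSON containing the new `access_token`. This does not solve the harder problem raised above, which is obtaining the initial short-lived token without a manual login.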

In the meantime i ll handle release of 0.8.

once again sorry for the trouble this is causing

oorestisime commented 4 years ago

I just released 0.8 https://github.com/oorestisime/gatsby-source-instagram/releases/tag/v0.8.0

Make sure to change your instagram id.

Mike-Huggins commented 4 years ago

Sorry not sure I understand the thread here and the docs might not have been updated. My graphql looks like:


      allInstaNode(limit: 5) {
        nodes {
          id
          caption
          username
          localFile {
            childImageSharp {
              fluid(maxWidth: 200, maxHeight: 200, quality: 100) {
                ...GatsbyImageSharpFluid_withWebp
              }
            }
          }
        }
      }
    }

and my gatsby config:

    {
      resolve: 'gatsby-source-instagram',
      options: {
        username: 'username',
      },
    },

what do I need to do to get this working on netlify deploy please?

oorestisime commented 4 years ago

I think this explains it well: https://github.com/oorestisime/gatsby-source-instagram#public-scraping-for-posts. You need to pass the id of the username concerned, which you can find at https://codeofaninja.com/tools/find-instagram-user-id. If this is not clear, feel free to PR something that would have helped you :)

Mike-Huggins commented 4 years ago

My apologies, this was entirely my own fault. Thanks for your effort to fix. I thought I had updated to 0.8 but I had not, for anyone following the thread then this is what my code now looks like with version 0.8 and adding in the user id as above:


  const data = useStaticQuery(graphql`
    query {
      allInstaNode(limit: 5, sort: { fields: [timestamp], order: DESC }) {
        edges {
          node {
            id
            caption
            localFile {
              childImageSharp {
                fluid(maxWidth: 200, maxHeight: 200, quality: 100) {
                  ...GatsbyImageSharpFluid_withWebp
                }
              }
            }
          }
        }
      }
    }
  `);

  return data.allInstaNode.edges.map(node => ({
    ...node.node.localFile.childImageSharp,
    id: node.node.id,
    caption: node.node.caption,
    username: node.node.username,
  }));
};

mattortiz commented 4 years ago

@oorestisime , the 0.8.0 release and updated documentation got me freed up. Thank you very much!

Matt

andregmoeller commented 4 years ago

First off, thank you for all the work you put into developing and maintaining this plugin! It is probably too late now, but I feel that the configuration option 'username' should have been renamed to 'instagram_id': the semantic meaning of the option has changed, and it is no longer the username.

oorestisime commented 4 years ago

Yeah it crossed my mind right after i hit release :/ i ll do this in 1.0 which is as soon as i figure out how to properly get a token for the graph api!

maxsteenbergen commented 4 years ago

I'm happy to have found this thread! In my case, we only used the hashtag scraping which isn't fixed in 0.8.0 yet. Can we expect that to return some day or has instagram permanently broken that?

oorestisime commented 4 years ago

i haven't found any way to do it yet. i also haven't looked if the Graph Api can return this information.

Happy to receive help on that end :)

maxsteenbergen commented 4 years ago

I'd love to help out, even though my experience is limited. Any ideas as to why cloud builders fail whereas local builds work? (Note: it's not just Netlify, GitLab usually fails too but sporadically succeeds)

oorestisime commented 4 years ago

Instagram is adding a login screen. I suppose it adds it when something is requesting it a lot without being logged in, but we can't know.

ngerbauld commented 4 years ago

Hi @oorestisime !

I am not sure what I am missing!

After updating gatsby-source-instagram to version 0.8.0 and following https://github.com/oorestisime/gatsby-source-instagram/issues/24#issuecomment-640183001, when I run on localhost I get "The gatsby-source-instagram plugin has generated no Gatsby nodes", and when I check in GraphQL, "allInstaNodes" is not there.

Any idea what it might be?

package.json ----> "gatsby-source-instagram": "^0.8.0"

gatsby.config --->

{
  resolve: `gatsby-source-instagram`,
  options: {
    username: 'instagram_username',
    access_token: 'access_token',
    instagram_id: 'id',
  },
}

Thank you for the help :)

maxsteenbergen commented 4 years ago

@ngerbauld Are you sure you're using the Instagram ID (all numbers) instead of the username? Confusing, I know, but it should be

{
  resolve: `gatsby-source-instagram`,
  options: {
    username: 1234567,
  },
}

with the instagram_id and access_token not needed, if I'm not mistaken

oorestisime commented 4 years ago

If you are using an access token then you don't need a username; it's intended to be a fallback. If you don't have a valid access token, then that would indeed happen. Is this the first time you are using the plugin, or was it working and you just updated?
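To make the two modes discussed here concrete, the following sketch picks the plugin options from environment variables: the token plus instagram_id when both are set, otherwise the numeric id passed through `username` for public scraping. The environment variable names and the helper names are hypothetical, and whether the plugin accepts exactly these combinations should be checked against its README.

```javascript
// Choose plugin options from an env-like object (e.g. process.env).
// INSTAGRAM_ACCESS_TOKEN, INSTAGRAM_ID and INSTAGRAM_USER_ID are made-up names.
function instagramPluginOptions(env) {
  if (env.INSTAGRAM_ACCESS_TOKEN && env.INSTAGRAM_ID) {
    // Graph API mode: token plus business account id.
    return {
      access_token: env.INSTAGRAM_ACCESS_TOKEN,
      instagram_id: env.INSTAGRAM_ID,
    };
  }
  // Public scraping mode in 0.8.0: `username` holds the numeric Instagram id.
  return { username: env.INSTAGRAM_USER_ID };
}

// Build the plugin entry for the `plugins` array in gatsby-config.js.
function instagramPlugin(env) {
  return {
    resolve: `gatsby-source-instagram`,
    options: instagramPluginOptions(env),
  };
}
```

In gatsby-config.js this would be used as `plugins: [instagramPlugin(process.env)]`, which also keeps the token out of version control.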

ngerbauld commented 4 years ago

I was using version 0.7.0 before and it was working, but not anymore. On localhost it worked great, but when I tried to build on Netlify it gave errors (just like many people reported earlier in the conversation).

So I updated to version 0.8.0, but I couldn't get the query to work anymore.

The access token that I got was only valid for 3 months, until August. But should work anyway, right?
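One way to check whether a token is still usable, and for how long, is to inspect what the Access Token Debugger reports. The sketch below interprets a response shaped like the Graph API `debug_token` result, where `data.is_valid` is a boolean and `data.expires_at` is a Unix timestamp in seconds, with 0 meaning the token never expires. The helper name `describeTokenExpiry` is hypothetical, and the sample values are made up.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Summarise a debug_token-style response: is the token usable, and for how many days?
function describeTokenExpiry(debugResponse, nowMs = Date.now()) {
  const { is_valid: isValid, expires_at: expiresAt } = debugResponse.data;
  if (!isValid) {
    return { usable: false, daysLeft: 0 };
  }
  if (expiresAt === 0) {
    // "Expires: Never" in the debugger UI.
    return { usable: true, daysLeft: Infinity };
  }
  const msLeft = expiresAt * 1000 - nowMs;
  return {
    usable: msLeft > 0,
    daysLeft: Math.max(0, Math.floor(msLeft / MS_PER_DAY)),
  };
}
```

A token that expires in three months is reported as usable with roughly 90 days left, so an expiry date alone would not explain missing nodes.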