oorestisime / gatsby-source-instagram

Create nodes from instagram posts hashtags and profiles
https://gatsby-src-instagram.netlify.com/
MIT License
149 stars, 53 forks

Public scraping methods fail due to login screen on Instagram on production builds #24

Closed wjx0820 closed 1 year ago

wjx0820 commented 5 years ago

I used this plugin in my demo and it did not work. It says it could not fetch instagram posts, and no Gatsby nodes are generated (I didn't use any token; I just want public scraping for posts). So I cloned your repo, cd'd into /example, ran yarn install and then 'npm run develop', and the same problem seems to happen. Am I missing something? Thanks!

oorestisime commented 5 years ago

This is odd. I pushed a new yarn lock file in the example; I forgot to do so earlier. Can you pull, run yarn install and try again?

wjx0820 commented 5 years ago

Thanks for the quick reply, but it still doesn't work... It waits a long time and then says it could not fetch instagram posts... Is it a temporary situation?

oorestisime commented 5 years ago

It works for me. I am not sure what is happening locally for you. Are you able to go on instagram? was it working for you and just broke or never worked?

wjx0820 commented 5 years ago

I can go to instagram in the browser, but it still doesn't work.

The error messages look like this: Could not fetch instagram posts. Error status Error: write EPROTO 4472038848:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:252:

warning The gatsby-source-instagram plugin has generated no Gatsby nodes. Do you need it?

My location is China, so it must be an issue with the GFW... I tried using a proxy in the terminal and then ran npm run develop, but it still waited a long time and then failed...😭

oorestisime commented 5 years ago

:( I am really sorry but i am not sure how i can help here :( You will need to try with a vpn or something.

calpa commented 5 years ago

@Jexxie Instagram may be blocked in China according to the "law", so you may need to find another way.

This should not be a problem with this plugin.

oorestisime commented 5 years ago

Closing this! feel free to reopen if you think there's something to be done in the scope of the plugin!

tinoguti commented 4 years ago

I am having trouble with this same description when I try to build my Gatsby app on AWS but when I try locally somehow it works. I thought I solved the problem by updating npm packages and I had 2 successful builds but today it is not building. Any help on how to solve this problem? Here's a copy of my log with the part where everything seems to start failing:

2020-05-10T16:23:33.187Z [INFO]: success createSchemaCustomization - 0.073s
2020-05-10T16:23:33.560Z [WARNING]: warning
                                    Could not fetch instagram posts. Error status TypeError: Cannot read property '0' of undefined
2020-05-10T16:23:33.708Z [WARNING]: warning The gatsby-source-instagram plugin has generated no Gatsby nodes. Do you need it?
2020-05-10T16:23:33.708Z [INFO]: success source and transform nodes - 0.521s
2020-05-10T16:23:33.966Z [INFO]: success building schema - 0.257s
2020-05-10T16:23:34.017Z [INFO]: success createPages - 0.049s
2020-05-10T16:23:34.082Z [INFO]: success createPagesStatefully - 0.065s
2020-05-10T16:23:34.082Z [INFO]: success onPreExtractQueries - 0.000s
2020-05-10T16:23:34.111Z [INFO]: success update schema - 0.028s
2020-05-10T16:23:34.446Z [WARNING]: error There was an error in your GraphQL query:
                                    Cannot query field "allInstaNode" on type "Query".
                                    If you don't expect "allInstaNode" to exist on the type "Query" it is most likely a typo.
                                    However, if you expect "allInstaNode" to exist there are a couple of solutions to common problems:
                                    - If you added a new data source and/or changed something inside gatsby-node.js/gatsby-config.js, please try a restart of your development server
                                    - The field might be accessible in another subfield, please try your query in GraphiQL and use the GraphiQL explorer to see which fields you can query and what shape they have
                                    - You want to optionally use your field "allInstaNode" and right now it is not used anywhere. Therefore Gatsby can't infer the type and add it to the GraphQL schema. A quick fix is to add a least one entry with that field ("dummy content")
                                    It is recommended to explicitly type your GraphQL schema if you want to use optional fields. This way you don't have to add the mentioned "dummy content". Visit our docs to learn how you can define the schema for "Query":
                                    https://www.gatsbyjs.org/docs/schema-customization/#creating-type-definitions
2020-05-10T16:23:34.449Z [INFO]: failed extract queries from components - 0.337s
2020-05-10T16:23:34.514Z [WARNING]: npm
2020-05-10T16:23:34.515Z [WARNING]: ERR! code ELIFECYCLE
                                    npm ERR! errno 1
2020-05-10T16:23:34.515Z [WARNING]: npm

tinoguti commented 4 years ago

Now I am intrigued. I've just redeployed my app and it worked. Any idea why this seems to randomly fail? I wouldn't like to have a few failed builds hoping the next one will be the one every time I need to deploy.

oorestisime commented 4 years ago

Hey there, kind of hard to know without reproduction :/

Is it with public scraping or with the graph api? Where does this run? amplify?

tinoguti commented 4 years ago

Hi. Yes, it's running on Amplify with public scraping, with just a username added to the plugin config parameters.

GraphQL query looks like this:

allInstaNode(limit: 8) {
  edges {
    node {
      id
      username
      caption
      localFile {
        childImageSharp {
          fixed(width: 500, height: 500) {
            ...GatsbyImageSharpFixed
          }
        }
      }
    }
  }
}

It works well with gatsby develop and gatsby build locally, but I'm having "random" build failures on Amplify.

oorestisime commented 4 years ago

Yeah, the query isn't the issue. The error you showed above happens when the plugin couldn't get the Instagram posts. Maybe something was wrong with Instagram at that particular time? Is it still happening?

I just triggered another rebuild on netlify for the example app and seems to be working fine

LarsBehrenberg commented 4 years ago

I seem to have the same issue. Just a week ago it seemed to have worked fine. I am deploying my site with Netlify. With the gatsby develop or build command locally I don't run into issues and even with the NetlifyCLI running the local netlify build command everything is alright. But as soon as I push to Netlify the build fails.

7:18:17 PM: $ yarn build
7:18:17 PM: yarn run v1.22.4
7:18:17 PM: $ gatsby clean && gatsby build
7:18:18 PM: 
7:18:18 PM: info Deleting .cache, public
7:18:18 PM: info Successfully deleted directories
7:18:20 PM: 
7:18:20 PM: success open and validate gatsby-configs - 0.067s
7:18:22 PM: 
7:18:22 PM: success load plugins - 1.694s
7:18:22 PM: 
7:18:22 PM: success onPreInit - 0.016s
7:18:22 PM: success delete html and css files from previous builds - 0.018s
7:18:22 PM: 
7:18:22 PM: success initialize cache - 0.012s
7:18:22 PM: 
7:18:22 PM: success copy gatsby files - 0.047s
7:18:22 PM: 
7:18:22 PM: success onPreBootstrap - 0.010s
7:18:22 PM: 
7:18:22 PM: success createSchemaCustomization - 0.012s
7:18:23 PM: 
7:18:23 PM: warning
7:18:23 PM: Could not fetch instagram posts. Error status TypeError: Cannot read property '0' of undefined
7:18:23 PM: success source and transform nodes - 1.476s
7:18:24 PM: 
7:18:24 PM: success building schema - 0.578s
7:18:24 PM: 
7:18:24 PM: success createPages - 0.240s
7:18:24 PM: success createPagesStatefully - 0.162s
7:18:24 PM: 
7:18:24 PM: success onPreExtractQueries - 0.001s
7:18:24 PM: success update schema - 0.084s
7:18:25 PM: error There was an error in your GraphQL query:
7:18:25 PM: Cannot query field "allInstaNode" on type "Query".
7:18:25 PM: If you don't expect "allInstaNode" to exist on the type "Query" it is most likely a typo.
7:18:25 PM: However, if you expect "allInstaNode" to exist there are a couple of solutions to common problems:
7:18:25 PM: - If you added a new data source and/or changed something inside gatsby-node.js/gatsby-config.js, please try a restart of your development server
7:18:25 PM: - The field might be accessible in another subfield, please try your query in GraphiQL and use the GraphiQL explorer to see which fields you can query and what shape they have
7:18:25 PM: - You want to optionally use your field "allInstaNode" and right now it is not used anywhere. Therefore Gatsby can't infer the type and add it to the GraphQL schema. A quick fix is to add a least one entry with that field ("dummy content")
7:18:25 PM: It is recommended to explicitly type your GraphQL schema if you want to use optional fields. This way you don't have to add the mentioned "dummy content". Visit our docs to learn how you can define the schema for "Query":
7:18:25 PM: https://www.gatsbyjs.org/docs/schema-customization/#creating-type-definitions
7:18:25 PM: not finished Generating image thumbnails - 1.265s
7:18:25 PM: failed extract queries from components - 0.823s

oorestisime commented 4 years ago

I am seeing this now on netlify as well. locally works fine. I am investigating right now

oorestisime commented 4 years ago

Well now it went through on netlify. I think maybe network issue? I can't find any way to reproduce locally so not sure what i can do here :/

LarsBehrenberg commented 4 years ago

I am really not sure what is happening here, but it seems like this is not the plugin's fault? I deployed the same build a couple of times on Netlify, clearing the cache before building every time. The first two times everything works fine and then the 3rd time it breaks.

warning
11:15:23 AM: Could not fetch instagram user. Error status TypeError: Cannot read property '0' of undefined
11:15:23 AM: 
11:15:23 AM: error "gatsby-source-instagram" threw an error while running the sourceNodes lifecycle:
11:15:23 AM: Cannot read property 'id' of null
11:15:23 AM:   74 |   return {
11:15:23 AM:   75 |     type: params.type,
11:15:23 AM: > 76 |     id: datum.id,
11:15:23 AM:      |               ^
11:15:23 AM:   77 |     full_name: datum.full_name,
11:15:23 AM:   78 |     biography: datum.biography,
11:15:23 AM:   79 |     edge_followed_by: datum.edge_followed_by,
11:15:23 AM: 
11:15:23 AM: 
11:15:23 AM: 
11:15:23 AM:   TypeError: Cannot read property 'id' of null
11:15:23 AM:   
11:15:23 AM:   - gatsby-node.js:76 createUserNode
11:15:23 AM:     [repo]/[gatsby-source-instagram]/gatsby-node.js:76:15
11:15:23 AM:   
11:15:23 AM:   - gatsby-node.js:91 processDatum
11:15:23 AM:     [repo]/[gatsby-source-instagram]/gatsby-node.js:91:49
11:15:23 AM:   
11:15:23 AM:   - gatsby-node.js:125 
11:15:23 AM:     [repo]/[gatsby-source-instagram]/gatsby-node.js:125:16
11:15:23 AM:   
11:15:23 AM:   - Array.map
11:15:23 AM:   
11:15:23 AM:   - gatsby-node.js:123 Object.exports.sourceNodes
11:15:23 AM:     [repo]/[gatsby-source-instagram]/gatsby-node.js:123:29
11:15:23 AM:   
11:15:23 AM:   - task_queues.js:97 processTicksAndRejections
11:15:23 AM:     internal/process/task_queues.js:97:5
11:15:23 AM:   
11:15:23 AM: 
11:15:23 AM: not finished source and transform nodes - 1.120s

Btw, I am not using an Instagram API, maybe this makes a difference?
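The stack trace above shows `createUserNode` reading `datum.id` when the scrape handed back null. A defensive sketch of how such results could be dropped before mapping is below. `safeProcessData` and `toUserNode` are hypothetical names for illustration, not the plugin's actual code.

```javascript
// Hypothetical helper, not the plugin's actual code: skip null/undefined
// scrape results so a failed fetch yields zero nodes instead of a TypeError.
function safeProcessData(data, processDatum) {
  if (!Array.isArray(data)) {
    return []
  }
  return data
    .filter((datum) => datum !== null && datum !== undefined)
    .map((datum) => processDatum(datum))
}

// Stand-in for the node builder seen in the stack trace.
const toUserNode = (datum) => ({ id: datum.id, full_name: datum.full_name })

// The null entry is skipped; only the valid datum is turned into a node.
console.log(safeProcessData([null, { id: "1", full_name: "Example" }], toUserNode))
```

This only turns the crash into the "generated no Gatsby nodes" warning; it does not make the fetch itself succeed.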

pul87 commented 4 years ago

I'm having the same problem on Netlify, seems an issue related to the platform, on development it works.

LarsBehrenberg commented 4 years ago

Anything we can do about it? Some workaround?

oorestisime commented 4 years ago

Well, the issue is not with the plugin afaict. It happens when scraping the page doesn't work. But I can't find out whether this is an issue on Netlify itself (much more likely, since it never happens in dev) or if Instagram is testing things out. After all, the public scraping is extremely dependent on their raw HTML code :)

The only thing I can suggest is to use the API. It worked the whole time I was testing.

Has anybody tried to contact Netlify support to see if anything is off in the network?

oorestisime commented 4 years ago

I am re-opening this for visibility among folks checking the issues.

dmcreis commented 4 years ago

Same problem happening here. I tried a couple of times today and the build is failing. is there any workaround for this? @oorestisime thanks for all your replies :)

tonilaukka commented 4 years ago

I'm having the same issue although my builds fail randomly and hitting redeploy helps.

This is the error message:

11:55:31 PM: error There was an error in your GraphQL query:
11:55:31 PM: - Unknown field 'allInstaNode' on type 'Query'.
11:55:31 PM: failed extract queries from components - 0.464s

LarsBehrenberg commented 4 years ago

@oorestisime thanks for reopening this issue!

I have posted this on the Netlify community as well. https://community.netlify.com/t/deploying-a-gatsby-site-on-netlify-using-the-gatsby-source-instagram-plugin-failing-on-build-time-issue-with-netlify-fetching-data-from-instagram-cannot-reproduce-locally/15957

xanderjl commented 4 years ago

I've also been running into this issue. Noticed a string of failed builds on the deploy dashboard with the same error. I hope your Netlify post brings attention to the problem!

LarsBehrenberg commented 4 years ago

If you could join the netlify post and let people know you have had the same issue, maybe it will gain more attention. Would really appreciate it! Thanks so much here already for all the help!

ashishterp commented 4 years ago

I'm having the same issue on Netlify as of last week. I forked a copy of the plugin and added some logging and it looks like Instagram is returning a login page for some reason:

2:11:52 AM: { LoginAndSignupPage:
2:11:52 AM:    [ { captcha: [Object],
2:11:52 AM:        gdpr_required: false,
2:11:52 AM:        tos_version: 'row',
2:11:52 AM:        username_hint: '' } ] }

I tried faking the User Agent on the axios request and that didn't work either. What's weird is that it works fine locally, so maybe it is something on the IG side about where the request is coming from that is triggering it generating a login page.
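The logged object suggests a simple check: look at which top-level key the parsed page data carries before indexing into it. The sketch below assumes a successful profile scrape exposes a `ProfilePage` array (an assumption, not confirmed by the log above) and treats `LoginAndSignupPage` as the login wall. `classifyEntryData` is a hypothetical helper name.

```javascript
// Sketch: classify parsed page data by its top-level key.
// "LoginAndSignupPage" is the key from the log above; "ProfilePage" is an
// assumed key for a successful profile scrape.
function classifyEntryData(entryData) {
  if (!entryData || typeof entryData !== "object") {
    return "empty"
  }
  if (Array.isArray(entryData.LoginAndSignupPage)) {
    return "login-wall"
  }
  if (Array.isArray(entryData.ProfilePage) && entryData.ProfilePage.length > 0) {
    return "profile"
  }
  return "unknown"
}

// Same shape as the logged response, with the captcha object stubbed out.
const logged = {
  LoginAndSignupPage: [
    { captcha: {}, gdpr_required: false, tos_version: "row", username_hint: "" },
  ],
}
console.log(classifyEntryData(logged))
```

If the scraper reads something like `ProfilePage[0]` without this kind of check, a login-wall response would plausibly produce the "Cannot read property '0' of undefined" error seen in the build logs.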

oorestisime commented 4 years ago

So from #152 i am reposting the same link here https://stackoverflow.com/questions/57624387/dont-have-profilepage-index-but-i-have-loginandsignuppage The issue is Instagram over the last week or so has restricted their unlogged-in (guest) access (based on IP address).

This is why we have a login now.

oorestisime commented 4 years ago

I am sorry this is failing for you folks, i ll try over the next days to work something out using this https://github.com/oorestisime/gatsby-source-instagram/issues/131#issuecomment-603282394 hopefully that won't get a login screen. I don't have much time though in the next 2 days so if anyone wants to come up with a PR, more than welcome :)

tinoguti commented 4 years ago

I would just like to add that I am also having this issue on my Amazon Amplify app. I went through 8 failed builds to get one right.

oorestisime commented 4 years ago

Sorry for the trouble this is causing, I made time for working on this tonight. give me a few hours please :)

johnkavanagh commented 4 years ago

@oorestisime

> Sorry for the trouble this is causing, I made time for working on this tonight. give me a few hours please :)

Just in case nobody else says it: thank you! I know you're not in any way obligated to fix everybody else's broken builds. Your plugin is awesome (when it's working), thank you for looking at it again for us now.

LarsBehrenberg commented 4 years ago

@oorestisime Yes, have to agree with @johnkavanagh! Thanks so much for the plugin in the first place and all the effort to keep it updated!

oorestisime commented 4 years ago

Hey folks, thanks for the kind words. I released 0.7.1 with the new endpoint modification, but it seems the problem persists. It would help if any of you could test as well to confirm this 🙏

Not sure what next steps are if this persists :(

xanderjl commented 4 years ago

Can confirm the issue is still persisting on 0.7.1

10:41:42 AM: error There was an error in your GraphQL query:
10:41:42 AM: Cannot query field "allInstaNode" on type "Query".
10:41:42 AM: If you don't expect "allInstaNode" to exist on the type "Query" it is most likely a typo.
10:41:42 AM: However, if you expect "allInstaNode" to exist there are a couple of solutions to common problems:
10:41:42 AM: - If you added a new data source and/or changed something inside gatsby-node.js/gatsby-config.js, please try a restart of your development server
10:41:42 AM: - The field might be accessible in another subfield, please try your query in GraphiQL and use the GraphiQL explorer to see which fields you can query and what shape they have
10:41:42 AM: - You want to optionally use your field "allInstaNode" and right now it is not used anywhere. Therefore Gatsby can't infer the type and add it to the GraphQL schema. A quick fix is to add a least one entry with that field ("dummy content")
10:41:42 AM: It is recommended to explicitly type your GraphQL schema if you want to use optional fields. This way you don't have to add the mentioned "dummy content". Visit our docs to learn how you can define the schema for "Query":
10:41:42 AM: https://www.gatsbyjs.org/docs/schema-customization/#creating-type-definitions
10:41:42 AM: failed extract queries from components - 1.171s

Thank you for your quick response! Appreciate all of the work you've put into this :)

oorestisime commented 4 years ago

I am trying to think of next steps here:

marcusps11 commented 4 years ago

Same problem here. I cannot reproduce it locally, only when I try to build on deploy. At least I know I am not going mad.

cbaucom commented 4 years ago

I can also confirm this is an issue for me when trying to build on Netlify. Failed when I was at version 0.5.0 and also after updating to 0.7.1.

12:33:39 AM: Could not fetch instagram posts. Error status TypeError: Cannot read property 'user' of undefined
12:33:43 AM: warning The gatsby-source-instagram plugin has generated no Gatsby nodes. Do you need it?
12:33:43 AM: success source and transform nodes — 4.429 s
12:33:43 AM: success building schema — 0.400 s
12:33:43 AM: success createPages — 0.027 s
12:33:43 AM: success createPagesStatefully — 0.043 s
12:33:43 AM: success onPreExtractQueries — 0.002 s
12:33:43 AM: success update schema — 0.024 s
12:33:44 AM: error GraphQL Error Encountered 1 error(s):
12:33:44 AM: - Unknown field 'allInstaNode' on type 'Query'.
12:33:44 AM:       file: /opt/build/repo/src/components/Instagram.js

Does not fail for me locally on either develop or build

wjchat commented 4 years ago

Does your plugin use Instagram's newer Graph API or the legacy API? BTW, it is a really great plugin. I'm having similar issues with my app deployed on AWS amplify. Reading about how the legacy api is being disabled by Instagram. maybe this has something to do with it?

oorestisime commented 4 years ago

The api uses the newer Graph api and afaiu it is not affected.

oorestisime commented 4 years ago

I logged the output, and it's a plain login wall. I can't find anything useful there. Not sure what I can do next, folks. It seems like I need to remove the public scraping methods.

I d love your input here.

DevanB commented 4 years ago

I've been trying to follow along with the instructions to setup an access token, but they seem outdated and/or not thorough enough to follow. If you do remove public scraping, perhaps someone can give a good walk through of setting up and getting an access token with screenshots and/or videos.

oorestisime commented 4 years ago

I am happy to accept any PRs on that end. It's been a year since I did those steps myself, and I remember it was quite tedious.

luchoster commented 4 years ago

I've been thinking about making a port from this plugin, to a React component that we can just pull data on page load. Finally last night I decided to start it, and today I encounter this problem with one of the sites I maintain.

Anyway, long story short I published: react-ig - It displays a grid of the posts, at the moment I only had the hashtag working, but I just added the username prop:

<InstagramPosts username="vegas" />

And you'll get the latest 12 posts, this will hit the ig user url, on every page load, so you'll get fresh posts every time. Or use the hashtag, as it says on the readme.

I'm only posting it here, in case anyone needs a quick fix on their build and this could help.

luchoster commented 4 years ago

> @luchoster Thanks for this, is the UI configurable?

Not really, but if you want to create an issue https://github.com/luchoster/react-ig/issues I'll see what I can do.

diogocapela commented 4 years ago

I'm having the same problem. The development and build instances both work locally but they fail when deployed on to Netlify. Here is the Netlify deploy error log:

1:13:09 AM: error There was an error in your GraphQL query:
1:13:09 AM: Cannot query field "allInstaNode" on type "Query".
1:13:09 AM: If you don't expect "allInstaNode" to exist on the type "Query" it is most likely a typo.
1:13:09 AM: However, if you expect "allInstaNode" to exist there are a couple of solutions to common problems:
1:13:09 AM: - If you added a new data source and/or changed something inside gatsby-node.js/gatsby-config.js, please try a restart of your development server
1:13:09 AM: - The field might be accessible in another subfield, please try your query in GraphiQL and use the GraphiQL explorer to see which fields you can query and what shape they have
1:13:09 AM: - You want to optionally use your field "allInstaNode" and right now it is not used anywhere. Therefore Gatsby can't infer the type and add it to the GraphQL schema. A quick fix is to add a least one entry with that field ("dummy content")
1:13:09 AM: It is recommended to explicitly type your GraphQL schema if you want to use optional fields. This way you don't have to add the mentioned "dummy content". Visit our docs to learn how you can define the schema for "Query":
1:13:09 AM: https://www.gatsbyjs.org/docs/schema-customization/#creating-type-definitions
1:13:09 AM: failed extract queries from components - 0.220s
1:13:09 AM: npm
1:13:09 AM:  ERR! code ELIFECYCLE
1:13:09 AM: npm ERR!
1:13:09 AM:  errno 1
1:13:09 AM: npm ERR! jam-stack-boiler@0.0.1 build: `gatsby build`
1:13:09 AM: npm ERR! Exit status 1
1:13:09 AM: npm ERR!
1:13:09 AM: npm ERR! Failed at the jam-stack-boiler@0.0.1 build script.
1:13:09 AM: npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
1:13:09 AM: npm
1:13:09 AM:  ERR! A complete log of this run can be found in:
1:13:09 AM: npm ERR!     /opt/buildhome/.npm/_logs/2020-06-05T00_13_09_340Z-debug.log

zamson commented 4 years ago

Same problem here. So is it only public fetching that is not working?

Public scraping fails locally and on Netlify build

Using gatsby-source-instagram@0.7.2

oorestisime commented 4 years ago

Yes public scraping fails due to their login screen but the graph api works.

if you are having issues generating a token then this might help you https://github.com/oorestisime/gatsby-source-instagram/issues/156 cc @DevanB

I ll wait a few more days in case something else on that doc section needs modification before i go and update it.

tinoguti commented 4 years ago

Could someone sum up the current status up to this point? I will need to do a build of my app sometime soon but it seems like the build won't work. What's the alternative? Thank you

sjelfull commented 4 years ago

@tinoguti The only alternative is doing authorized calls to the Graph API.

I'm in the middle of generating an access token, and it's confusing as hell, especially as the instructions don't match what I'm seeing.

oorestisime commented 4 years ago

I already know of one inconsistency in the docs, which is the permissions, because they changed last month. https://developers.facebook.com/docs/facebook-login/permissions/#reference-manage_pages

what else is missing? if you folks don't let me know i can't actually update them :/