stefansundin / rssbox

:newspaper: I consume the world via RSS feeds, and this is my attempt to keep it that way.
https://github.com/stefansundin/rssbox/discussions/64
GNU Affero General Public License v3.0
775 stars, 73 forks

Instagram discussion issue #39

Closed eyoungmin closed 6 months ago

eyoungmin commented 4 years ago

Please fix it.

stefansundin commented 4 years ago

It's not fixable. They are rate limiting much harder now. If everyone would just stop hammering my free service everyone would just have a better experience.

The only good fix is to host your own RSS Box. It's not that hard, give it a try.

eyoungmin commented 4 years ago

> It's not fixable. They are rate limiting much harder now. If everyone would just stop hammering my free service everyone would just have a better experience.
>
> The only good fix is to host your own RSS Box. It's not that hard, give it a try.

I'm hosting it myself. It has the same problem.

stefansundin commented 4 years ago

Try adding a sessionid as described here: https://github.com/stefansundin/rssbox/issues/21#issuecomment-525130553

Maybe I should just disable all requests without one.

eyoungmin commented 4 years ago

thanks!

meyerjom commented 4 years ago

Same problem here with my own heroku hosted env. Did not try the sessionid thing yet.

Ex-Ark commented 4 years ago

I'm also hosting it privately on a dedicated server and getting:

"There was a problem talking to Instagram. Please try again in a moment."

Reading instagram.rb, I suspect it has nothing to do with the rate limit, as you have defined a custom rate limit error and that is not the error raised in the output (InstagramError vs InstagramRatelimitError).

Here's the actual instagram response in my case

2020-09-04 15:39:43 - InstagramError - 302: https://www.instagram.com/accounts/login/?next=/web/search/topsearch/%3F__a%3D1%26query%3

Is Instagram refusing all requests from anonymous users now? Why is it redirecting to the login page? To be clear, I tried it with a public page URI, and I have no problem when using a private browser window with no cookies whatsoever.

I haven't tried your solution with the sessionid because I think it's not "production ready" to put a personal cookie into a public-facing webapp. What if I don't want to create an Instagram account at all?

PS: The log is truncated as defined in your http.rb:138 (truncate the message in order to cut down on log filesize).

If you're interested in reducing the logfile you could look into "system level" solutions like logrotate, in order to keep the integrity of data instead.

patrickdrd commented 4 years ago

> Try adding a sessionid as described here: #21 (comment)

I'm using a sessionid, but Instagram has been complaining about suspicious access for the last week. I was forced to change my password 3 times in a week already! Is rssbox the culprit? Did anyone else have similar issues?

stefansundin commented 4 years ago

> Is Instagram refusing all requests from anonymous users now? Why is it redirecting to the login page?

I don't know why they sometimes return 429 and sometimes a redirect to the login page. I suspect there may be two different rate limit protections: one at the application layer that just tries to get people to log in, and another at the infrastructure layer that is based on IP address and returns 429. This is just a guess though.
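
To make the two cases easier to tell apart in logs, here is a minimal sketch of how a response could be classified. This is not code from instagram.rb: `classify_instagram_response` is a hypothetical helper, and the mapping of 429 and login redirects to two separate protections only follows the guess above.

```ruby
# Hypothetical helper, not part of RSS Box: classify a response from
# Instagram by its status code and Location header.
def classify_instagram_response(status, location = nil)
  if status == 429
    :rate_limited      # guessed to be the IP-based limit
  elsif (300..399).cover?(status) && location.to_s.include?("/accounts/login/")
    :login_required    # guessed to be the "please log in" limit
  elsif (200..299).cover?(status)
    :ok
  else
    :error
  end
end

# The redirect reported earlier in this thread would be classified like this:
puts classify_instagram_response(302, "https://www.instagram.com/accounts/login/?next=/web/search/topsearch/")
puts classify_instagram_response(429)
```

Keeping the two symbols separate would let an instance log which kind of block it is hitting, which might help confirm or refute the guess.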

> I haven't tried your solution with the sessionid because I think it's not "production ready" to put a personal cookie into a public-facing webapp. What if I don't want to create an Instagram account at all?

None of this is sanctioned by Instagram so don't ever expect the Instagram feature to be "production ready". lol. And use the sessionid hack at your own risk.

> If you're interested in reducing the logfile you could look into "system level" solutions like logrotate, in order to keep the integrity of data instead.

Not possible on Heroku. That's what I optimize the app for by default. You can change things like log truncation on your own.

> I'm using a sessionid, but Instagram has been complaining about suspicious access for the last week. I was forced to change my password 3 times in a week already! Is rssbox the culprit? Did anyone else have similar issues?

Probably. Use at your own risk.

eyoungmin commented 4 years ago

IFTTT Activity Said:

Applet failed Feedjira::UserFacingFetchFailure

Show details: No details available.

muava12 commented 4 years ago

Any fix for this issue?

stefansundin commented 3 years ago

So I recently implemented some pretty good caching functionality, which has greatly stabilized RSS Box. The app is really responsive now and I am really happy with how well it has restored usability of the website. However, on its own it is not enough to make Instagram work well.

Currently you can only configure one "Instagram sessionid" by setting INSTAGRAM_SESSIONID. And if you configure it and do enough requests quickly enough then Instagram will say that they have detected "unusual activity" from your account and you will have to perform some manual steps to regain access.

[Screenshots: "Screen Shot 2021-01-14 at 13 15 53" and "Screen Shot 2021-01-14 at 13 16 10"]

I have not added a real phone number yet so I don't know if you can continue making requests afterwards. Please comment here with your experience if you try this. Right now rssbox.herokuapp.com is running with INSTAGRAM_SESSIONID unset.
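
For anyone self-hosting, a rough sketch of how an optional session id could be turned into a request header. Only the `INSTAGRAM_SESSIONID` variable name comes from this thread; `instagram_headers` is a hypothetical helper, and sending the value as a `sessionid` cookie is an assumption based on the sessionid being described as a personal cookie.

```ruby
# Hypothetical helper, not taken from RSS Box: build extra request headers
# from the optional INSTAGRAM_SESSIONID setting.
def instagram_headers(env = ENV)
  sessionid = env["INSTAGRAM_SESSIONID"].to_s.strip
  # With the variable unset or blank, requests stay anonymous.
  return {} if sessionid.empty?
  # Assumption: the session id is sent as a cookie named "sessionid".
  { "Cookie" => "sessionid=#{sessionid}" }
end

p instagram_headers({})
p instagram_headers({ "INSTAGRAM_SESSIONID" => "example-value" })
```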

If I could add code that tracks the number of requests that are being sent to Instagram, then we could try to find a request rate that is safe to use without locking out the account. Then I can add some code that allows a lot of session ids to be added (probably using redis?), and the application can "load balance" between the ids to avoid using one too much. It would take some time to implement though.
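
A minimal sketch of the tracking and "load balancing" idea, assuming in-memory bookkeeping rather than redis. `SessionPool`, `max_per_window`, and `window` are hypothetical names, and the budget numbers are placeholders because no safe request rate is known yet.

```ruby
# Sketch only: picks the least-used session id and refuses to hand one out
# once every id has spent its request budget for the current window.
class SessionPool
  def initialize(session_ids, max_per_window:, window: 3600, clock: -> { Time.now.to_f })
    @ids = session_ids
    @max = max_per_window
    @window = window
    @clock = clock
    @used = Hash.new { |hash, id| hash[id] = [] } # id => timestamps of recent requests
  end

  # Returns a session id, or nil when all ids are over budget.
  def checkout
    now = @clock.call
    # Forget requests that have fallen out of the window.
    @ids.each { |id| @used[id].reject! { |t| t <= now - @window } }
    id = @ids.min_by { |i| @used[i].size }
    return nil if id.nil? || @used[id].size >= @max
    @used[id] << now
    id
  end
end

pool = SessionPool.new(%w[id_a id_b], max_per_window: 2)
p pool.checkout # one of the configured ids, or nil once every budget is spent
```

Returning `nil` instead of overusing an id leaves the caller free to fall back to an anonymous request or to serve a cached feed.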

I would need help from volunteers to help me register a lot of accounts so we can have a collection of instagram session ids. I don't know how many ids we'd need, but the more the merrier. And it's not possible to sign up multiple accounts using email tricks such as using + at the end of the email, or multiple dots. But maybe there's a trick to work around this that someone knows about?

Anyway, we can continue using this issue to discuss the issue. Let's focus on trying to find a solution. (Any unhelpful comment saying "please fix" or similar will be removed.)

ALERTua commented 3 years ago

@stefansundin, after I confirmed with my phone number and got back into Instagram, then refreshed Instagram session ID (I don't remember whether it has changed), RSSBox started working.

stefansundin commented 3 years ago

@ALERTua That's great to hear. How many feeds are you subscribed to using that session id? Let us know if it stops working again, or if you have to take any more actions to verify your account.

ALERTua commented 3 years ago

@stefansundin ~100 feeds. 12 hours between updates are pulled by my TinyTinyRSS. okay, I will let you know here if anything goes wrong or Instagram wants any verification from me.

TalD commented 3 years ago

@ALERTua How did you get this working? I inputted the INSTAGRAM_SESSIONID, yet I am still getting hit with the rate limiting message in the UI.

ALERTua commented 3 years ago

@TalD

ALERTua commented 3 years ago

update: lowered the refresh interval for Instagram feeds to 4 hours and got my account locked :D Instagram demanded a password change, and then everything started working again. Raised the refresh interval back to 12 hours.

ALERTua commented 3 years ago

got suspended again. This time Instagram demanded phone confirmation and a captcha, and gave me 30 days to reactivate before banning. Raised the feed update interval to 24 hours.

TalD commented 3 years ago

I've given up on the Instagram feeds. The rate limiting / getting locked out is too unpredictable and annoying. FYI, there is a Chrome plugin called Feedbro. It's a bit manual, but after you input / organize your feeds, it works like any RSS feed. Only downside is you need to be on your desktop.

eyoungmin commented 3 years ago

My account is locked. Instagram says it can only be unlocked from the mobile app, and it asks for a 360-degree video of my face. It's terrible.

MiTereKun commented 3 years ago

Why not just remove Instagram from rssbox.herokuapp.com if you can't fix it? Sorry, but I really don't want to create an account on this garbage dump just to follow a couple of people.

ALERTua commented 3 years ago

I'm still using my rssbox instance to get updates for ~200 Instagram accounts. I don't get many Instagram account verification requests anymore. I guess I proved enough that I'm a real person :D The session id changes after almost every verification, but I'm OK with this. Please leave the Instagram support in rssbox working as it is. Yes, there are service limitations, but it is understandable that Instagram wants real people watching its ads rather than crawlers collecting data.

albertvaka commented 3 years ago

I just deployed my own rssbox and (without a sessionid) I'm getting the rate-limited error right away, which is weird.

ALERTua commented 2 years ago

something broke. rssbox responds correctly with the URL of the feed, but the feed is "Something went wrong. Try again later." Logs:

192.168.1.1 - - [03/Feb/2022:10:42:25 +0200] "GET /instagram?q=username HTTP/1.1" 200 - 2.0286
2022-02-03 10:42:26 - NoMethodError - undefined method `[]' for nil:NilClass:
        /app/app/instagram.rb:40:in `block in get_post'
        /app/lib/cache.rb:90:in `cache'
        /app/app/instagram.rb:36:in `get_post'
        /app/app.rb:626:in `block (3 levels) in <top (required)>'
        /app/app.rb:624:in `map'
        /app/app.rb:624:in `block (2 levels) in <top (required)>'
        /app/lib/cache.rb:90:in `cache'
        /app/app.rb:614:in `block in <top (required)>'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1674:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1674:in `block in compile!'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1013:in `block (3 levels) in route!'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1032:in `route_eval'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1013:in `block (2 levels) in route!'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1061:in `block in process_route'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1059:in `catch'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1059:in `process_route'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1011:in `block in route!'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1008:in `each'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1008:in `route!'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1129:in `block in dispatch!'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1101:in `block in invoke'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1101:in `catch'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1101:in `invoke'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1124:in `dispatch!'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:939:in `block in call!'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1101:in `block in invoke'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1101:in `catch'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1101:in `invoke'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:939:in `call!'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:929:in `call'
        /app/.bundle/gems/ruby/3.0.0/bundler/gems/prometheus-client-fc179858e6e0/lib/prometheus/middleware/exporter.rb:32:in `call'
        /app/lib/middleware.rb:10:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/secure_headers-6.3.3/lib/secure_headers/middleware.rb:11:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/rack-ssl-enforcer-0.2.9/lib/rack/ssl-enforcer.rb:52:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/rack-2.2.3/lib/rack/deflater.rb:44:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/rack-protection-2.1.0/lib/rack/protection/xss_header.rb:18:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/rack-protection-2.1.0/lib/rack/protection/path_traversal.rb:16:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/rack-protection-2.1.0/lib/rack/protection/json_csrf.rb:26:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/rack-protection-2.1.0/lib/rack/protection/base.rb:50:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/rack-protection-2.1.0/lib/rack/protection/base.rb:50:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/rack-2.2.3/lib/rack/logger.rb:17:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/rack-2.2.3/lib/rack/common_logger.rb:38:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:253:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:246:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/rack-2.2.3/lib/rack/head.rb:12:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/rack-2.2.3/lib/rack/method_override.rb:24:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:216:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/sinatra-2.1.0/lib/sinatra/base.rb:1991:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/puma-5.5.2/lib/puma/configuration.rb:249:in `call'
        /app/.bundle/gems/ruby/3.0.0/gems/puma-5.5.2/lib/puma/request.rb:77:in `block in handle_request'
        /app/.bundle/gems/ruby/3.0.0/gems/puma-5.5.2/lib/puma/thread_pool.rb:340:in `with_force_shutdown'
        /app/.bundle/gems/ruby/3.0.0/gems/puma-5.5.2/lib/puma/request.rb:76:in `handle_request'
        /app/.bundle/gems/ruby/3.0.0/gems/puma-5.5.2/lib/puma/server.rb:447:in `process_client'
        /app/.bundle/gems/ruby/3.0.0/gems/puma-5.5.2/lib/puma/thread_pool.rb:147:in `block in spawn_thread'

seadowg commented 2 years ago

@ALERTua seeing the same here

seadowg commented 2 years ago

> something broke. rssbox responds correctly with the URL of the feed, but the feed is "Something went wrong. Try again later."

Digging into this a bit with a local checkout of the code, it looks like Instagram's API might have changed. The posts I'm getting back do not have the structure that RSS Box is expecting: RSS Box is looking for a top-level "graphql" key in the JSON, but what I'm getting for posts has this structure at the top:

{
  "items": [...],
  "num_results": 1,
  "more_available": false,
  "auto_load_more_enabled": false
}

I'm not familiar enough with the Instagram API to know if this is a known or a new response format. I can probably come back and dig a little deeper soon.
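
As a sketch of how the two shapes could be told apart before indexing into the response: `post_payload_shape` below is a hypothetical helper, not RSS Box code, and it only looks at the two top-level keys mentioned above. Indexing into a missing "graphql" key returns nil, which would be consistent with the NoMethodError in the trace earlier in this thread.

```ruby
require "json"

# Hypothetical helper: report which top-level shape a post response has.
def post_payload_shape(body)
  data = JSON.parse(body)
  return :unknown unless data.is_a?(Hash)
  if data.key?("graphql")
    :graphql   # the shape RSS Box expects
  elsif data.key?("items")
    :items     # the shape observed above
  else
    :unknown
  end
end

puts post_payload_shape('{"items": [], "num_results": 1, "more_available": false}')
```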

stefansundin commented 2 years ago

Hey, sorry everyone that I didn't take a closer look earlier. Looks like this problem only happened when you were using a sessionid (rssbox.herokuapp.com currently isn't).

I just pushed 7e0e92ac582d265f2356d5bb34fa1b422c32fe2f which I believe should fix the problem. A new docker image has also been pushed out.

Let me know if that works for ya. Happy valentine's day (there's 22 minutes left for me!).

ALERTua commented 2 years ago

yep. this seems to have worked. thank you <3 although, after the bulk update started, Instagram started returning 502 after the 20th feed :)

seadowg commented 2 years ago

@stefansundin thanks! This is working again for me.

seadowg commented 2 years ago

I've been running into 429 errors pretty frequently recently. I'm following tens of different accounts, and restarting my dyno (on Heroku) seems to be the culprit: the file cache gets blown away, and RSS Box then hits Instagram for all my accounts when I open my feed reader.

I'd be interested in experimenting with adding a throttling mechanism to prevent the requests from happening in quick succession. Personally, I don't mind if things are a little out of date, so I would be happy to throttle to one request per minute or potentially something even slower. Maybe this would end up being an "opt in" feature. I had a couple of questions about caching, however, to get me up to speed (most likely for @stefansundin):

  1. I was surprised that Redis wasn't being used for caching (I'd assumed that's what it was there for, but it looks like caching is file based). What is Redis used for?
  2. Looking at this line, it seems the intention is for Instagram feeds to be cached for a week (7*24*60*60). Is that correct? I feel like I've seen updates come in faster than that, so that surprised me.
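
A rough sketch of the throttling idea described above, assuming a single process and a fixed minimum spacing between requests. `Throttle` and its parameters are hypothetical and not part of RSS Box. The clock and sleeper are injectable only so the behaviour can be checked without really waiting.

```ruby
# Sketch only: spaces out calls so that consecutive blocks start at least
# min_interval seconds apart.
class Throttle
  def initialize(min_interval,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @min_interval = min_interval
    @clock = clock
    @sleeper = sleeper
    @next_at = nil          # earliest time the next request may start
    @mutex = Mutex.new
  end

  def wait
    delay = @mutex.synchronize do
      now = @clock.call
      start = (@next_at.nil? || @next_at < now) ? now : @next_at
      @next_at = start + @min_interval   # reserve the following slot
      start - now
    end
    @sleeper.call(delay) if delay > 0
    yield
  end
end

# One request per minute, as suggested above. The first call runs immediately.
throttle = Throttle.new(60)
throttle.wait { puts "fetch one Instagram feed here" }
```

Reserving the slot inside the mutex and sleeping outside it means several waiting requests queue up one interval apart instead of all firing when the first sleep ends.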

jutsh65 commented 2 years ago

I am also having ratelimiting issues. I follow a handful of instagram accounts and all started getting the ratelimiting at the end of last month (I just noticed as they are not frequent posting accounts). I self host and last restarted rssbox about two weeks before then (mid-June). The accounts are polled very infrequently, so I am surprised I have an issue. The polling rate varies between every 6 hours to just once a day. It can happen that multiple instagram accounts are polled at the same time though. However I just ran a test with a single feed / account after restarting rssbox and it immediately came back as ratelimited. The sessionid is still valid and I can browse to the account pages just fine. I wonder if my IP address has been blacklisted? My other rssbox provided feeds from twitch and twitter are working great. Is there a way to turn on detailed logging for just the instagram service? I don't really see logging in the code, but it might be helpful to see the actual request and responses.

UPDATE: Ah, so I removed the session id from the .env file and restarted. This time the test instagram account did update, but the log contains a traceback and the error: "JSON::ParserError - 783: unexpected token at "... I've attached the output as a file. Here is the top of the trace:

/usr/lib/ruby/2.7.0/json/common.rb:156:in `parse'
/usr/lib/ruby/2.7.0/json/common.rb:156:in `parse'
/home/xxx/rssbox/app/http.rb:72:in `json'
/home/xxx/rssbox/app/services/instagram.rb:39:in `block in get_post'
/home/xxx/rssbox/app/cache.rb:90:in `cache'
/home/xxx/rssbox/app/services/instagram.rb:36:in `get_post'
app.rb:639:in `block (3 levels) in <main>'
app.rb:637:in `map'
app.rb:637:in `block (2 levels) in <main>'
/home/xxx/rssbox/app/cache.rb:90:in `cache'
app.rb:627:in `block in <main>'

inst_prob.txt

ALERTua commented 2 years ago

so, did anyone try to use the Instagram Graph API to form RSS? Yes, it forces you to have an Instagram Creator account, but this is manageable. https://developers.facebook.com/docs/instagram-basic-display-api/reference/user/media#reading

stefansundin commented 2 years ago

> so, did anyone try to use the Instagram Graph API to form RSS?

RSS Box used to fetch the information using their API, but in 2016 they closed down the public API and decided to make it very hard to get access to it.

> Yes, it forces you to have an Instagram Creator account, but this is manageable.

It used to be essentially impossible to get API access (I tried once). If that has changed and it is now easier then please try to get API access. If the success rate is high enough then I might bring back some kind of official support here. But my guess is that they won't give you access.

For reference, here's the commit that removed the use of their old official API: https://github.com/stefansundin/rssbox/commit/304dc41394305ac4e99fec0a66ad863d88c974cc

ALERTua commented 2 years ago

Yeah, they tried to make getting the API access impossible. No luck for me. It just disables API Explorer when I log in while having an Instagram Basic Display API App active.

faveoled commented 1 year ago

Hi. Is there anything I can do:

172.17.0.1 - - [23/Feb/2023:18:10:42 +0000] "GET /instagram?q=https%3A%2F%2Fwww.instagram.com%2Felonmusk HTTP/1.1" 422 - 0.0063

It says "Something went wrong. Try again later." in the UI. I don't use any API key.