scottsweb / wp-instagram-widget

❌ A WordPress widget for showing your latest Instagram photos.
115 stars 62 forks

Add flood check, with throttle, to prevent Instagram IP bans. #108

Closed asadkn closed 4 years ago

asadkn commented 5 years ago

Recently, many shared hosts have had their IPs banned by Instagram due to the new Instagram rate limits now in place. I believe this is what got this plugin, and other plugins that were scraping Instagram, banned from the WP repo as hosts revolted.

Consider there are 20 sites using this plugin on a shared web host and a rate limit is temporarily reached. What happens:

What my PR changes do:

I wish this plugin still existed in the repository. That way, everyone would upgrade and everyone would be safe on shared hosts. Now, all the sites that will never get an update will keep being bad neighbors. Hopefully the WP repo team can be convinced to bring it back.

asadkn commented 5 years ago

P.S. the above was a modification of what I am using on my private repo. I wasn't sure if you would be willing to accept more drastic changes. But here it is if you would like to review: https://gist.github.com/asadkn/e0aa74254d06cb5c1216048aa927ada1

Changes:

feastdesignco commented 5 years ago

This would be a minor improvement, but doesn't go far enough.

With pagespeed becoming more and more important, results should ALWAYS be loaded from a cache on the server. Realistically, 99% of blogs do not need a "live" instagram feed and a 24-hour cache is a good solution. This would further reduce the number of queries instagram receives, and loading images from the same server prevents additional lookups.

Server-side cache plus better control over lazyloading images with popular performance plugins (WP Rocket) is needed for this plugin to continue to be useful. We'd be willing to contribute financially to expedite this.

We've seen more and more recommendations from SEOs to remove the WPIW due to speed issues, which directly affects SERPs and therefore income of bloggers.

scottsweb commented 5 years ago

@asadkn Thanks for this, I will take a look through. My feeling has been that certain hosts are being blocked, but people have been quite reluctant to share their host details with me, making it difficult to debug. What steps have you taken to reach this conclusion?

Caching the error is a good idea and certainly something that is needed.

Hopefully the WP repo team can be convinced to bring it back.

Feel free to make a request to plugins@wordpress.org if you feel that will help.

Use persistent storage in options.

The problem here is that you will end up creating database writes from the front end. Although this won't happen too often, it is an unnecessary performance hit. On decent hosts the transient API will be backed by a proper object cache.

Removed serialize + base64 as WordPress natively takes care of serialization.

This was added for emoji support. I think the best bet is to move to wp_encode_emoji but I haven't done any testing there yet.

@feastdesignco

results should ALWAYS be loaded from a cache on the server. Realistically, 99% of blogs do not need a "live" instagram feed

The images are always referenced from cache and the feed is not live. I think what you are asking for here is a local image cache (I think we chatted about this before). There would definitely be some advantages to having the images sideloaded into the local WordPress install, but also some disadvantages. On the plus side, it means fewer requests to Instagram, fewer DNS lookups and a good backup should Instagram ever disappear. On the negative side, it will fill up your media library and you will lose the use of the Instagram CDN (which should be faster in theory).

Remember page speed reports are not a true measure of speed. They are a guide for best practices. You can have a poor page speed score and a fast website. A tool like https://tools.pingdom.com/ will actually test the speed of your site.

asadkn commented 5 years ago

My feeling has been that certain hosts are being blocked, but people have been quite reluctant to share their host details with me, making it difficult to debug. What steps have you taken to reach this conclusion?

I am reluctant to share this in public, but we have a few WP themes all making use of this plugin with a combined user base of over 15k.

In the past 1.5 months or so, Instagram troubles have been at an all-time high, and in all cases the issue has been a host IP getting blocked by Instagram. This usually happened by hitting a rate limit, getting an error, and then causing a subsequent flood of requests that resulted in a more concrete ban.

So yes, that's the most important part of the PR.
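To illustrate the flood check described above, here is a minimal sketch in Python rather than the plugin's PHP. It is not the code from the PR: the class name `ThrottledFeed`, the `fetch` callable and the 15-minute `error_backoff` are assumptions made for the example. Only the 2-hour cache lifetime comes from this thread. The idea is that a failure is cached too, so a rate-limited host stops retrying on every page view.

```python
import time
from typing import Callable, Optional


class ThrottledFeed:
    """Caches successful results AND failures so a failing upstream is not hammered.

    All names here are hypothetical; this is not the plugin's API.
    """

    def __init__(
        self,
        fetch: Callable[[], list],
        ttl: float = 2 * 3600,           # 2-hour cache, as mentioned in the thread
        error_backoff: float = 15 * 60,  # assumed throttle window after an error
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.fetch = fetch
        self.ttl = ttl
        self.error_backoff = error_backoff
        self.clock = clock
        self.items: Optional[list] = None  # last good result, if any
        self.fresh_until = 0.0             # time until which `items` is fresh
        self.retry_after = 0.0             # time before which we must not refetch

    def get(self) -> Optional[list]:
        now = self.clock()
        # Fresh data: serve from cache without touching the upstream.
        if self.items is not None and now < self.fresh_until:
            return self.items
        # A recent failure is cached: serve stale data (or None), do not retry yet.
        if now < self.retry_after:
            return self.items
        try:
            self.items = self.fetch()
            self.fresh_until = now + self.ttl
        except Exception:
            # Remember the failure so the next page views do not flood upstream.
            self.retry_after = now + self.error_backoff
        return self.items
```

With this shape, 20 sites each hitting an error make one request per throttle window instead of one per page view, and a site that already has a good result keeps serving it while the upstream is failing.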

The problem here is that you will end up creating database writes from the front end. Although this won't happen too often, it is an unnecessary performance hit. On decent hosts the transient API will be backed by a proper object cache.

That's a very valid concern about database writes, but considering they're going to be infrequent, it's not as big a deal. From my experience, very rarely do hosts use the extended object cache APIs and that's a can of worms on its own - the transient timeouts are often not respected with ext object caches, resulting in updates being too infrequent.

Further, get_option() also internally uses the extended object cache. So realistically, given an expiration of 2 hours, the data (in the presence of an ext object cache) will come from the ext object cache for those 2 hours, and only then will a database write occur.

@feastdesignco consider lazyloading the instagram images. Pagespeed problems are usually solved by that and it's not the job of the plugin itself. Other than that, the plugin has 2 hour cache by default and you can increase it using a built-in filter (add it to your themes).

feastdesignco commented 5 years ago

The images are always referenced from cache and the feed is not live. I think what you are asking for here is a local image cache (I think we chatted about this before). There would definitely be some advantages to having the images sideloaded into the local WordPress install, but also some disadvantages. On the plus side, it means fewer requests to Instagram, fewer DNS lookups and a good backup should Instagram ever disappear. On the negative side, it will fill up your media library and you will lose the use of the Instagram CDN (which should be faster in theory).

Remember page speed reports are not a true measure of speed. They are a guide for best practices. You can have a poor page speed score and a fast website. A tool like https://tools.pingdom.com/ will actually test the speed of your site.

Yes, we had a bit of a discussion on this previously. Local image caching is what I was referring to, correct. To account for the "media library filling up", you would clear out the old images when you load in the new ones. At 100 KB (?) per image, 10 images would consume about 1 MB of disk space. Even at a ridiculous 200 KB per image with 50 images being displayed, this would be only 10 MB of disk space.
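The "clear out the old images when you load in the new ones" idea could look roughly like the sketch below. It is Python rather than the plugin's PHP, and it is only an illustration: `refresh_image_cache`, the `download` callable, the hashed file names and the `.jpg` extension are all assumptions, not anything the plugin provides. Each image is fetched at most once per `max_age` (24 hours here), and any cached file that is no longer in the feed is deleted so disk usage stays bounded.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def refresh_image_cache(
    urls: Iterable[str],
    cache_dir: Path,
    download: Callable[[str], bytes],  # hypothetical fetcher returning image bytes
    max_age: float = 24 * 3600,        # 24-hour image cache
    now: Optional[float] = None,
) -> List[Path]:
    """Store each image locally once per max_age and prune files no longer in the feed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    now = time.time() if now is None else now
    keep: List[Path] = []
    for url in urls:
        # Stable local file name derived from the URL; the extension is assumed.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".jpg"
        path = cache_dir / name
        # Download only if the file is missing or older than max_age.
        if not path.exists() or now - path.stat().st_mtime >= max_age:
            path.write_bytes(download(url))
        keep.append(path)
    # Clear out old images that are not part of the current feed.
    keep_names = {p.name for p in keep}
    for old in cache_dir.iterdir():
        if old.is_file() and old.name not in keep_names:
            old.unlink()
    return keep
```

Because pruning happens in the same pass as the refresh, the directory never holds more files than the feed currently shows, which is what keeps the disk estimate above valid.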

re: Google pagespeed - this tool was recently updated: https://webmasters.googleblog.com/2018/11/pagespeed-insights-now-powered-by.html

I would consider this tool the gold standard of pagespeed issues, given who is pushing it. Not that pingdom isn't insightful, but Google knows what Google is looking for better than a third party.

@asadkn Unfortunately, currently, this plugin has a hard-coded src and no filter to add WP Rocket's data-lazy-src as defined at https://docs.wp-rocket.me/article/130-manually-apply-lazyload-to-an-image. I've put in a pull request to have these added here: https://github.com/scottsweb/wp-instagram-widget/pull/104

You and I are on the same page in terms of improving the performance of the plugin, but what you have here is a half-measure. It's an improvement, but still falls short.

This plugin (and your change) caches the instagram image URLs, not the images themselves. This results in the visitor's browser performing an additional DNS lookup for instagram, so that it can fetch the image from instagram's servers. This DNS lookup averages 120ms in our testing, excluding the image download, which is slower than downloading images from a local server cache (except in rare, exceptionally poor hosting environments).

By having the images (not URLs) cached locally, we avoid having the visitors perform a DNS lookup with every page load, and can ensure that images are available even if instagram "updates" their caching URLs. This happened last year if I recall correctly. Taking the additional load off instagram directly would (in theory) make them a little more forgiving of the plugin's existence.

With a 24-hour image cache, there would be a single fetch of the images from Instagram's servers, once per day, rather than potentially hundreds (thousands for many blogs) of individual fetches each day - one from each visitor to a blog. It distributes the load onto the blogger's servers, where it belongs, rather than Instagram's servers.

And this last part is purely speculation, but by not loading images directly from Instagram's servers, you're better protecting your visitors' privacy from Instagram tracking which IP address loaded which image.

scottsweb commented 4 years ago

Thanks for your contribution.

This project is being archived (background in #118). Instagram filed a trademark complaint which saw the plugin removed from WordPress.org and then proceeded to block it from accessing instagram.com. All pull requests are being closed and it will soon be in a read-only state.