nopSolutions / nopCommerce

ASP.NET Core eCommerce software. nopCommerce is a free and open-source shopping cart.
https://www.nopcommerce.com

Block unwanted search engine bots #2283

Closed: AndreiMaz closed this issue 6 years ago

AndreiMaz commented 7 years ago

It is really important to block bots from countries you do not ship to. For example, after blocking all traffic from China, Russia (Andrei, never mind :) ) and Germany on an eCommerce application that only ships to the USA, I noticed a significant performance improvement. You can block or challenge countries using website security services such as Incapsula (https://www.incapsula.com/) or Cloudflare (https://www.cloudflare.com/). You can also easily block IP addresses from web.config.
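Blocking by IP usually means matching each client address against a list of networks in CIDR notation. On IIS, web.config rules of this kind live in the `<ipSecurity>` section, which requires IIS's IP and Domain Restrictions feature. The Python sketch below only illustrates the matching logic. It assumes a hypothetical blocklist made of RFC 5737 documentation ranges and is not how IIS itself is implemented.

```python
import ipaddress

# Hypothetical blocklist: RFC 5737 documentation ranges used as stand-ins
# for the real networks you would block.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(ip: str) -> bool:
    """Return True if `ip` falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    # An IPv6 address is never "in" an IPv4 network, so mixed versions are simply False.
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(is_blocked("203.0.113.57"))  # True
    print(is_blocked("192.0.2.10"))    # False
```

A list scan is fine for a handful of ranges. For thousands of country-level ranges you would want a sorted or trie-based structure, or you could leave the job to the edge service, as the post suggests.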


Fig 1: AWS CloudWatch. CPU usage and web traffic dropped significantly.


Fig 2: Server response time improved dramatically (New Relic).

You can write a simple trigger that captures the IP addresses of unwanted bots, so that you can block them.

Create a table to track all search engine IPs:

```sql
CREATE TABLE [dbo].[x_SearchEngine](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    -- A bare nvarchar in a column definition defaults to nvarchar(1) and would truncate IPs
    [LastIpAddress] [nvarchar](max) NULL,
    [CreatedOnUtc] [datetime] NOT NULL,
    PRIMARY KEY CLUSTERED ([Id] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
          ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
```

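Create a trigger that inserts the IPs into the table created above:

```sql
-- Triggers share the schema's object namespace with tables, so the trigger
-- must not reuse the table name x_SearchEngine.
CREATE TRIGGER [dbo].[x_SearchEngine_LogIp] ON [dbo].[Customer]
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- 42693958 is presumably the Id of the built-in search engine customer in the
    -- author's database. Look yours up with:
    -- SELECT Id FROM [dbo].[Customer] WHERE SystemName = 'SearchEngine'
    IF EXISTS (SELECT 1 FROM inserted WHERE Id = 42693958)
    BEGIN
        INSERT INTO x_SearchEngine (LastIpAddress, CreatedOnUtc)
        SELECT LastIpAddress, GETUTCDATE() FROM inserted WHERE Id = 42693958
    END
END
GO
```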

Wait a few hours, then run the following query to find the bots that are killing your server:

```sql
SELECT LastIpAddress, COUNT(*) AS Hits
FROM x_SearchEngine
GROUP BY LastIpAddress
ORDER BY COUNT(*) DESC
```
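The table, trigger and query can be tried end to end without SQL Server. Below is a minimal Python sketch that uses the standard-library `sqlite3` module as a stand-in. SQLite's trigger syntax differs from T-SQL: the trigger fires once per row and uses `NEW` instead of the `inserted` pseudo-table. The two-column `Customer` table and the `record_visit` helper are hypothetical simplifications, not nopCommerce's real schema or code.

```python
import sqlite3

# Hypothetical: Id of the search engine customer row (the post uses 42693958).
SEARCH_ENGINE_CUSTOMER_ID = 42693958

SCHEMA = f"""
CREATE TABLE Customer (Id INTEGER PRIMARY KEY, LastIpAddress TEXT);
CREATE TABLE x_SearchEngine (
    Id INTEGER PRIMARY KEY AUTOINCREMENT,
    LastIpAddress TEXT,
    CreatedOnUtc TEXT NOT NULL
);
-- Runs once per updated row; WHEN restricts logging to the bot customer.
CREATE TRIGGER trg_log_search_engine_ip
AFTER UPDATE ON Customer
FOR EACH ROW WHEN NEW.Id = {SEARCH_ENGINE_CUSTOMER_ID}
BEGIN
    INSERT INTO x_SearchEngine (LastIpAddress, CreatedOnUtc)
    VALUES (NEW.LastIpAddress, datetime('now'));
END;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the table, the trigger and two customers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO Customer (Id, LastIpAddress) VALUES (?, NULL)",
        [(SEARCH_ENGINE_CUSTOMER_ID,), (1,)],
    )
    return conn


def record_visit(conn: sqlite3.Connection, customer_id: int, ip: str) -> None:
    """Simulate the application updating a customer's last IP on a request."""
    conn.execute(
        "UPDATE Customer SET LastIpAddress = ? WHERE Id = ?", (ip, customer_id)
    )


def top_bot_ips(conn: sqlite3.Connection, limit: int = 10):
    """Same aggregation as the T-SQL query: hits per IP, busiest first."""
    return conn.execute(
        "SELECT LastIpAddress, COUNT(*) AS hits FROM x_SearchEngine "
        "GROUP BY LastIpAddress ORDER BY hits DESC LIMIT ?",
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for ip in ["203.0.113.5", "198.51.100.7", "203.0.113.5",
               "192.0.2.1", "203.0.113.5", "198.51.100.7"]:
        record_visit(db, SEARCH_ENGINE_CUSTOMER_ID, ip)
    record_visit(db, 1, "192.0.2.99")  # ordinary customer: not logged
    print(top_bot_ips(db))
    # [('203.0.113.5', 3), ('198.51.100.7', 2), ('192.0.2.1', 1)]
```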

Look up the country of each IP using iplocation.net (https://www.iplocation.net/) and check whether that country or IP is relevant to your business. If it is not, block it.
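The country check can be automated once you have the busiest IPs. This sketch makes two assumptions. First, it uses a hypothetical IP-to-country table: the ranges are RFC 5737 documentation addresses paired with made-up country codes. Second, it models a store that ships only to the USA. In practice the lookup would come from a geolocation database or a site such as iplocation.net. Addresses with no known country are deliberately left for manual review rather than blocked.

```python
import ipaddress

# Hypothetical IP-to-country table; the country codes are invented for illustration.
IP_COUNTRY_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "CN",
    ipaddress.ip_network("198.51.100.0/24"): "US",
    ipaddress.ip_network("192.0.2.0/24"): "RU",
}

SHIPPING_COUNTRIES = {"US"}  # assumption: the store ships only to the USA


def country_of(ip: str):
    """Return the country code for `ip`, or None if no range matches."""
    addr = ipaddress.ip_address(ip)
    for net, country in IP_COUNTRY_TABLE.items():
        if addr in net:
            return country
    return None


def ips_to_block(ips):
    """Return the IPs whose known country lies outside the shipping area."""
    blocked = []
    for ip in ips:
        country = country_of(ip)
        # Unknown countries are not blocked automatically; review them by hand.
        if country is not None and country not in SHIPPING_COUNTRIES:
            blocked.append(ip)
    return blocked


if __name__ == "__main__":
    print(ips_to_block(["203.0.113.5", "198.51.100.7", "192.0.2.1", "10.0.0.1"]))
    # ['203.0.113.5', '192.0.2.1']
```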

You can also read my Quora post (https://www.quora.com/Is-it-OK-to-block-search-engine-bots-from-China-and-Russia-on-a-USA-based-e-commerce-site) for more details.

Source: http://www.nopcommerce.com/boards/t/46819/performance-improvement-tips-block-unwanted-search-enginee-bots.aspx

DarthSonic commented 7 years ago

Why not block by User-Agent instead of by IP? IPs change and there are a lot of them to block, but user agents are mostly constant, so blocking a single user agent also blocks a whole bunch of IPs.

A nopCommerce plugin could block user agents by regular expression, wildcard and plain string comparison. That would let you block both search engines and custom bots/crawlers.
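The three matching modes suggested above can be sketched in a few lines. This is a Python illustration of the matching logic only. A real nopCommerce plugin would be written in C#, and the rule kinds (`exact`, `contains`, `wildcard`, `regex`), the `UserAgentRule` type and the bot names are all hypothetical.

```python
import fnmatch
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class UserAgentRule:
    """One blocking rule; `kind` is 'exact', 'contains', 'wildcard' or 'regex'."""
    kind: str
    pattern: str

    def matches(self, user_agent: str) -> bool:
        ua, pat = user_agent.lower(), self.pattern.lower()
        if self.kind == "exact":
            return ua == pat
        if self.kind == "contains":
            return pat in ua
        if self.kind == "wildcard":
            # '*' and '?' wildcards; both sides are lowercased, so matching is case-insensitive
            return fnmatch.fnmatchcase(ua, pat)
        if self.kind == "regex":
            return re.search(self.pattern, user_agent, re.IGNORECASE) is not None
        raise ValueError(f"unknown rule kind: {self.kind}")


def is_blocked_agent(user_agent: str, rules) -> bool:
    """True if any rule matches the request's User-Agent header."""
    return any(rule.matches(user_agent) for rule in rules)


# Hypothetical rule set with made-up bot names.
RULES = [
    UserAgentRule("contains", "examplebot"),
    UserAgentRule("wildcard", "*badcrawler/*"),
    UserAgentRule("regex", r"\bscraper\d+\b"),
]

if __name__ == "__main__":
    print(is_blocked_agent("Mozilla/5.0 (compatible; ExampleBot/2.1)", RULES))  # True
    print(is_blocked_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", RULES))  # False
```

The User-Agent header is supplied by the client and is easy to spoof, so a filter like this complements IP blocking rather than replacing it.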

kingmotley commented 7 years ago

You could try putting this in your robots.txt file (not all bots respect it, but it's easy to test):

```
User-agent: *
Crawl-delay: 10
```

This asks every bot that honors the directive to request no more than one page every 10 seconds. Crawl-delay is a non-standard directive; Googlebot, for example, ignores it.
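The following shows how a well-behaved crawler would read that directive. It uses Python's standard-library `urllib.robotparser`, whose `RobotFileParser.crawl_delay()` method returns the delay for a given user agent. The helper name `min_interval_seconds` and the bot names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
"""


def min_interval_seconds(robots_txt: str, user_agent: str, default: float = 0.0) -> float:
    """Seconds a polite crawler should wait between requests to this site."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None if no Crawl-delay applies
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    print(min_interval_seconds(ROBOTS_TXT, "ExampleBot"))  # 10.0
```

A more specific `User-agent:` group takes precedence over `*`, so individual bots can be given different delays.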

AndreiMaz commented 6 years ago

It should not be available out of the box. It's better to leave this to services such as Cloudflare. Our core should contain only eCommerce functionality; everything else belongs in plugins or third-party services.