meilisearch / scrapix

MIT License

Provide option to slow or rate limit requests #99

Open klvs opened 3 months ago

klvs commented 3 months ago

I've been testing out scrapix and first off, awesome work! With a little bit of tinkering around I got it working with meilisearch cloud FAST!

That said, it could be useful to add an option to rate limit requests. I didn't see anything other than batch_size, which I believe has more to do with how frequently documents are imported into the search index.

This isn't as big an issue when it comes to indexing internal websites, but when I tested it on a rather large public collection of docs (reactnative.dev), the site quickly started denying my requests, likely because scrapix was firing off LOTS of requests in a short time, which can look a bit like malicious traffic.

Apache Nutch has a default delay of 5000ms between requests to the same server (which in my opinion is a bit high). It could be a good idea to implement something like this for scrapix if it doesn't already exist. I could potentially implement it if you're welcoming PRs.
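To make the idea concrete, here is a minimal sketch of a fixed-delay throttle in TypeScript. It is not scrapix's code or API: `RequestThrottle`, `reserve`, `wait`, and the `minDelayMs` parameter are hypothetical names used only for illustration, and how such a helper would hook into the crawler is left open.

```typescript
// Hypothetical helper, not part of scrapix: spaces requests at least
// `minDelayMs` apart, similar in spirit to a per-server fetch delay.
class RequestThrottle {
  private readonly minDelayMs: number;
  private nextSlot = 0; // earliest timestamp (ms) at which the next request may start

  constructor(minDelayMs: number) {
    this.minDelayMs = minDelayMs;
  }

  // Reserve the next request slot and return how many ms the caller must wait.
  // `now` is injectable so the scheduling logic can be checked without real timers.
  reserve(now: number = Date.now()): number {
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.minDelayMs;
    return start - now;
  }

  // Convenience wrapper: sleep until the reserved slot is reached.
  async wait(): Promise<void> {
    const delay = this.reserve();
    if (delay > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A crawler could then call `await throttle.wait()` before each page request. Because each call reserves its own slot, concurrent callers queue up instead of all firing at once. The delay value itself would presumably be a user-facing config option, with the default open to discussion.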

curquiza commented 3 months ago

Hello @klvs

Thanks for the suggestion. Pinging @meilisearch/product-team here to let them know 😇