Nutomic closed this issue 5 years ago
Hello,
You can tell if a request comes from PeerTube Index by checking:
- the user agent, which is PeertubeIndex (this is not really reliable because anyone could be using this user agent)
- whether the request originates from peertube-index.net

About the specific spike of requests you noticed, I believe it was not from PeerTube Index because the crawler does not visit the /api/v1/videos/video-id URLs to fetch videos.
It uses the /api/v1/videos endpoint, requesting all the available pages with a page size of 100:
GET /api/v1/videos?count=100&start=0
GET /api/v1/videos?count=100&start=100
GET /api/v1/videos?count=100&start=200
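For illustration only, here is a minimal Python sketch of how such a sequence of paginated paths could be generated. It is not the actual PeerTube Index code; the function name `page_paths` and the `total_videos` argument are assumptions made for the example, since how the crawler learns the number of available videos is not stated here.

```python
from urllib.parse import urlencode

PAGE_SIZE = 100  # page size mentioned above


def page_paths(total_videos, page_size=PAGE_SIZE):
    """Yield one request path per page until total_videos entries are covered.

    total_videos is a hypothetical input: the thread does not say how the
    crawler decides that it has reached the last page.
    """
    for start in range(0, total_videos, page_size):
        query = urlencode({"count": page_size, "start": start})
        yield "/api/v1/videos?" + query


for path in page_paths(250):
    print("GET", path)
```

With 250 videos this yields three paths, with `start` set to 0, 100 and 200.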
Moreover, PeerTube Index has already been up and crawling for several months now, scanning its known PeerTube instances every day.
As for limiting the rate of requests sent to an instance being scanned, I decided that requests going to a specific instance should be made sequentially. Therefore there is only one request at a time going from the PeerTube Index crawler to a particular instance being scanned. This may well result in more than one request per second, but I believe this is acceptable.
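To make the "one request at a time" idea concrete, here is a rough Python sketch, not the crawler's real implementation. The `fetch_page` callable is a hypothetical blocking helper that returns the list of videos for one page, and stopping on the first short page is an assumption of this example.

```python
def crawl_instance(fetch_page, page_size=100):
    """Fetch every page of one instance, strictly one request at a time.

    fetch_page(start, count) is a hypothetical blocking helper: the next
    request is only issued after the previous call has returned.
    """
    videos = []
    start = 0
    while True:
        page = fetch_page(start, page_size)
        videos.extend(page)
        # Assumption: a page shorter than page_size means it was the last one.
        if len(page) < page_size:
            break
        start += page_size
    return videos
```

Because each call blocks until the response arrives, the request rate seen by the instance depends on how fast it answers, which is why it can exceed one request per second.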
Okay then, sorry to bother you, and thanks for the information :)
Hi,
I noticed today that I was getting a lot of requests on peertube.social, for URLs like /api/v1/videos/video-id. At times I was getting around 50 requests per second, and this caused a ton of CPU usage. Now I don't know if this was you, but it definitely looked like a crawl, and apparently your site started on the same day, so it seems likely.
The problem is gone for now, probably because the crawler has finished its backlog. But you should definitely add a rate limit to your crawler if you haven't already. I suggest something like 1 request per second at most.
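As a rough idea of what such a limit could look like, here is a small Python sketch that enforces a minimum delay between consecutive requests. The class name `MinIntervalLimiter` is made up for this example, and it is only one possible approach, not something taken from any existing crawler.

```python
import time


class MinIntervalLimiter:
    """Block so that consecutive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds; 1.0 means at most 1 request/s
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A crawler would call `limiter.wait()` right before sending each request to a given instance, using one limiter per instance.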