Closed: andrewbaxter closed this issue 9 years ago
Python doesn't release file handles for log files by default, IIRC. I'm not sure about the specifics of Twisted logging in this regard, but I would imagine it's fairly similar. Perhaps ScrapyRT just needs to explicitly close the file handler at the end of the crawl?
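Something along these lines, maybe. This is a rough sketch that assumes the per-spider log ends up attached as a standard `logging.FileHandler`; I haven't checked how ScrapyRT/Twisted actually wire up the log file, and `close_spider_log` is just a hypothetical helper name:

```python
import logging

def close_spider_log(logger_name):
    # Detach and close any FileHandlers left on the spider's logger,
    # which releases the underlying file descriptor.
    logger = logging.getLogger(logger_name)
    for handler in list(logger.handlers):
        if isinstance(handler, logging.FileHandler):
            logger.removeHandler(handler)
            handler.close()
```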
@dpnova if you're right, I assume it's really a Scrapy issue and should be reported there; ScrapyRT doesn't work with the spiders' logs directly.
@andrewbaxter have you had a chance to investigate this issue any further?
Another thing to mention: we are running an instance that served 60k responses over the last 5 days. The load is approximately the same as @andrewbaxter mentioned, maybe even lower since we are caching results, but it has been running non-stop for a couple of weeks and we still see no issues with file descriptors.
Sorry, I haven't. I figured it was working in your case, but unfortunately I don't have any insight into what we're doing differently.
It may be a Scrapy issue; I didn't think it was relevant to that project because Scrapy doesn't run multiple spiders in one process, AFAIK, so a log leak probably wouldn't be a concern there.
@andrewbaxter #13 and #14 resolved this issue. Please update to the latest ScrapyRT version and reopen this ticket if the issue persists.
Looking at the lsof output, there are only 2-3 TCP connections; 99% of the open files are the per-spider logs.
When this happens, ScrapyRT stops responding to requests.
It happens about once a day under fairly low load (only a couple of requests a minute).
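For what it's worth, a quick way to watch the descriptor count climb between crawls without re-running lsof each time. This is a Linux-only sketch that reads /proc directly, and `count_open_fds` is just an illustrative helper:

```python
import os

def count_open_fds(pid="self"):
    # Count open file descriptors for a process by listing /proc/<pid>/fd.
    # Roughly what `lsof -p <pid>` reports; accepts a numeric PID or "self".
    return len(os.listdir("/proc/{}/fd".format(pid)))
```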