scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Log file not found #235

Closed amarynets closed 7 years ago

amarynets commented 7 years ago

Hello, I have a question about log files. After some time, jobs disappear from the scrapyd server and I can't download the log from 127.0.0.1:6800/logs/spiders/mhn/79e8c970575511e7a52b742f68d0cfee.log, even though the file exists in the logs folder. How can I solve this?

Digenis commented 7 years ago
  1. Are the spiders that the logs belong to still running?
  2. Do the jobs you are looking for exceed the finished_to_keep setting?
  3. Also note that the jobs list in scrapyd may display fewer jobs than the total available logs of past jobs.
  4. Check the scrapyd log.
  5. Are you trying this on windows?

My guess is that it's point 3 that is confusing you: the /jobs HTML page. If the logs are there, scrapyd should be able to serve them. (Don't confuse jobs with logs.)
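The difference between the jobs list and the logs on disk can be checked with a short script. The following is a minimal sketch, assuming the usual shape of a listjobs.json response (lists named pending, running and finished, each entry carrying an "id") and log files named after the job id, as in the URL quoted above. The function name and the sample job id "aaa111" are hypothetical.

```python
import os


def logs_missing_from_job_list(listjobs_response, log_paths):
    """Return the log paths whose job id does not appear in the jobs list."""
    listed_ids = set()
    for state in ("pending", "running", "finished"):
        for job in listjobs_response.get(state, []):
            listed_ids.add(job["id"])

    missing = []
    for path in log_paths:
        # Assumption: the log file name is "<job id>.log".
        job_id = os.path.splitext(os.path.basename(path))[0]
        if job_id not in listed_ids:
            missing.append(path)
    return missing


# Illustrative data only: one job is still listed, one is not.
response = {
    "status": "ok",
    "pending": [],
    "running": [],
    "finished": [{"id": "aaa111", "spider": "mhn"}],
}
paths = [
    "logs/spiders/mhn/aaa111.log",
    "logs/spiders/mhn/79e8c970575511e7a52b742f68d0cfee.log",
]
print(logs_missing_from_job_list(response, paths))
```

A log that shows up in this result is one whose job is no longer listed, which by itself does not say whether the file is still on disk.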

amarynets commented 7 years ago

@Digenis

  1. No. The spiders have finished running.
  2. No. I have 50 or fewer finished jobs.
  3. Does that mean I can't get the log of a job which is not displayed in the job list?
  4. No, I'm not. I turned off the jobs_to_keep option in the settings and am testing that now; if it doesn't help, I will try something else.

Digenis commented 7 years ago

jobs_to_keep can't be turned off; it only falls back to a default.

By scrapyd log, I mean the daemon's log, not individual spider logs.

amarynets commented 7 years ago

Here is a log from scrapyd:

2017-06-26T09:38:56+0000 [twisted.python.log#info] "127.0.0.1" - - [26/Jun/2017:09:38:56 +0000] "GET /logs/exa/tc/fe6d43485a5111e7a2a1f23c910a61ba.log HTTP/1.1" 404 145 "-" "Python-urllib/3.4"

And yes, jobs_to_keep only keeps the logs of the last N jobs, not of all jobs. Is it possible to set jobs_to_keep to 1M?
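Failed log requests like the one quoted above can be pulled out of the daemon log with a regular expression. This is a rough sketch, assuming access lines follow the format shown in this thread. The pattern and the function name are illustrative and not part of scrapyd.

```python
import re

# Matches the quoted request and the status code of an access-log line.
ACCESS_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) '
)


def failed_log_requests(lines):
    """Return the /logs/ paths that were answered with a 404."""
    hits = []
    for line in lines:
        match = ACCESS_RE.search(line)
        if (
            match
            and match.group("status") == "404"
            and match.group("path").startswith("/logs/")
        ):
            hits.append(match.group("path"))
    return hits


# Sample line taken from the daemon log in this thread.
sample = (
    '2017-06-26T09:38:56+0000 [twisted.python.log#info] "127.0.0.1" - - '
    '[26/Jun/2017:09:38:56 +0000] '
    '"GET /logs/exa/tc/fe6d43485a5111e7a2a1f23c910a61ba.log HTTP/1.1" '
    '404 145 "-" "Python-urllib/3.4"'
)
print(failed_log_requests([sample]))
```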

Digenis commented 7 years ago

Yes, you can set it to a high value. Was this the problem that you were reporting?
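As a sketch of what "a high value" could look like, the snippet below renders a configuration fragment with Python's configparser. It assumes that both options named in this thread go in the [scrapyd] section of the scrapyd configuration file. The function name and the numbers are illustrative, and where the file lives depends on the installation.

```python
import configparser
import io


def render_scrapyd_conf(jobs_to_keep, finished_to_keep):
    """Render a [scrapyd] config fragment with the two retention options."""
    config = configparser.ConfigParser()
    # Assumption: both options belong to the [scrapyd] section.
    config["scrapyd"] = {
        "jobs_to_keep": str(jobs_to_keep),
        "finished_to_keep": str(finished_to_keep),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Illustrative values only; pick limits that fit the available disk space.
print(render_scrapyd_conf(1000000, 1000))
```

Keeping that many logs means the logs folder keeps growing, so a separate cleanup routine may be worth having.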

amarynets commented 7 years ago

Yes, it is. Thanks a lot

Digenis commented 7 years ago

OK, then it's not a bug. This is the way the /jobs endpoint is supposed to work. Perhaps we can add a notice at the end of the table saying "only the last N jobs are shown".