scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

How to specify the naming convention of log files? #504

Closed aaronm137 closed 2 months ago

aaronm137 commented 2 months ago

Hello,

in my Scrapy spider, I specify the name of the log file as follows:

custom_settings = {
    'LOG_FILE': f'{datetime.fromtimestamp(time.time()).strftime("%Y-%m-%d_%H%M%S")}_{project_name}.log',
    'LOG_LEVEL': 'INFO'
}

so the name of the log file looks like this: 2024-05-23_103249_my-cool-spider.log. This works perfectly on localhost.

When I deploy it to production, where Scrapyd takes care of running spider jobs, the naming convention specified above is ignored and Scrapyd's own convention is used instead, producing names like task_169_2024-06-14T13_55_48.log.

Is there any way to change the naming convention, so Scrapyd would respect the format specified in the Scrapy spider?

jpmckinney commented 2 months ago

In https://github.com/scrapy/scrapyd/blob/master/scrapyd/environ.py, if logs_dir is set in Scrapyd's configuration file, then Scrapy's LOG_FILE setting is overridden. The pattern is {logs_dir value}/{Scrapyd project name}/{Scrapy spider name}/{job ID}.log
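A minimal sketch of how that path gets composed, assuming the pattern described above (the directory and names here are placeholders, not Scrapyd's actual code):

```python
import os.path

def scrapyd_log_path(logs_dir: str, project: str, spider: str, job_id: str) -> str:
    # Mirrors the documented pattern: {logs_dir}/{project}/{spider}/{job ID}.log
    return os.path.join(logs_dir, project, spider, f"{job_id}.log")

print(scrapyd_log_path("/var/lib/scrapyd/logs", "myproject", "my-cool-spider", "task_169"))
```

So the only component of the path you control per-run is the job ID.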

In https://github.com/scrapy/scrapyd/blob/master/scrapyd/webservice.py, the job ID defaults to uuid.uuid1().hex, but you can provide your own jobid when scheduling the job. See https://scrapyd.readthedocs.io/en/latest/api.html#schedule-json

So, if you only want to configure the filename, set the job ID when scheduling. task_169_2024-06-14T13_55_48.log is already not a UUID, so you (or some software you're using) must already be setting the jobid (or, perhaps you haven't set logs_dir and it is using the non-overridden LOG_FILE setting).
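For example, you could build a jobid in the same timestamped format as the original LOG_FILE setting and pass it to schedule.json (spider and project names below are placeholders):

```python
from datetime import datetime

def make_jobid(spider_name: str) -> str:
    # Reproduce the "%Y-%m-%d_%H%M%S_<name>" format from the question
    return f"{datetime.now():%Y-%m-%d_%H%M%S}_{spider_name}"

jobid = make_jobid("my-cool-spider")
# Pass it when scheduling, e.g.:
#   curl http://localhost:6800/schedule.json \
#     -d project=myproject -d spider=my-cool-spider -d jobid="$JOBID"
print(jobid)
```

With logs_dir set, the resulting log would then be named {jobid}.log under the project/spider subdirectories.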