my8100 / scrapydweb

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI. Demo: https://github.com/my8100/files

Support compressed logs and items #2

Closed: waldner closed this issue 6 years ago

waldner commented 6 years ago

The barebones web interface of scrapyd can display compressed (e.g., .gz) log and item files, while scrapydweb seems not to.

my8100 commented 6 years ago

> The barebones web interface of scrapyd can display compressed (e.g., .gz) log and item files, while scrapydweb seems not to.

Hi @waldner, the Items page will be added in a future update.

But I couldn't find the "compressed (e.g., .gz) log" you mentioned; could you please upload a screenshot?

Currently, the utf8 page is served gzip-compressed, and you can get the original log (text/plain) via the Source link.

[screenshot]

Also, the logs page is available at Logs/Directory.

[screenshot]

waldner commented 6 years ago

Sure, for example see this:

[screenshot]

The .gz files are shown correctly if I open them in scrapyd's own web interface, while they fail in scrapydweb. To reproduce, you can just compress a log file by hand in scrapyd's directory.
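
For example, something like the following compresses a finished log in place (a minimal sketch; the path is just an illustration, point it at an actual log under scrapyd's logs directory):

```python
import gzip
import shutil
from pathlib import Path

# Illustrative path; use a real job log under scrapyd's logs_dir.
src = Path("logs/demo/test/2018-10-09T15_00_00.log")

# Write a gzip copy next to the original, then remove the original
# so that only the .gz file is left for the web UI to find.
with src.open("rb") as f_in, gzip.open(f"{src}.gz", "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)
src.unlink()
```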

my8100 commented 6 years ago

> The .gz files are shown correctly if I open them in scrapyd's own web interface, while they fail in scrapydweb. To reproduce, you can just compress a log file by hand in scrapyd's directory.

OK, I got the following result and will fix the problem ASAP.

[screenshot]

my8100 commented 6 years ago

@waldner Added the Items page, and added support for the extensions ['.log', '.log.gz', '.gz', '.txt', ''] when locating scrapy logs, in v0.9.5.
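
(Not the actual implementation, just a sketch of the fallback described above: try each supported extension in turn and use the first file that exists. The function name is illustrative.)

```python
import os

# Extensions tried when locating a scrapy log, as listed above.
SUPPORTED_EXTENSIONS = ['.log', '.log.gz', '.gz', '.txt', '']

def locate_scrapy_log(path_without_ext):
    """Return the first existing candidate among the supported extensions, or None."""
    for ext in SUPPORTED_EXTENSIONS:
        candidate = path_without_ext + ext
        if os.path.exists(candidate):
            return candidate
    return None
```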

waldner commented 6 years ago

Great! It even works when clicking on the log in the job list view, whereas scrapyd's doesn't (you have to navigate to the actual folders). Thanks!

my8100 commented 6 years ago

> Great! It even works when clicking on the log in the job list view, whereas scrapyd's doesn't (you have to navigate to the actual folders). Thanks!

@waldner How do you schedule a spider to run and log in .log.gz format? It seems that only `-d setting=LOG_FILE=logs/demo/test/2018-10-12T05_44_53.txt` works, whereas .log.gz and .gz cause "ContentDecodingError ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check',))".
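
For reference, the same scheduling call expressed against Scrapyd's schedule.json endpoint (host, project, and spider names are placeholders):

```python
import requests

# Equivalent of the -d setting=LOG_FILE=... invocation above.
resp = requests.post(
    "http://localhost:6800/schedule.json",  # placeholder Scrapyd host
    data={
        "project": "demo",  # placeholder project
        "spider": "test",   # placeholder spider
        "setting": "LOG_FILE=logs/demo/test/2018-10-12T05_44_53.txt",
    },
)
print(resp.json())  # expect {'status': 'ok', 'jobid': '...'}
```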

Or do you just compress the log file by hand after the crawl finishes? (Does "It even works when clicking on the log in the job list view" only refer to the Finished section?)

waldner commented 6 years ago

I suppose it's perfectly possible to use a custom logger that writes directly to a compressed file (or a custom feed exporter to do the same with items), although in my case they get compressed by a custom process after scrapy has run.
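
For instance, a root logger handler writing straight into a gzip stream could look roughly like this (an untested sketch; the path is illustrative):

```python
import atexit
import gzip
import logging

# Open the log target as an append-mode gzip text stream (illustrative path).
gz_stream = gzip.open("logs/demo/test/run.log.gz", "at", encoding="utf-8")

# Close the stream at shutdown; otherwise the gzip trailer is never
# written and readers will see a truncated archive.
atexit.register(gz_stream.close)

handler = logging.StreamHandler(gz_stream)
handler.setFormatter(logging.Formatter("%(asctime)s [%(name)s] %(levelname)s: %(message)s"))
logging.getLogger().addHandler(handler)
```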