my8100 / scrapydweb

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI. DEMO :point_right:
https://github.com/my8100/files
GNU General Public License v3.0

customize json_url with actual server ip in stats.json #103

Closed LcodingL closed 4 years ago

LcodingL commented 4 years ago

Describe the need
I want to call the API http://host_ip:6800/logs/stats.json to get every job's json_url and request them externally. But all the json_urls I got started with http://127.0.0.1:6800/, which is not the actual host_ip.

I initially set the configuration as below.

SCRAPYD_SERVERS = [('', '', '127.0.0.1', '6800', '')]
LOCAL_SCRAPYD_SERVER = '127.0.0.1:6800'
LOCAL_SCRAPYD_LOGS_DIR = '/root/logs'
ENABLE_LOGPARSER = True

But nothing changed after I updated the configuration as below and restarted scrapydweb.

SCRAPYD_SERVERS = [('', '', 'actual host_ip', '6800', '')]
LOCAL_SCRAPYD_SERVER = ''
LOCAL_SCRAPYD_LOGS_DIR = ''
ENABLE_LOGPARSER = False

Also, the column url_scrapydweb in the metadata table of the scrapydweb_metadata database is always http://127.0.0.1:5000, no matter how many times I manually change it to the actual host_ip.

Screenshots
Since I could not upload the screenshots, I have pasted the returned JSON data below:

{
  status: "ok",
  datas: {
    2019Phase1: {
      gxb_1: {
        task_1_2019-11-19T20_00_00: {
          log_path: "/root/logs/2019Phase1/gxb_1/task_1_2019-11-19T20_00_00.log",
          json_path: "/root/logs/2019Phase1/gxb_1/task_1_2019-11-19T20_00_00.json",
          json_url: "http://127.0.0.1:6800/logs/2019Phase1/gxb_1/task_1_2019-11-19T20_00_00.json",
          size: 4703,
          position: 4703,
          status: "ok",
          pages: 12,
          items: 2,
          first_log_time: "2019-11-19 20:00:21",
          latest_log_time: "2019-11-19 20:00:25",
          runtime: "0:00:04",
          shutdown_reason: "N/A",
          finish_reason: "finished",
          last_update_time: "2019-11-19 20:00:29"
        },
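As a client-side stopgap while debugging, a caller can rewrite the loopback prefix before requesting the files externally. This is only an illustrative sketch: rewrite_json_urls and the host value passed to it are hypothetical, not part of scrapydweb or logparser.

```python
# Stopgap sketch: rewrite the loopback prefix in every json_url value of a
# parsed stats.json payload. rewrite_json_urls and the host argument are
# hypothetical examples, not scrapydweb/logparser APIs.

def rewrite_json_urls(node, actual_host):
    """Recursively replace http://127.0.0.1:6800/ with the real host."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "json_url" and isinstance(value, str):
                node[key] = value.replace("http://127.0.0.1:6800/",
                                          "http://%s:6800/" % actual_host)
            else:
                rewrite_json_urls(value, actual_host)
    elif isinstance(node, list):
        for item in node:
            rewrite_json_urls(item, actual_host)
    return node

# Minimal payload shaped like the response above.
stats = {"status": "ok", "datas": {"2019Phase1": {"gxb_1": {"task_1": {
    "json_url": "http://127.0.0.1:6800/logs/2019Phase1/gxb_1/task_1.json"}}}}}
rewrite_json_urls(stats, "10.3.64.153")
```

This only patches what the caller sees; the proper fix discussed in this thread is to make logparser emit the right host in the first place.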


my8100 commented 4 years ago

Modify the settings.py of LogParser instead.

https://github.com/my8100/logparser/blob/62c04516e93135e593ec302fe97a1a4b14a31cc2/logparser/settings.py#L35

SCRAPYD_SERVER = '127.0.0.1:6800'

my8100 commented 4 years ago

@LcodingL Has your problem been solved?

LcodingL commented 4 years ago

Hi, thanks for your help! I will try it tomorrow and give you feedback ASAP.

LcodingL commented 4 years ago

Hi, sorry for the delay. For some reason I cannot try your instruction yet. Once I get the opportunity, I will give you feedback.

Besides, I want to know how to store the actual host_ip in the column url_scrapydweb of the metadata table in the scrapydweb_metadata database. It is always http://127.0.0.1:5000, no matter how many times I manually change it to the actual host_ip.

my8100 commented 4 years ago

Never manually modify the database scrapydweb_metadata. The value of url_scrapydweb in that database depends on SCRAPYDWEB_BIND.
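For reference, a hedged sketch of the relevant scrapydweb settings: url_scrapydweb is derived from the bind address and port in scrapydweb's own settings file, not read back from the database. Option names follow scrapydweb's default settings as I understand them; the values below are examples.

```python
# scrapydweb settings file (e.g. scrapydweb_settings_vN.py) -- example values.
# url_scrapydweb is derived from these options, so change them here instead
# of editing the scrapydweb_metadata database.
SCRAPYDWEB_BIND = '0.0.0.0'   # listen on all interfaces, not just loopback
SCRAPYDWEB_PORT = 5000        # the 5000 seen in http://127.0.0.1:5000
```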

LcodingL commented 4 years ago

Hi~ I've set SCRAPYD_SERVER to the actual host_ip in the settings.py of LogParser and restarted scrapydweb (I set ENABLE_LOGPARSER=True in the scrapydweb configuration). But the json_url I got from the API http://host_ip:6800/logs/stats.json still started with http://127.0.0.1:6800/.

my8100 commented 4 years ago

Can you test as follows:

  1. Stop scrapydweb.
  2. Execute ‘logparser’ separately.
  3. Check json_url in the generated stats.json.
  4. Post the logs of logparser.

LcodingL commented 4 years ago

Hi~ I've followed those steps, and the json_url in the API http://host_ip:6800/logs/stats.json still starts with http://127.0.0.1:6800/. Below are the logs of logparser:

[2019-11-26 12:56:03,029] INFO in logparser.run: LogParser version: 0.8.2
[2019-11-26 12:56:03,030] INFO in logparser.run: Use 'logparser -h' to get help
[2019-11-26 12:56:03,030] INFO in logparser.run: Main pid: 14440
[2019-11-26 12:56:03,030] INFO in logparser.run: Check out the config file below for more advanced settings.


Loading settings from /root/Envs/spider_py3.6/lib/python3.6/site-packages/logparser/settings.py


[2019-11-26 12:56:03,033] DEBUG in logparser.run: Reading settings from command line: Namespace(delete_json_files=False, disable_telnet=False, main_pid=0, scrapyd_logs_dir='/root/logs', scrapyd_server='10.3.64.153:6800', sleep=10, verbose=False)
[2019-11-26 12:56:03,033] DEBUG in logparser.run: Checking config
[2019-11-26 12:56:03,033] INFO in logparser.run: SCRAPYD_SERVER: 10.3.64.153:6800
[2019-11-26 12:56:03,033] INFO in logparser.run: SCRAPYD_LOGS_DIR: /root/logs
[2019-11-26 12:56:03,033] INFO in logparser.run: PARSE_ROUND_INTERVAL: 10
[2019-11-26 12:56:03,034] INFO in logparser.run: ENABLE_TELNET: True
[2019-11-26 12:56:03,034] INFO in logparser.run: DELETE_EXISTING_JSON_FILES_AT_STARTUP: False
[2019-11-26 12:56:03,034] INFO in logparser.run: VERBOSE: False


Visit stats at: http://10.3.64.153:6800/logs/stats.json


my8100 commented 4 years ago

Can you run 'logparser --delete_json_files' and post the content of http://10.3.64.153:6800/logs/stats.json?

LcodingL commented 4 years ago

{
  status: "ok",
  datas: {
    2019Phase1: {
      gxb_1: {
        task_1_2019-11-22T20_00_00: {
          log_path: "/root/logs/2019Phase1/gxb_1/task_1_2019-11-22T20_00_00.log",
          json_path: "/root/logs/2019Phase1/gxb_1/task_1_2019-11-22T20_00_00.json",
          json_url: "http://127.0.0.1:6800/logs/2019Phase1/gxb_1/task_1_2019-11-22T20_00_00.json",
          size: 4703,
          position: 4703,
          status: "ok",
          pages: 12,
          items: 2,
          first_log_time: "2019-11-22 20:00:11",
          latest_log_time: "2019-11-22 20:00:13",
          runtime: "0:00:02",
          shutdown_reason: "N/A",
          finish_reason: "finished",
          last_update_time: "2019-11-22 20:00:18"
        }
      }
    }
  },
  settings_py: "/root/Envs/spider_py3.6/lib/python3.6/site-packages/logparser/settings.py",
  settings: {
    scrapyd_server: "10.3.64.153:6800",
    scrapyd_logs_dir: "/root/logs",
    parse_round_interval: 10,
    enable_telnet: true,
    override_telnet_console_host: "",
    log_encoding: "utf-8",
    log_extensions: [".log", ".txt"],
    log_head_lines: 100,
    log_tail_lines: 200,
    log_categories_limit: 10,
    jobs_to_keep: 100,
    chunk_size: 10000000,
    delete_existing_json_files_at_startup: false,
    keep_data_in_memory: false,
    verbose: false,
    main_pid: 0
  },
  last_update_timestamp: 1574746025,
  last_update_time: "2019-11-26 13:27:05",
  logparser_version: "0.8.2"
}
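Note that the settings block in this response already records scrapyd_server as 10.3.64.153:6800, yet json_url still uses 127.0.0.1:6800, which suggests the old value is cached in the previously generated json files. A hypothetical check for such mismatches (find_stale_json_urls is illustrative, not a logparser API):

```python
# Hypothetical diagnostic: list every cached json_url that does not use the
# scrapyd_server recorded in the settings block of stats.json.
def find_stale_json_urls(stats):
    server = stats["settings"]["scrapyd_server"]
    stale = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "json_url" and server not in value:
                    stale.append(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(stats["datas"])
    return stale

# Minimal payload shaped like the response above.
stats = {
    "settings": {"scrapyd_server": "10.3.64.153:6800"},
    "datas": {"2019Phase1": {"gxb_1": {"task_1": {
        "json_url": "http://127.0.0.1:6800/logs/2019Phase1/gxb_1/task_1.json"}}}},
}
```

A non-empty result here would point at stale cached files, consistent with the delete-and-regenerate fix suggested in this thread.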

my8100 commented 4 years ago

Please delete stats.json first, then run 'logparser --delete_json_files'.

LcodingL commented 4 years ago

Great! It does work! Thanks a lot for your help and patience! Have a nice day~