my8100 / scrapydweb

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI. DEMO :point_right:
https://github.com/my8100/files
GNU General Public License v3.0

DATABASE_URL and DATA_PATH options do not take effect in the config file #100

Closed LcodingL closed 4 years ago

LcodingL commented 4 years ago

Describe the bug
I've set the DATABASE_URL option to a correctly formatted MySQL URL and restarted scrapydweb, but none of the databases in [DB_APSCHEDULER, DB_TIMERTASKS, DB_METADATA, DB_JOBS] were created, and the DATABASE settings displayed on the web UI are still "sqlite:////......"

To Reproduce
Steps to reproduce the behavior:

  1. Edit 'scrapydweb_settings_v10.py', setting DATABASE_URL = 'mysql://root:1@127.0.0.1:3306'.
  2. Run the command: pip install --upgrade pymysql (see the sanity-check sketch below)
  3. Restart scrapydweb by running the command 'scrapydweb' in the directory where the config file is located.
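
Before pointing scrapydweb at the server, it can help to confirm that the MySQL server and credentials in DATABASE_URL actually work. A minimal sketch (editorial, not part of the original report) using the pymysql driver installed in step 2:

# check_mysql.py - sanity check: can we reach the server in DATABASE_URL?
import pymysql

# credentials taken from DATABASE_URL = 'mysql://root:1@127.0.0.1:3306'
conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='1')
print(conn.get_server_info())  # prints the MySQL server version on success
conn.close()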

Expected behavior

  1. I used to use the default DATABASE_URL and data were stored in sqlite normally; now I want to use a MySQL backend. Will the related databases be created in MySQL automatically?
  2. Since I didn't migrate the data from sqlite to MySQL manually, I expected no job status to be displayed on the Dashboard after setting the MySQL DATABASE_URL in the config file. But it showed all the job statuses as before, and the DATABASE settings displayed on the web UI are still "sqlite:////......"
  3. Is the DATABASE setting displayed on the web UI really the database used by the running scrapydweb?
  4. Do I need to migrate data from sqlite to MySQL manually if I want to use a MySQL backend in the future?

Logs

[2019-11-04 15:48:56,143] INFO in apscheduler.scheduler: Scheduler started
[2019-11-04 15:48:56,162] INFO in scrapydweb.run: ScrapydWeb version: 1.4.0
[2019-11-04 15:48:56,163] INFO in scrapydweb.run: Use 'scrapydweb -h' to get help
[2019-11-04 15:48:56,163] INFO in scrapydweb.run: Main pid: 2630
[2019-11-04 15:48:56,163] DEBUG in scrapydweb.run: Loading default settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/default_settings.py


Overriding custom settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/scrapydweb_settings_v10.py


[2019-11-04 15:48:56,321] DEBUG in scrapydweb.run: Reading settings from command line: Namespace(bind='0.0.0.0', debug=False, disable_auth=False, disable_logparser=False, disable_monitor=False, port=5000, scrapyd_server=None, switch_scheduler_state=False, verbose=True)
[2019-11-04 15:48:56,321] DEBUG in scrapydweb.utils.check_app_config: Checking app config
[2019-11-04 15:48:56,323] INFO in scrapydweb.utils.check_app_config: Setting up URL_SCRAPYDWEB: http://127.0.0.1:5000
[2019-11-04 15:48:56,324] DEBUG in scrapydweb.utils.check_app_config: Checking connectivity of SCRAPYD_SERVERS...

Index  Group      Scrapyd IP:Port   Connectivity  Auth
#######################################################################
1      dataocean  10.8.32.56:6800   True          None
2      dataocean  10.8.64.78:6800   True          None
#######################################################################

/Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/sqlalchemy/ext/declarative/clsregistry.py:129: SAWarning: This declarative base already contains a class with the same class name and module name as scrapydweb.models.Job, and will be replaced in the string-lookup table.
  % (item.module, item.name)
[2019-11-04 15:48:56,436] DEBUG in scrapydweb.utils.check_app_config: Created 2 tables for JobsView
[2019-11-04 15:48:56,436] INFO in scrapydweb.utils.check_app_config: Locating scrapy logfiles with SCRAPYD_LOG_EXTENSIONS: ['.log', '.log.gz', '.txt']
[2019-11-04 15:48:56,440] INFO in scrapydweb.utils.check_app_config: Scheduler for timer tasks: STATE_RUNNING
[2019-11-04 15:48:56,481] INFO in scrapydweb.utils.check_app_config: create_jobs_snapshot (trigger: interval[0:05:00], next run at: 2019-11-04 15:53:56 CST)


Visit ScrapydWeb at http://127.0.0.1:5000 or http://IP-OF-THE-CURRENT-HOST:5000


[2019-11-04 15:48:56,486] INFO in scrapydweb.run: For running Flask in production, check out http://flask.pocoo.org/docs/1.0/deploying/

  • Serving Flask app "scrapydweb" (lazy loading)
  • Environment: production
    WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
  • Debug mode: off

[2019-11-04 15:48:56,487] DEBUG in apscheduler.scheduler: Next wakeup is due at 2019-11-04 15:53:56.480998+08:00 (in 299.999017 seconds)
[2019-11-04 15:49:26,498] INFO in werkzeug: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
[2019-11-04 15:49:26,585] DEBUG in ApiView: view_args of http://127.0.0.1:5000/1/api/daemonstatus/
{
  "node": 1,
  "opt": "daemonstatus",
  "project": null,
  "version_spider_job": null
}

Environment (please complete the following information):

Thx for your time!

my8100 commented 4 years ago

Check and make sure DATABASE_URL has been configured as expected:

$ echo $DATABASE_URL
$ python -c "from scrapydweb_settings_v10 import DATABASE_URL; print(DATABASE_URL)"
LcodingL commented 4 years ago

Thx for your timely reply! I set the configuration this way:

DATABASE_URL = 'mysql://root:1@127.0.0.1:3306'

Ran the commands and got the results below:

$ echo $DATABASE_URL
--> ''
$ python -c "from scrapydweb_settings_v10 import DATABASE_URL; print(DATABASE_URL)"
--> 'mysql://root:1@127.0.0.1:3306'

Is there anything wrong?

my8100 commented 4 years ago

Is it the file you are editing?

Overriding custom settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/scrapydweb_settings_v10.py
  1. Remove the DATABASE_URL option in the config file.
  2. Execute $ export SCRAPYDWEB_TESTMODE=True and $ export DATABASE_URL=mysql://root:1@127.0.0.1:3306
  3. Restart scrapydweb; log lines like those below should be found at the beginning.
  4. If they are not found, mv scrapydweb_settings_v10.py scrapydweb_settings_v10.py.bak, then pip uninstall scrapydweb, then pip install --upgrade scrapydweb, and finally restart scrapydweb.
    APSCHEDULER_DATABASE_URI: mysql://root:rootpw@127.0.0.1:3306/scrapydweb_apscheduler
    SQLALCHEMY_DATABASE_URI: mysql://root:rootpw@127.0.0.1:3306/scrapydweb_timertasks
    SQLALCHEMY_BINDS: {'jobs': 'mysql://root:rootpw@127.0.0.1:3306/scrapydweb_jobs', 'metadata': 'mysql://root:rootpw@127.0.0.1:3306/scrapydweb_metadata'}
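
For reference, a minimal sketch (not scrapydweb's actual code) of how a single DATABASE_URL fans out into the four per-feature database URIs shown in the log lines above:

# sketch: derive the four database URIs from one DATABASE_URL,
# matching the pattern in the log lines above
base = 'mysql://root:1@127.0.0.1:3306'
for name in ('apscheduler', 'timertasks', 'jobs', 'metadata'):
    print('%s/scrapydweb_%s' % (base, name))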
LcodingL commented 4 years ago

Great! I've tried the first three steps you listed above and it works! The 4 related databases have been created automatically and data are stored normally! Also, I've tried export SCRAPYDWEB_TESTMODE=False and restarted, and it works as well. So it seems that we should set DATABASE_URL in the server's environment variables instead of in the config file. Is that by design, or is it something that needs correction?

Besides, could I migrate the scrapydweb_timertasks database so that I don't have to schedule the timer tasks manually all over again?

THANKS A LOT ^^

my8100 commented 4 years ago
  1. Make sure there’s only one DATABASE_URL in the file.
    $ cat /Users/laihuiying/Workspace/PythonEnv/scrapydweb/scrapydweb_settings_v10.py | grep DATABASE_URL
  2. Set DATABASE_URL = 'mysql://root:1@127.0.0.1:3306' in the config file above.
  3. Execute $ export DATABASE_URL=
  4. Restart scrapydweb.

You can try to migrate the database by yourself.
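
For anyone attempting that migration, here is a rough sketch (an illustrative SQLAlchemy 1.x-era script, not an official scrapydweb tool) that copies the timer-tasks tables from the bundled sqlite file into the MySQL database that scrapydweb created; adjust the sqlite path to the one shown on your Settings page:

# migrate_timertasks.py - rough sketch: copy all tables from the sqlite
# timer-tasks db into the MySQL db (SQLAlchemy 1.x behavior assumed)
from sqlalchemy import create_engine, MetaData

SRC = 'sqlite:////path/to/scrapydweb/data/database/timer_tasks.db'  # adjust
DST = 'mysql://root:1@127.0.0.1:3306/scrapydweb_timertasks'

src, dst = create_engine(SRC), create_engine(DST)
meta = MetaData()
meta.reflect(bind=src)      # discover the tables in the sqlite file
meta.create_all(bind=dst)   # create the same schema on MySQL

with src.connect() as s, dst.connect() as d:
    for table in meta.sorted_tables:
        rows = [dict(row) for row in s.execute(table.select())]
        if rows:
            d.execute(table.insert(), rows)  # bulk-insert the copied rows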

LcodingL commented 4 years ago

I've followed the steps and it failed. The config file didn't take effect. It seems that we must set the environment variable export DATABASE_URL=mysql://username:password@IP:PORT manually to make the MySQL backend work.

my8100 commented 4 years ago
$ export DATABASE_URL=
$ echo $DATABASE_URL
$ mv scrapydweb_settings_v10.py scrapydweb_settings_v10.py.bak
$ pip uninstall scrapydweb
$ pip install --upgrade scrapydweb

Restart scrapydweb and re-configure the newly generated file. If it's still not working, post the full log, as well as the results of the following cmds:

$ echo $DATABASE_URL
$ pwd
$ cat scrapydweb_settings_v10.py | grep DATABASE_URL
LcodingL commented 4 years ago

Hi, I've done the reinstallation and ran it with the new config file, yet it failed again. Below is the full log for your reference:

[2019-11-06 23:31:41,179] INFO in apscheduler.scheduler: Scheduler started
[2019-11-06 23:31:41,186] INFO in scrapydweb.run: ScrapydWeb version: 1.4.0
[2019-11-06 23:31:41,187] INFO in scrapydweb.run: Use 'scrapydweb -h' to get help
[2019-11-06 23:31:41,187] INFO in scrapydweb.run: Main pid: 10215
[2019-11-06 23:31:41,187] DEBUG in scrapydweb.run: Loading default settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/default_settings.py


Overriding custom settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/scrapydweb_settings_v10.py


[2019-11-06 23:31:41,301] DEBUG in scrapydweb.run: Reading settings from command line: Namespace(bind='0.0.0.0', debug=False, disable_auth=False, disable_logparser=False, disable_monitor=False, port=5000, scrapyd_server=None, switch_scheduler_state=False, verbose=False)
[2019-11-06 23:31:41,301] DEBUG in scrapydweb.utils.check_app_config: Checking app config
[2019-11-06 23:31:41,303] INFO in scrapydweb.utils.check_app_config: Setting up URL_SCRAPYDWEB: http://127.0.0.1:5000
[2019-11-06 23:31:41,303] DEBUG in scrapydweb.utils.check_app_config: Checking connectivity of SCRAPYD_SERVERS...

Index  Group  Scrapyd IP:Port  Connectivity  Auth
####################################################################################################
1      None   127.0.0.1:6800   True          None
2      test   localhost:6800   True          None
####################################################################################################

/Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/sqlalchemy/ext/declarative/clsregistry.py:129: SAWarning: This declarative base already contains a class with the same class name and module name as scrapydweb.models.Job, and will be replaced in the string-lookup table.
  % (item.module, item.name)
[2019-11-06 23:31:41,434] DEBUG in scrapydweb.utils.check_app_config: Created 2 tables for JobsView
[2019-11-06 23:31:41,434] INFO in scrapydweb.utils.check_app_config: Locating scrapy logfiles with SCRAPYD_LOG_EXTENSIONS: ['.log', '.log.gz', '.txt']
[2019-11-06 23:31:41,439] INFO in scrapydweb.utils.check_app_config: Scheduler for timer tasks: STATE_RUNNING
[2019-11-06 23:31:41,479] INFO in scrapydweb.utils.check_app_config: create_jobs_snapshot (trigger: interval[0:05:00], next run at: 2019-11-06 23:36:41 CST)


Visit ScrapydWeb at http://127.0.0.1:5000 or http://IP-OF-THE-CURRENT-HOST:5000


[2019-11-06 23:31:41,484] INFO in scrapydweb.run: For running Flask in production, check out http://flask.pocoo.org/docs/1.0/deploying/

  • Serving Flask app "scrapydweb" (lazy loading)
  • Environment: production
    WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
  • Debug mode: off

[2019-11-06 23:31:41,485] DEBUG in apscheduler.scheduler: Next wakeup is due at 2019-11-06 23:36:41.479754+08:00 (in 299.998942 seconds)
[2019-11-06 23:32:11,694] INFO in werkzeug: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
[2019-11-06 23:32:17,956] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:17] "GET /1/nodereports/ HTTP/1.1" 200 -
[2019-11-06 23:32:18,008] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/css/style.css HTTP/1.1" 200 -
[2019-11-06 23:32:18,009] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/css/icon_upload_icon_right.css HTTP/1.1" 200 -
[2019-11-06 23:32:18,014] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/css/dropdown.css HTTP/1.1" 200 -
[2019-11-06 23:32:18,016] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/icons_menu.js HTTP/1.1" 200 -
[2019-11-06 23:32:18,017] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/github_buttons.js HTTP/1.1" 200 -
[2019-11-06 23:32:18,025] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/common.js HTTP/1.1" 200 -
[2019-11-06 23:32:18,026] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/jquery.min.js HTTP/1.1" 200 -
[2019-11-06 23:32:18,031] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/vue.min.js HTTP/1.1" 200 -
[2019-11-06 23:32:18,033] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/element-ui%402.4.6/lib/theme-chalk/index.css HTTP/1.1" 200 -
[2019-11-06 23:32:18,049] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/element-ui%402.4.6/lib/index.js HTTP/1.1" 200 -
[2019-11-06 23:32:18,366] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/element-ui%402.4.6/lib/theme-chalk/fonts/element-icons.woff HTTP/1.1" 200 -
[2019-11-06 23:32:18,378] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "POST /1/api/daemonstatus/ HTTP/1.1" 200 -
[2019-11-06 23:32:22,901] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:22] "GET /1/settings/ HTTP/1.1" 200 -

And the results of the cmds (echo printed nothing):

$ echo $DATABASE_URL

$ pwd

/Users/laihuiying/Workspace/PythonEnv/scrapydweb

$ cat scrapydweb_settings_v10.py | grep DATABASE_URL

DATABASE_URL = 'mysql://root:1@127.0.0.1:3306'

Thx for your patience!

my8100 commented 4 years ago

What’s the result of this cmd now?

$ python -c "from scrapydweb_settings_v10 import DATABASE_URL; print(DATABASE_URL)"
LcodingL commented 4 years ago

mysql://root:1@127.0.0.1:3306

my8100 commented 4 years ago

Can you post the screenshot of the related info in the Settings page?

LcodingL commented 4 years ago

I've tried many times to upload a screenshot, but it failed every time T.T

my8100 commented 4 years ago

Then just post the text.

LcodingL commented 4 years ago

For easy reading, I've removed all the comments:

DATA_PATH = os.environ.get('DATA_PATH', '')

DATABASE_URL = 'mysql://root:1@127.0.0.1:3306'

my8100 commented 4 years ago

Actually, I’m asking for the value of DATABASE displayed on the web UI. How did you judge that the config in the file is not working? For convenience, you can execute $ export SCRAPYDWEB_TESTMODE=True and restart scrapydweb to see which backend is being used behind the scenes.

LcodingL commented 4 years ago

I judged from the DATABASE value displayed on the web UI:

{ "APSCHEDULER_DATABASE_URI": "sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/apscheduler.db", "SQLALCHEMY_DATABASE_URI": "sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/timer_tasks.db", "SQLALCHEMY_BINDS_METADATA": "sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/metadata.db", "SQLALCHEMY_BINDS_JOBS": "sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/jobs.db" }

And no related database was created.

my8100 commented 4 years ago

Adding sys.path.append(os.getcwd()) before the try clause would fix the issue. It's /Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/vars.py in your case.

Thanks for your support!

https://github.com/my8100/scrapydweb/blob/8104386438cb7e18e5b619c53aedf22dc5bb8954/scrapydweb/vars.py#L18-L21
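
In other words, the patch makes the directory scrapydweb was launched from importable, so the custom settings file sitting there can be found. A minimal sketch of the pattern (the real code is in scrapydweb/vars.py at the link above; the import shown here is illustrative):

import os
import sys

sys.path.append(os.getcwd())  # the fix: make the launch directory importable

try:
    # illustrative stand-in for the settings import in vars.py
    from scrapydweb_settings_v10 import DATABASE_URL
except ImportError:
    DATABASE_URL = ''  # fall back to the default sqlite backend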

LcodingL commented 4 years ago

Hi, sorry for the delay. I've added that line before the try clause and restarted scrapydweb, and it worked! Thank you so much for the helpful share and your consistent dedication to making it better!

argoyal commented 4 years ago

I was facing a similar issue. If DATABASE_URL is present in the environment variables, then it works. But if I try to build DATABASE_URL in the custom settings file from other environment variables, it fails to work. I will look into this and try to raise a PR resolving this issue.
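
For context, the failing pattern being described looks like this (an illustrative sketch of a config file; the DB_* variable names are hypothetical):

# scrapydweb_settings_v10.py - sketch: compose DATABASE_URL from other
# environment variables (DB_* names here are hypothetical examples)
import os

DB_USER = os.environ.get('DB_USER', 'root')
DB_PASS = os.environ.get('DB_PASS', '1')
DB_HOST = os.environ.get('DB_HOST', '127.0.0.1')
DB_PORT = os.environ.get('DB_PORT', '3306')
DATABASE_URL = 'mysql://%s:%s@%s:%s' % (DB_USER, DB_PASS, DB_HOST, DB_PORT)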

Irving-plus commented 3 years ago

The git connection fails; I can't pull the repo.

IMYR666 commented 2 years ago

Adding sys.path.append(os.getcwd()) before the try clause would fix the issue. It's /Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/vars.py in your case.

Thanks for your support!

https://github.com/my8100/scrapydweb/blob/8104386438cb7e18e5b619c53aedf22dc5bb8954/scrapydweb/vars.py#L18-L21

Hi, the last version was 1.4.0, released on August 16, 2019, but this bug was fixed on May 11, 2020. Can you release a new version? Thx.