mozilla / Bugzilla-ETL

ETL for feeding bug snapshots to an ElasticSearch cluster
Mozilla Public License 2.0

bz_etl.py ends with "thread stopped" message #21

Open johnjreiser opened 3 years ago

johnjreiser commented 3 years ago

I'm trying to use this tool to take a MySQL backup of our Bugzilla instance and push the details into Elastic. When running bz_etl.py, I receive the following message:

2021-01-14 13:13:08.507053 - Thread "timers daemon" got request to stop
2021-01-14 13:13:08.507198 - "Main Thread" waiting on thread "timers daemon"
2021-01-14 13:13:08.580425 - thread "timers daemon" stopping
2021-01-14 13:13:08.580757 - thread "timers daemon" is done
2021-01-14 13:13:08.581174 - Thread "Main Thread" now stopped

I haven't been able to get the script to produce any other debugging information.
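If the full console output can be saved to a file, a small helper can show what was logged just before the first "got request to stop" line. That is where the cause of a shutdown would usually appear. This is a minimal sketch. The function name lines_before_shutdown is hypothetical and not part of Bugzilla-ETL, and the only log detail it relies on is the "got request to stop" text quoted above.

```python
from typing import List

# Marker text taken from the shutdown lines quoted above.
STOP_MARKER = "got request to stop"


def lines_before_shutdown(log_text: str, context: int = 20) -> List[str]:
    """Return up to `context` lines logged before the first shutdown request.

    Hypothetical helper: returns an empty list if no shutdown request is found.
    """
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if STOP_MARKER in line:
            # Keep only the lines immediately preceding the marker.
            return lines[max(0, index - context):index]
    return []
```

For example, with the console output saved to a file (the name etl.log is only an illustration), print("\n".join(lines_before_shutdown(open("etl.log").read()))) prints the last 20 lines before the shutdown began.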

Here's the "settings.json" file I used (taken from this repo, with a few modifications):

{
    "param": {
        "start": 0,
        "increment": 1,
        "first_run_time": "/import/first_run_time.txt",
        "last_run_time": "/import/last_run_time.txt",
        "look_back": 1610545510000, // HOUR = 60*60*1000
        "allow_private_bugs": {"$ref": "env://ETL_PRIVATE_BUGS"},
        "debug":{
            "constants":{"simple.DEBUG_SHOW_DETAILS":true}
        }
    },
    "alias": {
        "start": 0,
        "increment": 100000,
        "elasticsearch": {
            "host": {"$ref": "env://ES_HOST"},
            "port": {"$ref": "env://ES_PORT"},
            "index": "bug_aliases"
        },
        "file": {
            "path": "resources/schema/bugzilla_aliases.json",
            "key": {"$ref": "env://ETL_ALIAS_KEY"}
        }
    },
    "bugzilla": {
        "username": {"$ref": "env://MYSQL_USER"},
        "password": {"$ref": "env://MYSQL_PASSWORD"},
        "preamble": "from https://github.com/klahnakoski/Bugzilla-ETL",
        "host": {"$ref": "env://MYSQL_HOST"},
        "port": {"$ref": "env://MYSQL_PORT"},
        "schema": {"$ref": "env://MYSQL_SCHEMA"},
        "debug":{
            "constants":{"simple.DEBUG_SHOW_DETAILS":true}
        }
    },
    "es": {
        "host": {"$ref": "env://ES_HOST"},
        "port": {"$ref": "env://ES_PORT"},
        "index": {"$ref": "env://ETL_BUGS"},
        "type": "bug_version",
        "schema": {
            "$ref": "../schema/bug_version.json"
        },
        "timeout": 60
    },
    "es_comments": {
        "host": {"$ref": "env://ES_HOST"},
        "port": {"$ref": "env://ES_PORT"},
        "index": {"$ref": "env://ETL_COMMENTS"},
        "type": "bug_comment",
        "schema": {
            "$ref": "../schema/bug_comments.json"
        },
        "timeout": 60
    },
    "constants": {
        "jx_elasticsearch.meta.DEBUG": true,
        "jx_elasticsearch.meta.ENABLE_META_SCAN": false,
        "pyLibrary.sql.mysql.EXECUTE_TIMEOUT": 0,
        "pyLibrary.env.http.default_headers": {
            "Referer": "https://github.com/mozilla/Bugzilla-ETL"
        },
        "mo_json.SNAP_TO_BASE_10": false
    },
    "debug": {
        "trace": true,
        "log": [
            {
                "log_type": "mozlog",
                "appname": {"$ref": "env://LOG_APPNAME"}
            }
        ]
    }
}
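The inline comment on "look_back" (HOUR = 60*60*1000) suggests the setting is meant as a duration in milliseconds. Assuming that reading, which the thread itself does not confirm, a quick calculation shows the configured value behaves more like a Unix timestamp in milliseconds than a duration:

```python
from datetime import datetime, timezone

HOUR = 60 * 60 * 1000      # milliseconds, per the comment in settings.json
look_back = 1610545510000  # value from the "param" section above

# Read as a duration, the value is roughly 447,374 hours (about 51 years).
hours = look_back / HOUR
# Read as a Unix timestamp in milliseconds, it is a moment on 2021-01-13 (UTC).
as_date = datetime.fromtimestamp(look_back / 1000, tz=timezone.utc)

print(round(hours), as_date.isoformat())
```

If "look_back" really is a duration, the value may be worth double-checking against the HOUR multiple the comment hints at.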

Ultimately, our use case only needs a one-time push, not recurring refreshes. We moved off Bugzilla in 2018, but it holds approximately 13 years of knowledge, so we'd like to replicate it somewhere in our environment rather than keep Bugzilla running in a locked-down state. I feel like I'm very close to getting this working for that purpose.

Any help understanding why this stops immediately without any other details would be greatly appreciated. Thanks!

klahnakoski commented 3 years ago

Sorry for the delay in responding.

  1. The output you see is from an orderly shutdown, so the reason for stopping should be logged further up.
  2. There are a number of DEBUG statements in the source code. You can turn them on for more detail, either directly or by using the "constants" section of the config file, e.g. "jx_elasticsearch.meta.DEBUG": true.
  3. Be sure every "$ref": "env://*" value names an environment variable that is properly set (I think the code complains if one is not set).
  4. There is a Docker image you can use for hints about setup.
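For point 3, a quick pre-flight check can list every "env://" reference in the settings file and report which variables are unset. This is a minimal sketch; missing_env_refs is a hypothetical helper, not part of Bugzilla-ETL. It scans the raw text with a regular expression instead of parsing JSON, because the settings file above contains a // comment that Python's json module would reject. As a precaution it also flags variables that are set but empty, which the ETL itself may or may not treat as an error.

```python
import os
import re
from typing import List, Mapping, Optional

# Matches references such as {"$ref": "env://ES_HOST"} in the settings text.
ENV_REF = re.compile(r'"env://([A-Za-z_][A-Za-z0-9_]*)"')


def missing_env_refs(settings_text: str,
                     environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the sorted names of env:// references that are unset or empty."""
    environ = os.environ if environ is None else environ
    names = sorted(set(ENV_REF.findall(settings_text)))
    return [name for name in names if not environ.get(name)]
```

Running print(missing_env_refs(open("settings.json").read())) in the same shell that launches bz_etl.py would show, for example, whether ETL_PRIVATE_BUGS, ETL_ALIAS_KEY or LOG_APPNAME were left unset.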