redpanda-ai / Meerkat

Used for the Meerkat project

"write_output" function fails on bad configuration file. #28

Closed redpanda-ai closed 10 years ago

redpanda-ai commented 10 years ago

Please validate the JSON config file before processing begins.
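One way the requested up-front check might look is sketched below. This is only an illustration, not the project's actual fix: the names `REQUIRED_KEYS`, `missing_keys`, and `load_config` are hypothetical, and the list of required keys is an assumption based on the `params["output"]["file"]["path"]` lookup that appears in the traceback.

```python
import json
import sys

# Hypothetical list of required settings, written as dotted paths.
# "output.file.path" is assumed from the lookup shown in the traceback.
REQUIRED_KEYS = ("input.filename", "output.file.path")


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the dotted paths that are absent from the nested config dict."""
    missing = []
    for dotted in required:
        node = config
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


def load_config(path):
    """Load the JSON config and halt before any work starts if keys are missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = missing_keys(config)
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("Configuration file %s is missing required keys: %s"
                 % (path, ", ".join(missing)))
    return config
```

Calling something like `load_config("config/default.json")` before any threads are started would stop the run with a readable message instead of a late KeyError.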

The logs below show that processing runs to completion before an exception is thrown because the "output.file" key was not found in the JSON config file.

STEPS TO REPRODUCE:

  1. Run __init__.py using config/default.json

EXPECTED RESULT:

  1. An informative error message stating that the configuration file must have an "output.file" key and value; the program should then immediately halt without doing any actual work.

ACTUAL RESULT:

2014-01-22 17:21:49,808 - thread 0 - INFO - Log initialized.
Input String  CHECKCARD 0126 ORIGINAL GRAVITY PUBLIC SAN JOSE CA 24690293027080080270199
2014-01-22 17:21:49,808 - thread 0 - INFO - {
    "concurrency": 10,
    "input": {
        "encoding": "utf-8",
        "filename": "../data/short_bank_transaction_descriptions.csv"
    },
    "logging": {
        "console": true,
        "formatter": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        "level": "info",
        "path": "../logs/logs.log"
    },
    "output": {
        "filepath": "../data/longTailLabeled.csv",
        "results": {
            "fields": [
                "BUSINESSSTANDARDNAME",
                "HOUSE",
                "PREDIR",
                "STREET",
                "STRTYPE",
                "CITYNAME",
                "STATE",
                "ZIP",
                "pin.location"
            ],
            "size": 10
        }
    }
}
2014-01-22 17:21:49,816 - thread 1 - INFO - Log initialized.
2014-01-22 17:21:49,816 - thread 1 - INFO - {
    "concurrency": 10,
    "input": {
        "encoding": "utf-8",
        "filename": "../data/short_bank_transaction_descriptions.csv"
    },
    "logging": {
        "console": true,
        "formatter": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        "level": "info",
        "path": "../logs/logs.log"
    },
    "output": {
        "filepath": "../data/longTailLabeled.csv",
        "results": {
            "fields": [
                "BUSINESSSTANDARDNAME",
                "HOUSE",
                "PREDIR",
                "STREET",
                "STRTYPE",
                "CITYNAME",
                "STATE",
                "ZIP",
                "pin.location"
            ],
            "size": 10
        }
    }
}
Input String  ARCO PAYPOINT 01/22 #000490226 PURCHASE 470 RALSTON AVE BELMONT CA
2014-01-22 17:21:49,817 - thread 2 - INFO - Log initialized.
2014-01-22 17:21:49,818 - thread 2 - INFO - {
    "concurrency": 10,
    "input": {
        "encoding": "utf-8",
        "filename": "../data/short_bank_transaction_descriptions.csv"
    },
    "logging": {
        "console": true,
        "formatter": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        "level": "info",
        "path": "../logs/logs.log"
    },
    "output": {
        "filepath": "../data/longTailLabeled.csv",
        "results": {
            "fields": [
                "BUSINESSSTANDARDNAME",
                "HOUSE",
                "PREDIR",
                "STREET",
                "STRTYPE",
                "CITYNAME",
                "STATE",
                "ZIP",
                "pin.location"
            ],
            "size": 10
        }
    }
}
2014-01-22 17:21:49,818 - thread 3 - INFO - Log initialized.
2014-01-22 17:21:49,819 - thread 3 - INFO - {
    "concurrency": 10,
    "input": {
        "encoding": "utf-8",
        "filename": "../data/short_bank_transaction_descriptions.csv"
    },
    "logging": {
        "console": true,
        "formatter": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        "level": "info",
        "path": "../logs/logs.log"
    },
    "output": {
        "filepath": "../data/longTailLabeled.csv",
        "results": {
            "fields": [
                "BUSINESSSTANDARDNAME",
                "HOUSE",
                "PREDIR",
                "STREET",
                "STRTYPE",
                "CITYNAME",
                "STATE",
                "ZIP",
                "pin.location"
            ],
            "size": 10
        }
    }
}
2014-01-22 17:21:49,824 - thread 4 - INFO - Log initialized.
2014-01-22 17:21:49,824 - thread 4 - INFO - {
    "concurrency": 10,
    "input": {
        "encoding": "utf-8",
        "filename": "../data/short_bank_transaction_descriptions.csv"
    },
    "logging": {
        "console": true,
        "formatter": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        "level": "info",
        "path": "../logs/logs.log"
    },
    "output": {
        "filepath": "../data/longTailLabeled.csv",
        "results": {
            "fields": [
                "BUSINESSSTANDARDNAME",
                "HOUSE",
                "PREDIR",
                "STREET",
                "STRTYPE",
                "CITYNAME",
                "STATE",
                "ZIP",
                "pin.location"
            ],
            "size": 10
        }
    }
}
2014-01-22 17:21:49,825 - thread 5 - INFO - Log initialized.
2014-01-22 17:21:49,825 - thread 5 - INFO - {
    "concurrency": 10,
    "input": {
        "encoding": "utf-8",
        "filename": "../data/short_bank_transaction_descriptions.csv"
    },
    "logging": {
        "console": true,
        "formatter": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        "level": "info",
        "path": "../logs/logs.log"
    },
    "output": {
        "filepath": "../data/longTailLabeled.csv",
        "results": {
            "fields": [
                "BUSINESSSTANDARDNAME",
                "HOUSE",
                "PREDIR",
                "STREET",
                "STRTYPE",
                "CITYNAME",
                "STATE",
                "ZIP",
                "pin.location"
            ],
            "size": 10
        }
    }
}
2014-01-22 17:21:49,825 - thread 6 - INFO - Log initialized.
2014-01-22 17:21:49,825 - thread 6 - INFO - {
    "concurrency": 10,
    "input": {
        "encoding": "utf-8",
        "filename": "../data/short_bank_transaction_descriptions.csv"
    },
    "logging": {
        "console": true,
        "formatter": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        "level": "info",
        "path": "../logs/logs.log"
    },
    "output": {
        "filepath": "../data/longTailLabeled.csv",
        "results": {
            "fields": [
                "BUSINESSSTANDARDNAME",
                "HOUSE",
                "PREDIR",
                "STREET",
                "STRTYPE",
                "CITYNAME",
                "STATE",
                "ZIP",
                "pin.location"
            ],
            "size": 10
        }
    }
}
2014-01-22 17:21:49,826 - thread 7 - INFO - Log initialized.
2014-01-22 17:21:49,826 - thread 7 - INFO - {
    "concurrency": 10,
    "input": {
        "encoding": "utf-8",
        "filename": "../data/short_bank_transaction_descriptions.csv"
    },
    "logging": {
        "console": true,
        "formatter": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        "level": "info",
        "path": "../logs/logs.log"
    },
    "output": {
        "filepath": "../data/longTailLabeled.csv",
        "results": {
            "fields": [
                "BUSINESSSTANDARDNAME",
                "HOUSE",
                "PREDIR",
                "STREET",
                "STRTYPE",
                "CITYNAME",
                "STATE",
                "ZIP",
                "pin.location"
            ],
            "size": 10
        }
    }
}
2014-01-22 17:21:49,826 - thread 8 - INFO - Log initialized.
2014-01-22 17:21:49,827 - thread 8 - INFO - {
    "concurrency": 10,
    "input": {
        "encoding": "utf-8",
        "filename": "../data/short_bank_transaction_descriptions.csv"
    },
    "logging": {
        "console": true,
        "formatter": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        "level": "info",
        "path": "../logs/logs.log"
    },
    "output": {
        "filepath": "../data/longTailLabeled.csv",
        "results": {
            "fields": [
                "BUSINESSSTANDARDNAME",
                "HOUSE",
                "PREDIR",
                "STREET",
                "STRTYPE",
                "CITYNAME",
                "STATE",
                "ZIP",
                "pin.location"
            ],
            "size": 10
        }
    }
}
2014-01-22 17:21:49,827 - thread 9 - INFO - Log initialized.
2014-01-22 17:21:49,827 - thread 9 - INFO - {
    "concurrency": 10,
    "input": {
        "encoding": "utf-8",
        "filename": "../data/short_bank_transaction_descriptions.csv"
    },
    "logging": {
        "console": true,
        "formatter": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        "level": "info",
        "path": "../logs/logs.log"
    },
    "output": {
        "filepath": "../data/longTailLabeled.csv",
        "results": {
            "fields": [
                "BUSINESSSTANDARDNAME",
                "HOUSE",
                "PREDIR",
                "STREET",
                "STRTYPE",
                "CITYNAME",
                "STATE",
                "ZIP",
                "pin.location"
            ],
            "size": 10
        }
    }
}
2014-01-22 17:21:52,967 - thread 1 - INFO - TOKENS ARE: ['ARCO', 'PAYPOINT', '0122', '#0004', '9', '0226', 'PURCHASE', '470', 'RALSTON', 'AVE', 'BELMONT', 'CA']
2014-01-22 17:21:52,973 - thread 1 - INFO - Unigrams are:
    ['ARCO', 'PAYPOINT', '0122', '#0004', '9', '0226', 'PURCHASE', '470', 'RALSTON', 'AVE', 'BELMONT', 'CA']
2014-01-22 17:21:52,974 - thread 1 - INFO - Unigrams matched to ElasticSearch:
    ['PURCHASE', 'RALSTON', 'BELMONT', '#0004', '0226', '0122', 'ARCO', 'AVE', '470', 'CA']
2014-01-22 17:21:52,974 - thread 1 - INFO - Of these:
2014-01-22 17:21:52,974 - thread 1 - INFO -     2 stop words:      ['PAYPOINT', 'PURCHASE']
2014-01-22 17:21:52,980 - thread 1 - INFO -     0 phone_numbers:   []
2014-01-22 17:21:52,985 - thread 1 - INFO -     5 numeric words:   ['0122', '#0004', '9', '0226', '470']
2014-01-22 17:21:52,991 - thread 1 - INFO -     5 unigrams: ['ARCO', 'RALSTON', 'AVE', 'BELMONT', 'CA']
2014-01-22 17:21:53,062 - thread 1 - INFO -     1 addresses: ['470 RALSTON AVE BELMONT CA']
2014-01-22 17:21:53,063 - thread 1 - INFO - Search components are:
2014-01-22 17:21:53,068 - thread 1 - INFO -     Unigrams: 'ARCO RALSTON AVE BELMONT CA'
2014-01-22 17:21:53,069 - thread 1 - INFO -     Matching 'Address': '470 RALSTON AVE BELMONT CA'
2014-01-22 17:21:53,069 - thread 1 - INFO - {"size": 10, "fields": ["BUSINESSSTANDARDNAME", "HOUSE", "PREDIR", "STREET", "STRTYPE", "CITYNAME", "STATE", "ZIP", "pin.location"], "from": 0, "query": {"bool": {"minimum_number_should_match": 1, "should": [{"query_string": {"fields": ["_all^1", "BUSINESSSTANDARDNAME^2"], "query": "ARCO RALSTON AVE BELMONT CA", "boost": 1}}, {"match": {"composite.address^3": {"query": "470 RALSTON AVE BELMONT CA", "boost": 10, "type": "phrase"}}}]}}}
2014-01-22 17:21:53,208 - thread 1 - INFO - This system required 33 individual searches.
2014-01-22 17:21:53,215 - thread 1 - INFO - Z-Score delta: [0.731]
2014-01-22 17:21:53,215 - thread 1 - INFO - Top Score Quality: Low-grade
[0.113] Ralston Florist 932 Ralston Ave Belmont CA 94002 {'lat': '37.519724', 'lon': '-122.276999'}
[0.1] Ralston Florist 936 Ralston Ave Belmont CA 94002 {'lat': '37.519706', 'lon': '-122.27702'}
[0.082] Ralston Village Cleaners 980 Ralston Ave Belmont CA 94002 {'lat': '37.519505', 'lon': '-122.277254'}
[0.081] Ralston Elementary School 2675 Ralston Ave Belmont CA 94002 {'lat': '37.511269', 'lon': '-122.31189'}
[0.074] Belmont Gardens 1100 Ralston Ave Belmont CA 94002 {'lat': '37.517536', 'lon': '-122.278458'}
[0.07] Belmont Optique 877 Ralston Ave Belmont CA 94002 {'lat': '37.51979', 'lon': '-122.276451'}
[0.062] Whipple Arco 504 Whipple Ave Redwood City CA 94063 {'lat': '37.494598', 'lon': '-122.235054'}
[0.061] Universe of Colors of Belmont LLC 887 Ralston Ave Belmont CA 94002 {'lat': '37.519745', 'lon': '-122.276505'}
[0.058] Telegraph Arco 6407 Telegraph Ave Oakland CA 94609 {'lat': '37.850433', 'lon': '-122.260834'}
[0.058] Johns Arco 286 S Livermore Ave Livermore CA 94550 {'lat': '37.681221', 'lon': '-121.766518'}
2014-01-22 17:22:06,945 - thread 0 - INFO - TOKENS ARE: ['CHECKCARD', '0126', 'ORIGINAL', 'GRAVITY', 'PUBLIC', 'SAN', 'JOSE', 'CA', '24', '6902', '930', '27080', '0802', '7019', '9']
2014-01-22 17:22:06,945 - thread 0 - INFO - Unigrams are:
    ['CHECKCARD', '0126', 'ORIGINAL', 'GRAVITY', 'PUBLIC', 'SAN', 'JOSE', 'CA', '24', '6902', '930', '27080', '0802', '7019', '9']
2014-01-22 17:22:06,945 - thread 0 - INFO - Unigrams matched to ElasticSearch:
    ['ORIGINAL', 'GRAVITY', 'PUBLIC', '27080', '7019', '6902', 'JOSE', '0126', '0802', 'SAN', '930', 'CA', '24']
2014-01-22 17:22:06,945 - thread 0 - INFO - Of these:
2014-01-22 17:22:06,945 - thread 0 - INFO -     1 stop words:      ['CHECKCARD']
2014-01-22 17:22:06,945 - thread 0 - INFO -     0 phone_numbers:   []
2014-01-22 17:22:06,945 - thread 0 - INFO -     8 numeric words:   ['0126', '24', '6902', '930', '27080', '0802', '7019', '9']
2014-01-22 17:22:06,945 - thread 0 - INFO -     6 unigrams: ['ORIGINAL', 'GRAVITY', 'PUBLIC', 'SAN', 'JOSE', 'CA']
2014-01-22 17:22:07,294 - thread 0 - INFO -     0 addresses: []
2014-01-22 17:22:07,295 - thread 0 - INFO - Search components are:
2014-01-22 17:22:07,295 - thread 0 - INFO -     Unigrams: 'ORIGINAL GRAVITY PUBLIC SAN JOSE CA'
2014-01-22 17:22:07,295 - thread 0 - INFO - {"size": 10, "fields": ["BUSINESSSTANDARDNAME", "HOUSE", "PREDIR", "STREET", "STRTYPE", "CITYNAME", "STATE", "ZIP", "pin.location"], "from": 0, "query": {"bool": {"minimum_number_should_match": 1, "should": [{"query_string": {"fields": ["_all^1", "BUSINESSSTANDARDNAME^2"], "query": "ORIGINAL GRAVITY PUBLIC SAN JOSE CA", "boost": 1}}]}}}
2014-01-22 17:22:07,402 - thread 0 - INFO - This system required 238 individual searches.
2014-01-22 17:22:07,403 - thread 0 - INFO - Z-Score delta: [3.182]
2014-01-22 17:22:07,403 - thread 0 - INFO - Top Score Quality: High-grade
[6.698] Original Gravity Public House 66 S 1st St San Jose CA 95113 {'lat': '37.335018', 'lon': '-121.889503'}
[1.895] Gravity Mobile 466 8th St San Francisco CA 94103 {'lat': '37.772671', 'lon': '-122.407845'}
[1.816] Leadership Public Schools-San Jose 1881 Cunningham Ave San Jose CA 95122 {'lat': '37.330371', 'lon': '-121.828946'}
[1.789] Gravity Media 2030 Union St San Francisco CA 94123 {'lat': '37.79757', 'lon': '-122.432796'}
[1.788] Gravity People 147 Natoma St San Francisco CA 94105 {'lat': '37.786098', 'lon': '-122.399918'}
[1.779] City of San Jose Public Works 801 N 1st St San Jose CA 95110 {'lat': '37.350967', 'lon': '-121.903526'}
[1.657] Original Buddhism Society In America 1879 Lundy Ave San Jose CA 95131 {'lat': '37.392794', 'lon': '-121.890645'}
[1.527] Original Joes 1704 Union St San Francisco CA 94123 {'lat': '37.798243', 'lon': '-122.427518'}
[1.482] The Original Pancake House 1366 S De Anza Blvd San Jose CA 95129 {'lat': '37.298752', 'lon': '-122.031654'}
[1.475] The Original Pancake House 2306 Almaden Rd San Jose CA 95125 {'lat': '37.292016', 'lon': '-121.880092'}
Traceback (most recent call last):
  File "/Users/jkey/git/longtail/src/__init__.py", line 9, in <module>
    tokenize_descriptions.start()
  File "/Users/jkey/git/longtail/src/tokenize_descriptions.py", line 102, in start
    tokenize(params, desc_queue)
  File "/Users/jkey/git/longtail/src/tokenize_descriptions.py", line 58, in tokenize
    write_output(params, result_queue)
  File "/Users/jkey/git/longtail/src/tokenize_descriptions.py", line 66, in write_output
    file_name = params["output"]["file"]["path"] or '../data/longtailLabeled.csv'
KeyError: 'file'
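Note that the `or '../data/longtailLabeled.csv'` fallback in `write_output` never gets a chance to apply: Python evaluates the subscript `params["output"]["file"]` first, and the logged config has `"filepath"` under `"output"` rather than a `"file"` section, so the KeyError is raised before `or` is reached. If a default path is wanted rather than a hard failure, a lookup along these lines would behave as the original line seems to intend. This is a minimal sketch; `output_path` and `DEFAULT_OUTPUT` are hypothetical names, and it assumes the `"output"` and `"file"` values are dicts when present.

```python
# Default taken from the failing line in the traceback.
DEFAULT_OUTPUT = "../data/longtailLabeled.csv"


def output_path(params, default=DEFAULT_OUTPUT):
    """Return output.file.path if present and non-empty, else the default."""
    # dict.get returns the supplied default instead of raising KeyError.
    file_section = params.get("output", {}).get("file", {})
    return file_section.get("path") or default
```

With the config shown in the logs, `output_path(params)` would return the default path instead of raising.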
speakerjohnash commented 10 years ago

Fixed.