jeff1evesque / machine-learning

Web-interface + rest API for classification and regression (https://jeff1evesque.github.io/machine-learning.docs)

Implement nosql backend logic #2844

Closed jeff1evesque closed 7 years ago

jeff1evesque commented 7 years ago

After #2842 is resolved, we need to determine the corresponding nosql data structure, and implement it in our python backend logic.

jeff1evesque commented 7 years ago

We were able to attempt a model_predict session on the web-interface, using a model built from a collection of svm datasets:

model-predict

However, upon form submission, our flask.log contained the following traceback:

[2017-07-26 07:59:00,986] {/usr/local/lib/python2.7/dist-packages/flask/app.py:1560} ERROR - Exception on /load-data [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/vagrant/interface/views.py", line 125, in load_data
    response = loader.load_model_predict()
  File "/vagrant/brain/load_data.py", line 178, in load_model_predict
    session = ModelPredict(self.data)
  File "/vagrant/brain/session/model_predict.py", line 54, in __init__
    self.model_id = self.prediction_settings['model_id']
KeyError: 'model_id'
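
The KeyError above indicates that the settings handed to ModelPredict contained no 'model_id' key. Below is a minimal sketch of checking for required keys before they are accessed, assuming the settings arrive as a plain dict. The `require_keys` helper and the sample settings are hypothetical, written in Python 3 for illustration, and are not the project's actual code:

```python
def require_keys(settings, required):
    """Return the names in `required` that are missing from `settings`."""
    return [key for key in required if key not in settings]


# Hypothetical settings, shaped like a form submission without 'model_id'.
prediction_settings = {'session_type': 'model_predict'}

missing = require_keys(prediction_settings, ['model_id'])
if missing:
    # Report the problem instead of letting a KeyError escape the view.
    error = 'missing required setting(s): ' + ', '.join(missing)
else:
    error = None
```

Collecting the missing names first lets the session report every absent setting at once, rather than failing on the first lookup.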

jeff1evesque commented 7 years ago

290203d: the currently committed code generates the following:

model-predict-1

model-predict-2

The corresponding error.log:

[2017-07-28 11:36:36,536] {/vagrant/log/logger.py:165} DEBUG - brain.load_data: /brain/load-data.py, session: <brain.session.model_predict.ModelPredict object at 0x7fd63147f850>
[2017-07-28 11:36:36,536] {/vagrant/log/logger.py:165} DEBUG - brain.load_data: /brain/load-data.py, validate_arg_none: False
[2017-07-28 11:36:36,537] {/vagrant/log/logger.py:165} DEBUG - brain.load_data: /brain/load-data.py, session.get_errors(): []
[2017-07-28 11:36:36,538] {/vagrant/log/logger.py:165} DEBUG - brain.session.model_predict: /brain/session/model_predict.py, self.collection: u'collection-1136'
[2017-07-28 11:36:36,539] {/vagrant/log/logger.py:165} DEBUG - brain.session.model_predict: /brain/session/model_predict.py, self.predictors: [u'3', u'3', u'3', u'3', u'3', u'3', u'3']
[2017-07-28 11:36:36,546] {/vagrant/log/logger.py:165} DEBUG - brain.session.model_predict: /brain/session/model_predict.py, model_type: 'svm'
[2017-07-28 11:36:36,549] {/vagrant/log/logger.py:165} DEBUG - brain.load_data: /brain/load-data.py, my_prediction: {'model': 'svm', 'confidence': {'decision_function': [-5.9856473709828322, 7.3016894114551585, 3.4672599258030887], 'classes': [u'dep-variable-1', u'dep-variable-2', u'dep-variable-3'], 'probability': [0.011500715458176457, 0.042140046267255232, 0.94635923827456847]}, 'result': u'dep-variable-2', 'error': None}

jeff1evesque commented 7 years ago

We were able to run the equivalent svr case of the above:

screen shot 2017-07-29 at 10 36 47 am

screen shot 2017-07-29 at 10 37 01 am

However, during a data_new session on the web-interface, we are unable to load xml dataset(s) for the svr case. So, we'll need to investigate the logic relating to xml2dict.py:

screen shot 2017-07-29 at 10 51 22 am

jeff1evesque commented 7 years ago

The model_generate case fails when larger datasets are used with the json file format, while the smaller json datasets succeed.

This means we'll likely need to investigate accepting larger array instances:

...
        {
            "dependent-variable": "dep-variable-4",
            "independent-variables": [{
                "indep-variable-1": 22.1,
                "indep-variable-2": 95.96,
                "indep-variable-4": 342,
                "indep-variable-5": 66.67,
                "indep-variable-6": 0.001,
                "indep-variable-7": 32,
                "indep-variable-3": 0.743
            },
            {
                "indep-variable-1": 20.71,
                "indep-variable-2": 99.33,
                "indep-variable-4": 342,
                "indep-variable-5": 75.67,
                "indep-variable-6": 0.001,
                "indep-variable-7": 30,
                "indep-variable-3": 0.648
            }]
        },
...

Instead of the single observation instance from the json dataset(s):

        {
            "dependent-variable": "dep-variable-1",
            "independent-variables": [{
                "indep-variable-1": 23.45,
                "indep-variable-2": 98.01,
                "indep-variable-4": 325,
                "indep-variable-5": 54.64,
                "indep-variable-6": 0.002,
                "indep-variable-7": 23,
                "indep-variable-3": 0.432
            }]
        },
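
Both shapes above can be handled by one loop, since 'independent-variables' is a list in either case. Below is a minimal sketch, assuming a dataset is a list of such entries. The `flatten_observations` helper is hypothetical, written in Python 3 for illustration, and is not taken from the project:

```python
def flatten_observations(dataset):
    """Yield one (label, features) pair per observation.

    `dataset` is a list of entries shaped like the json snippets above,
    where 'independent-variables' is a list holding one or more dicts.
    """
    for entry in dataset:
        label = entry['dependent-variable']
        for features in entry['independent-variables']:
            yield label, features


# Abbreviated entries: the first has two observations, the second has one.
dataset = [
    {
        'dependent-variable': 'dep-variable-4',
        'independent-variables': [
            {'indep-variable-1': 22.1, 'indep-variable-2': 95.96},
            {'indep-variable-1': 20.71, 'indep-variable-2': 99.33},
        ],
    },
    {
        'dependent-variable': 'dep-variable-1',
        'independent-variables': [
            {'indep-variable-1': 23.45, 'indep-variable-2': 98.01},
        ],
    },
]

rows = list(flatten_observations(dataset))
labels = [label for label, _ in rows]
```

Each observation carries the label of its parent entry, so an entry with two observations contributes two rows with the same dependent variable.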

Note: the programmatic-interface also uses the former, longer json dataset syntax, which has currently been failing. Solving the above problem could likely fix the current travis ci builds.

Note: we may remove gunicorn and nginx from our current travis build, since they are likely redundant, given how pytest-flask implements the live_server. So, it would make sense to open a dedicated issue to create a separate unit test, responsible for checking the webserver configuration and the reverse proxy settings, for any arbitrary application.

jeff1evesque commented 7 years ago

731b92a: we are leveraging travis ci, by raising a ValueError, since the RESTClient firefox plugin for osx is buggy, and does not return a response body when a post request (with an application/json header) is sent. Alternative approaches involve either using a windows host, or adding the certificate for this application to the browser.

jeff1evesque commented 7 years ago

Our programmatic-interface, as well as the current travis ci builds, have been failing, because they reference datasets from the master branch:

{
    "properties": {
        "session_name": "sample_svm_title",
        "collection": "svm-424-5",
        "dataset_type": "dataset_url",
        "session_type": "data_new",
        "model_type": "svm",
        "stream": "True"
    },
    "dataset": [
        "https://raw.githubusercontent.com/jeff1evesque/machine-learning/master/interface/static/data/json/web_interface/svm.json",
        "https://raw.githubusercontent.com/jeff1evesque/machine-learning/master/interface/static/data/json/web_interface/svm-1.json"
    ]
}

However, based on the changes made for this issue (i.e. the feature-2844 branch), either the master branch needs to be updated with the datasets from the feature-2844 branch, or we'd have to (temporarily) reference the adjusted datasets:

{
    "properties": {
        "session_name": "sample_svm_title",
        "collection": "svm-424-5",
        "dataset_type": "dataset_url",
        "session_type": "data_new",
        "model_type": "svm",
        "stream": "True"
    },
    "dataset": [
        "https://raw.githubusercontent.com/jeff1evesque/machine-learning/33dbb0fa1e65b7ddb28a7d43919a7843d7f0236b/interface/static/data/json/web_interface/svm.json",
        "https://raw.githubusercontent.com/jeff1evesque/machine-learning/33dbb0fa1e65b7ddb28a7d43919a7843d7f0236b/interface/static/data/json/web_interface/svm-1.json"
    ]
}
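
The two payloads above differ only in the ref segment of each dataset url (master versus a commit hash). Below is a minimal sketch of swapping that segment, assuming the url layout `/<owner>/<repo>/<ref>/<path>` seen above and a ref that contains no slash. The `pin_ref` helper is hypothetical and written in Python 3 for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def pin_ref(url, ref):
    """Swap the branch or commit segment of a raw.githubusercontent.com url."""
    parts = urlsplit(url)
    # Path layout: /<owner>/<repo>/<ref>/<path to file>
    segments = parts.path.split('/')
    segments[3] = ref
    return urlunsplit(parts._replace(path='/'.join(segments)))


master_url = ('https://raw.githubusercontent.com/jeff1evesque/machine-learning/'
              'master/interface/static/data/json/web_interface/svm.json')
pinned_url = pin_ref(master_url, '33dbb0fa1e65b7ddb28a7d43919a7843d7f0236b')
```

Applying the same helper with 'master' as the ref would restore the original urls once the master branch carries the adjusted datasets.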

Note: this will likely mean that the build fails when this issue is initially merged. However, shortly after the merge, we can manually retrigger the travis ci build, to account for the adjusted master branch.

jeff1evesque commented 7 years ago

90e6d38: we should investigate whether the dataset structure, defined within /brain/session/model/sv.py, varies between the web-interface and the programmatic-interface.

jeff1evesque commented 7 years ago

036bd3d: our logger debug statement may suggest that the restructure method is not properly defining the dataset property for the web-interface:

dataset-url

This is indicated by the corresponding output from error.log, when the above form is submitted:

[2017-08-01 21:21:03,613] {/vagrant/log/logger.py:165} DEBUG - brain.session.data.dataset: /brain/session/data/dataset.py, datasets: {'error': None, 'properties': {'stream': False, 'session_type': u'data_new', 'collection': u'collection-file-upload-7', 'dataset_type': u'dataset_url', 'model_type': u'svm', 'session_name': u'test', 'dataset[]': u'https://raw.githubusercontent.com/jeff1evesque/machine-learning/master/interface/static/data/json/web_interface/svm.json'}, 'dataset': None}

So, we'll need to take a closer look at how /brain/converter/settings.py and /brain/session/data/dataset.py interact, by adding appropriate debug logger statements to the former settings.py.
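
In the logged structure, the url sits under a 'dataset[]' key inside properties, while the top-level dataset is None. Below is a minimal sketch of lifting that entry into the dataset property, assuming the structure matches the log line above. The `lift_dataset_urls` helper is hypothetical, written in Python 3 for illustration, and is not the project's restructure method:

```python
def lift_dataset_urls(settings):
    """Move a form-style 'dataset[]' entry out of 'properties' into 'dataset'."""
    properties = dict(settings['properties'])  # copy, leave the input untouched
    urls = properties.pop('dataset[]', None)
    if urls is None:
        # Nothing to lift: keep whatever 'dataset' already held.
        dataset = settings.get('dataset')
    elif isinstance(urls, str):
        # A single url was submitted: wrap it in a list.
        dataset = [urls]
    else:
        dataset = list(urls)
    return {
        'error': settings.get('error'),
        'properties': properties,
        'dataset': dataset,
    }


# Abbreviated copy of the structure from the log line above.
logged = {
    'error': None,
    'properties': {
        'session_type': 'data_new',
        'dataset_type': 'dataset_url',
        'dataset[]': 'https://raw.githubusercontent.com/jeff1evesque/machine-learning/master/interface/static/data/json/web_interface/svm.json',
    },
    'dataset': None,
}

restructured = lift_dataset_urls(logged)
```

The result mirrors the shape of the programmatic-interface payload, where dataset is a list of urls beside properties.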

jeff1evesque commented 7 years ago

We've verified that the web-interface behaves as expected. So, we'll proceed by reviewing the latest travis ci builds, to determine how to resolve current bugs for the programmatic-interface.

Note: the above statement only verifies that the corresponding logic executed without raising errors.