Closed jeff1evesque closed 7 years ago
We should additionally write unit tests alongside this issue.
We need to add the pymongo driver, so python can access our mongod instance.
Instead of defining another verbose `puppet/environment/vagrant/modules/package/manifests/pymongo.pp`, we will focus on consolidating such an implementation by tackling #2815.
4c84214: we generalized the installation of the `pymongo` driver, per https://github.com/jeff1evesque/machine-learning/issues/2844#issuecomment-292147314. If we need to be more explicit, we can easily revert to a similar implementation of 6d25045 and 5cd9c32.
We need to rewrite each of the converter methods from `brain/converter/dataset.py`:

- `dict`: simple conversion, with no structural data manipulation
- `dict`: we need to ensure the `dict` is the same structure as our above `json` case
- `dict`: similar to above cases

This means the corresponding `import` statements need to be adjusted accordingly. Additionally, we'll need to redefine the `get_observation_labels` and `self.count_features` methods, such that their counts and definitions are premised on the adjusted `dict` object. Once completed, the `save_dataset` method from `brain/session/data/dataset.py` will need to be adjusted. Specifically, we need to replace the following with an implementation that stores the corresponding dataset(s) into our mongodb store:
```python
for data in dataset:
    for select_data in data['premodel_dataset']:
        db_save = Feature({
            'premodel_dataset': select_data,
            'id_entity': data['id_entity'],
        })

        # save dataset element, append error(s)
        db_return = db_save.save_feature(model_type)
        if db_return['error']:
            list_error.append(db_return['error'])
```
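The replacement could be sketched roughly as follows. The collection handle, document shape, and return convention here are assumptions, mirroring the loop above rather than any final schema:

```python
# hedged sketch: replace the per-feature Feature()/save_feature loop with a
# single bulk insert into a mongodb collection. The document fields below are
# illustrative assumptions, not the project's final schema.
def save_dataset(collection, dataset, model_type):
    documents = [
        {
            'premodel_dataset': data['premodel_dataset'],
            'id_entity': data['id_entity'],
            'model_type': model_type,
        }
        for data in dataset
    ]

    try:
        collection.insert_many(documents)
        return {'error': None}
    except Exception as error:
        return {'error': str(error)}
```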
However, we may need to encapsulate more useful information than just the plain dataset(s). More generally, we need more parameters, which will help distinguish one dataset from another. Therefore, it is likely we need to provide more information from `save_premodel_dataset`, located in `brain/session/base_data.py`:
```python
# save dataset
response = save_dataset(self.dataset, self.model_type)
```
442be76: we simplified our assumption by grabbing the first element in the list (i.e. the first `dict`), and doing a `len` on its keys to retrieve the `feature_count`. This simplified assumption is predicated on the idea that any successive elements in the same, or a similar, list will be of the same size.
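The above simplification can be illustrated as follows (the dataset values are made up for the example):

```python
# illustrative dataset: a list of observation dicts, each assumed to share the
# same keys, per the simplified assumption above
dataset = [
    {'indep-variable-1': 23.45, 'indep-variable-2': 98.01, 'indep-variable-3': 0.432},
    {'indep-variable-1': 19.99, 'indep-variable-2': 97.78, 'indep-variable-3': 0.638},
]

# grab the first element, and count its keys
feature_count = len(dataset[0].keys())
```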
We need to phase out the following mariadb tables:

- `tbl_observation_label`
- `tbl_feature_count`

Instead of creating an explicit sql construct, we can directly access a random node from a mongodb document, for a specified session (i.e. a `data_new`, or `data_append` instance). Therefore, we can assume any mongodb document, for a given session, is properly structured, since it should only exist in the collection if it had passed an earlier collection validator, prior to database ingestion. So, we are allowed to arbitrarily choose any element from a document, with respect to a desired session instance, and obtain any information, such as a unique list of observation labels, or the feature count.
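As a rough sketch of the above idea, given any single document for a session (the nested field names below are assumptions, modeled on the collection dump later in this thread), the observation labels and feature count can be derived directly:

```python
# hypothetical document shape; in practice, `document` would come from a
# collection.find_one(...) call against the session's mongodb collection
document = {
    'data': {
        'dataset': {
            'json_string': {
                'dep-variable-1': [
                    {'indep-variable-1': 23.45, 'indep-variable-2': 98.01},
                ],
                'dep-variable-2': [
                    {'indep-variable-1': 24.32, 'indep-variable-2': 92.22},
                ],
            }
        }
    }
}

dataset = document['data']['dataset']['json_string']

# unique observation labels, and feature count, without any sql tables
observation_labels = sorted(dataset.keys())
feature_count = len(next(iter(dataset.values()))[0])
```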
Note: the collection validator will be implemented per #2986.
c413e1b: we should proceed by verifying that the following lines correctly execute:
...
converter = Dataset(dataset, model_type, True)
converted = converter.json_to_dict()
...
9a649a6: we need to ensure that `save_premodel_dataset` properly stores the corresponding dataset(s), via the `data_new` session, into its mongodb collection.
0280bf8: we need to define the `save_collection` method.
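A possible shape for `save_collection` (the signature and return convention are assumptions, mirroring the `save_feature` error-reporting style above, not the committed implementation):

```python
def save_collection(collection, document):
    '''
    hedged sketch: insert a single document into the given mongodb collection,
    and return its id, or the error message, instead of raising.
    '''
    try:
        result = collection.insert_one(document)
        return {'result': result.inserted_id, 'error': None}
    except Exception as error:
        return {'result': None, 'error': str(error)}
```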
Some of our recent python code has contributed to a `502` bad gateway error:
This most likely means that bad python code is breaking our gunicorn web server(s). As a result, our reverse proxy (i.e. nginx) is unable to direct traffic. However, when we switch to the `master` branch, followed by `vagrant destroy -f && vagrant up`, the browser renders the application, and the `POST` api executes requests as expected.
Note: we needed to clear the browser cache, to satisfy both firefox's restclient, as well as the general web-interface, when verifying that the `master` branch was still functional. It is likely that a `vagrant halt` would have sufficed, rather than a full rebuild; but this was not tested.
de3168f: we need to adjust our `feature_request` implementation, from `sv.py`.
It seems our travis ci finds a `KeyError`, when trying to `yaml.load` the nosql configurations from `database.yaml`. So, we've replicated the corresponding statements manually:
>>> import yaml
>>> with open('database.yaml', 'r') as stream:
... settings = yaml.load(stream)
... sql = settings['database']['mariadb']
... nosql = settings['database']['mongodb']
...
>>> print sql
{'username': 'authenticated', 'name': 'db_machine_learning', 'tester': 'tester', 'log_path': '/log/database', 'provisioner_password': 'password', 'host': 'localhost', 'root_password': 'password', 'provisioner': 'provisioner', 'tester_password': 'password', 'password': 'password'}
>>> print nosql
{'username': 'authenticated', 'password': 'password', 'name': 'dataset', 'storage': {'journal': {'enabled': True}, 'dbPath': ['/data', '/data/db']}, 'host': 'localhost', 'systemLog': {'verbosity': 1, 'destination': 'file', 'logAppend': True, 'systemLogPath': '/var/log/mongodb/mongod.log'}, 'net': {'bindIp': '127.0.0.1', 'port': 27017}, 'processManagement': {'fork': True, 'pidfilepath': '/var/run/mongodb/mongod.pid'}}
>>> print sql['host']
localhost
>>> print nosql['host']
localhost
Given the above output, it seems fair to assume that our approach is not unreasonable. Instead, we need to find the syntax limitation within the overall `factory.py`.
603b965: our above comment resulted in the mysterious `KeyError`, since puppet's hiera implementation required corresponding mongodb definitions, which were properly set for the vagrant development environment, but not for the docker unit test environment implemented by travis ci.
Our programmatic-api currently generates a `500` error upon the `data_new` session:
While our web-interface similarly generates a `500` error:
So, we inspected our mongodb immediately after, and discovered nothing was stored:
root@trusty64:/home/vagrant# mongo
MongoDB shell version v3.4.4
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.4
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
Server has startup warnings:
2017-05-23T20:57:46.162-0400 I STORAGE [initandlisten]
2017-05-23T20:57:46.162-0400 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2017-05-23T20:57:46.162-0400 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2017-05-23T20:57:50.093-0400 I CONTROL [initandlisten]
2017-05-23T20:57:50.093-0400 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-05-23T20:57:50.093-0400 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2017-05-23T20:57:50.093-0400 I CONTROL [initandlisten]
2017-05-23T20:57:50.093-0400 I CONTROL [initandlisten]
2017-05-23T20:57:50.093-0400 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2017-05-23T20:57:50.093-0400 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2017-05-23T20:57:50.093-0400 I CONTROL [initandlisten]
2017-05-23T20:57:50.093-0400 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2017-05-23T20:57:50.093-0400 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2017-05-23T20:57:50.093-0400 I CONTROL [initandlisten]
> show dbs
admin 0.000GB
local 0.000GB
> show collections
> show users
This means we'll need to temporarily implement our `Logger` class, to determine what part of the code needs to be modified. Additionally, our travis ci may not be useful, since there currently is no mongodb instance for our `pymongo` connector to execute operations against, in the `docker` puppet environment (i.e. unit test build). This means we'll need to create another docker container, to be used by the corresponding unit tests.
Note: it is not unlikely, given the size of this issue, that we may partition the corresponding unit tests (i.e. docker build for mongodb) into a separate issue.
We may need to add something like the following within `mongodb/manifests/run.pp`:
```puppet
## create admin user
exec { 'create-admin-user':
  command => dos2unix(template('mongodb/create-user.erb')),
  onlyif  => dos2unix(template('mongodb/check-user.erb')),
  notify  => Service['upstart-mongod'],
}
```
This will likely entail the need to make `mongod.conf.erb` restart friendly. Therefore, we may need to remove the following, and somehow track the associated `pid`, so it can be restarted:
```
## restart upstart job continuously
respawn
```
We can use the following to test for user existence:
vagrant@trusty64:~$ TEMP=$(mongo --eval "db.getUser('admin')"); echo $TEMP
MongoDB shell version v3.4.4 connecting to: mongodb://127.0.0.1:27017 MongoDB server version: 3.4.4 { "_id" : "test.admin", "user" : "admin", "db" : "test", "customData" : { "uid" : 1 }, "roles" : [ { "role" : "clusterAdmin", "db" : "admin" }, { "role" : "readWriteAnyDatabase", "db" : "admin" }, { "role" : "userAdminAnyDatabase", "db" : "admin" }, { "role" : "dbAdminAnyDatabase", "db" : "admin" }, { "role" : "readWrite", "db" : "test" } ] }
vagrant@trusty64:~$ TEMP=$(mongo --eval "db.getUser('admin')" --quiet); echo $TEMP
{ "_id" : "test.admin", "user" : "admin", "db" : "test", "customData" : { "uid" : 1 }, "roles" : [ { "role" : "clusterAdmin", "db" : "admin" }, { "role" : "readWriteAnyDatabase", "db" : "admin" }, { "role" : "userAdminAnyDatabase", "db" : "admin" }, { "role" : "dbAdminAnyDatabase", "db" : "admin" }, { "role" : "readWrite", "db" : "test" } ] }
vagrant@trusty64:~$ TEMP=$(mongo --eval "db.getUser('adminf')" --quiet); echo $TEMP
null
Additionally, we can use the following to create users:
vagrant@trusty64:~$ TEMP=$(mongo --eval "db.createUser( { user: 'jeff1evesque', pwd: 'password', customData: { uid: 1 }, roles: [ { role: 'clusterAdmin', db: 'admin' }, { role: 'readWriteAnyDatabase', db: 'admin' }, { role: 'userAdminAnyDatabase', db: 'admin' }, { role: 'dbAdminAnyDatabase', db: 'admin' }] }, { w: 'majority' , wtimeout: 5000 } )" --quiet); echo $TEMP
Successfully added user: { "user" : "jeff1evesque", "customData" : { "uid" : 1 }, "roles" : [ { "role" : "clusterAdmin", "db" : "admin" }, { "role" : "readWriteAnyDatabase", "db" : "admin" }, { "role" : "userAdminAnyDatabase", "db" : "admin" }, { "role" : "dbAdminAnyDatabase", "db" : "admin" } ] }
So, we'll need to bootstrap the above into puppet logic. Though it's possible to use `file` to create an executable file, which could be implemented by puppet's `exec` directive, it may be better to simply write two erb templates, and execute them directly within a single `exec`, containing an `onlyif` condition.
We most likely have syntax errors in our python code, which is why our travis ci is now failing the registration unit test, having succeeded (i.e. 28f0c82) prior to splitting the mariadb docker container into two containers (i.e. one for mariadb, another for mongodb).
After a fresh `vagrant up` build, we noticed that our puppet implementation succeeded in provisioning our mongodb `authenticated` user:
$ vagrant ssh
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 4.4.0-31-generic x86_64)
* Documentation: https://help.ubuntu.com/
System information disabled due to load higher than 1.0
Last login: Thu May 25 00:47:19 2017
vagrant@trusty64:~$ TEMP=$(mongo --eval "db.getUser('admin')" --quiet); echo $TEMP
null
vagrant@trusty64:~$ TEMP=$(mongo --eval "db.getUser('authenticated')" --quiet); echo $TEMP
{ "_id" : "test.authenticated", "user" : "authenticated", "db" : "test", "roles" : [ { "role" : "clusterAdmin", "db" : "admin" }, { "role" : "readWriteAnyDatabase", "db" : "admin" }, { "role" : "userAdminAnyDatabase", "db" : "admin" }, { "role" : "dbAdminAnyDatabase", "db" : "admin" } ] }
However, our travis ci was not able to reach the same level of success. Specifically, our travis ci unit tests failed with `connect failed`. So, we'll need to either determine how to properly start our mongodb, or how to configure the necessary bind ip, and related settings, within the corresponding docker container.
We were able to check that the mongod port 27017 on the `mongodb` container was open, by using the `nmap` command from the `redis` container:
vagrant@trusty64:/vagrant/test$ sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0813f1c5ce59 container-webserver "python app.py test" 6 hours ago Exited (0) 6 hours ago webserver-pytest
30a5eb2bfd4f container-mariadb "/bin/sh -c mysqld" 6 hours ago Up 6 hours mariadb
dfd46b76e6c3 container-webserver "python app.py run" 6 hours ago Exited (1) 6 hours ago webserver
120f76f1c1a8 container-mongodb "/bin/sh -c mongod..." 6 hours ago Up 6 hours mongodb
185aeb1587ab container-redis "/bin/sh -c redis-..." 6 hours ago Up 6 hours redis
31c41ab39585 container-default "/bin/bash" 6 hours ago Exited (0) 6 hours ago base
vagrant@trusty64:/vagrant/test$ sudo docker exec -it redis sudo nmap -p 27017 mongodb
Starting Nmap 6.40 ( http://nmap.org ) at 2017-06-02 07:48 EDT
Nmap scan report for mongodb (172.18.0.2)
Host is up (0.00014s latency).
rDNS record for 172.18.0.2: mongodb.app_nw
PORT STATE SERVICE
27017/tcp open unknown
MAC Address: 02:42:AC:12:00:02 (Unknown)
Nmap done: 1 IP address (1 host up) scanned in 0.47 seconds
Note: we manually installed `nmap` in the `redis` container via `docker exec`.
We were able to `telnet` from the `webserver` container to the `mongodb` container:
vagrant@trusty64:/vagrant/test$ sudo docker exec -it webserver sudo apt-get install -y telnet
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
telnet
0 upgraded, 1 newly installed, 0 to remove and 3 not upgraded.
Need to get 67.1 kB of archives.
After this operation, 167 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu/ trusty/main telnet amd64 0.17-36build2 [67.1 kB]
Fetched 67.1 kB in 0s (175 kB/s)
Selecting previously unselected package telnet.
(Reading database ... 38442 files and directories currently installed.)
Preparing to unpack .../telnet_0.17-36build2_amd64.deb ...
Unpacking telnet (0.17-36build2) ...
Setting up telnet (0.17-36build2) ...
update-alternatives: using /usr/bin/telnet.netkit to provide /usr/bin/telnet (telnet) in auto mode
vagrant@trusty64:/vagrant/test$ sudo docker exec -it webserver sudo telnet mongodb 27017
Trying 172.18.0.3...
Connected to mongodb.
Escape character is '^]'.
Additionally, we were able to `ping` the `mongodb` container from the `webserver` container:
vagrant@trusty64:/vagrant/test$ sudo docker exec -it webserver sudo ping mongodb
PING mongodb (172.18.0.3) 56(84) bytes of data.
64 bytes from mongodb.app_nw (172.18.0.3): icmp_seq=1 ttl=64 time=0.056 ms
64 bytes from mongodb.app_nw (172.18.0.3): icmp_seq=2 ttl=64 time=0.064 ms
64 bytes from mongodb.app_nw (172.18.0.3): icmp_seq=3 ttl=64 time=0.087 ms
64 bytes from mongodb.app_nw (172.18.0.3): icmp_seq=4 ttl=64 time=0.086 ms
64 bytes from mongodb.app_nw (172.18.0.3): icmp_seq=5 ttl=64 time=0.060 ms
^C
--- mongodb ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3998ms
rtt min/avg/max/mdev = 0.056/0.070/0.087/0.016 ms
01ae0a7: we can start to phase out puppet, and provision the docker containers with dockerfile syntax. We've verified that we were able to provision users in the `docker` unit test environment:
vagrant@trusty64:/vagrant/test$ ./unit-tests
...
Step 7/7 : ENTRYPOINT python app.py
---> Running in 4af7bae49a07
---> c62331dcbb1f
Removing intermediate container 4af7bae49a07
Successfully built c62331dcbb1f
7d0c5e17b58c6b1df24600d02274f3d13b52f684a3f14b6b7e3f65cfd13d365c
2f72b86e08ec3568f3ba2099d3096758d0e759bf9e7c80714aaa43f0424b2e36
2019e2afc6be804a67899284dd53372d0d1793ff9f989e5d098024041c68621a
6624afb68403813398ef8dab2e144d32b06f6e2bd6a02249c179222567a48b6b
032b70f48a1303710f8794446fc47a7f12f2362346f9cc74e6d24f16696f8d4a
765275b3221a3d64f395f719d3347f3d30696a51f06c370d5583f7c651c1c64d
Successfully added user: {
"user" : "jeff1evesque",
"roles" : [
{
"role" : "clusterAdmin",
"db" : "admin"
},
{
"role" : "readWriteAnyDatabase",
"db" : "admin"
},
{
"role" : "userAdminAnyDatabase",
"db" : "admin"
},
{
"role" : "dbAdminAnyDatabase",
"db" : "admin"
}
]
}
============================= test session starts ==============================
platform linux2 -- Python 2.7.6, pytest-3.1.2, py-1.4.34, pluggy-0.4.0
rootdir: /var/machine-learning/test/live_server, inifile: pytest.ini
plugins: flask-0.10.0, cov-2.4.0
collected 28 items
...
Note: we should attempt to figure out how to make our corresponding dockerfile(s) more dynamic, by referencing our yaml configurations.
Our above recent commits were able to provision the `jeff1evesque` user within the mongodb database:
vagrant@trusty64:/vagrant/test$ ./unit-tests
...
Successfully built abcb8c09382e
e5671ea89b4f74f4b067bdf5f4fd1a15e317ed6c513154af6f4ee986f0b864d4
e91df9d4beb9362690d0bd9294e036726fbbd69c8d8a027e88e1c9a240c71c49
b98ba76c9da47b25c1d20d086c5e605758e3a13af0ebd473330e159978e77e6e
1f3197865885055d3d41518df37de2707441861d6ea75e7ddeb99f8a9e420b54
46f98dbfb3dc4549a486ccb9123b1377ef588e07d167743f80a2b7457a46112f
75bfc45c0d946c44bcea873aef88cc7253f6d97c50790244257583055536ef66
Successfully added user: {
"user" : "jeff1evesque",
"roles" : [
{
"role" : "clusterAdmin",
"db" : "admin"
},
{
"role" : "readWriteAnyDatabase",
"db" : "admin"
},
{
"role" : "userAdminAnyDatabase",
"db" : "admin"
},
{
"role" : "dbAdminAnyDatabase",
"db" : "admin"
}
]
}
============================= test session starts ==============================
...
vagrant@trusty64:/vagrant/test$ sudo docker exec -it mongodb sudo netstat -ntlup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.11:39459 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:27017 0.0.0.0:* LISTEN 1/mongod
udp 0 0 127.0.0.11:58971 0.0.0.0:* -
vagrant@trusty64:/vagrant/test$ sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
18340bef3d53 container-webserver "python app.py test" 15 minutes ago Exited (0) 13 minutes ago webserver-pytest
f701e6571918 container-mariadb "/bin/sh -c mysqld" 15 minutes ago Up 15 minutes mariadb
1c4f94c7e34a container-webserver "python app.py run" 15 minutes ago Up 15 minutes webserver
05f710501f10 container-mongodb "/usr/bin/mongod -..." 15 minutes ago Up 15 minutes mongodb
11926a267118 container-redis "/bin/sh -c redis-..." 15 minutes ago Up 15 minutes redis
0c368b4405b0 container-default "/bin/bash" 15 minutes ago Exited (0) 15 minutes ago base
vagrant@trusty64:/vagrant/test$ sudo docker exec -it mongodb mongo --eval 'db.getUsers()'
MongoDB shell version: 3.2.14
connecting to: test
[
{
"_id" : "test.jeff1evesque",
"user" : "jeff1evesque",
"db" : "test",
"roles" : [
{
"role" : "clusterAdmin",
"db" : "admin"
},
{
"role" : "readWriteAnyDatabase",
"db" : "admin"
},
{
"role" : "userAdminAnyDatabase",
"db" : "admin"
},
{
"role" : "dbAdminAnyDatabase",
"db" : "admin"
}
]
}
]
231cfb4: we'll need to restart the `mongod` process after we `createUser`, for both the `vagrant` build, as well as the `docker` unit test build.
We temporarily amended (not committed) our `unit-tests` with the following:
...
## provision mongodb authorization
sudo docker exec -it mongodb sudo mongo mongodb://mongodb:27017 --eval "db.getSiblingDB('admin'); db.createUser({\
user: 'jeff1evesque',\
pwd: 'password',\
roles: [\
{role: 'clusterAdmin', db: 'admin' },\
{role: 'readWriteAnyDatabase', db: 'admin' },\
{role: 'userAdminAnyDatabase', db: 'admin' },\
{role: 'dbAdminAnyDatabase', db: 'admin' }\
]},\
{ w: 'majority' , wtimeout: 5000 } )" --quiet
sudo docker exec -it mongodb sudo cat /etc/mongod.conf
sudo docker exec -it mongodb sudo ps -eo pid,cmd,lstart
echo '================================================='
sudo docker exec -it mongodb sudo sed -i "/#[[:space:]]*security:/s/^#//g" /etc/mongod.conf
sudo docker exec -it mongodb sudo sed -i "/#[[:space:]]*authorization:[[:space:]]*enabled/s/^#//g" /etc/mongod.conf
echo '================================================='
sudo docker exec -it mongodb sudo cat /etc/mongod.conf
sudo docker exec -it mongodb sudo ps -eo pid,cmd,lstart
sudo docker restart mongodb
echo '================================================='
sudo docker exec -it mongodb sudo cat /etc/mongod.conf
sudo docker exec -it mongodb sudo ps -eo pid,cmd,lstart
...
Upon running `./unit-tests` in our vagrant environment:
vagrant@trusty64:/vagrant/test$ ./unit-tests
...
Successfully added user: {
"user" : "jeff1evesque",
"roles" : [
{
"role" : "clusterAdmin",
"db" : "admin"
},
{
"role" : "readWriteAnyDatabase",
"db" : "admin"
},
{
"role" : "userAdminAnyDatabase",
"db" : "admin"
},
{
"role" : "dbAdminAnyDatabase",
"db" : "admin"
}
]
}
## mongodb.conf, this file is enforced by puppet.
##
## Note: http://docs.mongodb.org/manual/reference/configuration-options/
##
## where and how to store data.
storage:
dbPath: /var/lib/mongodb
journal:
enabled: true
## where to write logging data.
systemLog:
destination: file
logAppend: true
path: /var/log/mongodb/mongod.log
## network interfaces
net:
port: 27017
bindIp: 0.0.0.0
## role-based access controls
#security:
# authorization: enabled
PID CMD STARTED
1 /usr/bin/mongod -f /etc/mon Mon Jun 19 00:00:37 2017
34 sudo ps -eo pid,cmd,lstart Mon Jun 19 00:00:41 2017
37 ps -eo pid,cmd,lstart Mon Jun 19 00:00:41 2017
=================================================
=================================================
## mongodb.conf, this file is enforced by puppet.
##
## Note: http://docs.mongodb.org/manual/reference/configuration-options/
##
## where and how to store data.
storage:
dbPath: /var/lib/mongodb
journal:
enabled: true
## where to write logging data.
systemLog:
destination: file
logAppend: true
path: /var/log/mongodb/mongod.log
## network interfaces
net:
port: 27017
bindIp: 0.0.0.0
## role-based access controls
security:
authorization: enabled
PID CMD STARTED
1 /usr/bin/mongod -f /etc/mon Mon Jun 19 00:00:37 2017
53 sudo ps -eo pid,cmd,lstart Mon Jun 19 00:00:41 2017
57 ps -eo pid,cmd,lstart Mon Jun 19 00:00:41 2017
mongodb
=================================================
## mongodb.conf, this file is enforced by puppet.
##
## Note: http://docs.mongodb.org/manual/reference/configuration-options/
##
## where and how to store data.
storage:
dbPath: /var/lib/mongodb
journal:
enabled: true
## where to write logging data.
systemLog:
destination: file
logAppend: true
path: /var/log/mongodb/mongod.log
## network interfaces
net:
port: 27017
bindIp: 0.0.0.0
## role-based access controls
security:
authorization: enabled
PID CMD STARTED
1 /usr/bin/mongod -f /etc/mon Mon Jun 19 00:00:42 2017
11 sudo ps -eo pid,cmd,lstart Mon Jun 19 00:00:43 2017
15 ps -eo pid,cmd,lstart Mon Jun 19 00:00:43 2017
...
We notice the `mongod` start time, for the corresponding `pid`, changed. So, we'll need to check if our `pymongo` implementation properly authenticates to the `mongod` process.
We were able to connect to our mongodb via the `mongo` shell command:
vagrant@trusty64:/vagrant/test$ sudo docker exec -it mongodb mongo --port 27017 -u authenticated -p password
MongoDB shell version: 3.2.14
connecting to: 127.0.0.1:27017/test
Server has startup warnings:
2017-06-20T08:20:10.516-0400 I CONTROL [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2017-06-20T08:20:10.516-0400 I CONTROL [initandlisten]
2017-06-20T08:20:10.516-0400 I CONTROL [initandlisten]
2017-06-20T08:20:10.516-0400 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2017-06-20T08:20:10.517-0400 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2017-06-20T08:20:10.517-0400 I CONTROL [initandlisten]
2017-06-20T08:20:10.517-0400 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2017-06-20T08:20:10.517-0400 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2017-06-20T08:20:10.517-0400 I CONTROL [initandlisten]
However, the corresponding snippet from `query.py`:
```python
# single mongodb instance
self.client = MongoClient(
    "mongodb://{user}:{pass}@{host}/admin".format(**self.args)
)
self.database = self.client[self.args['db']]
self.collection = self.database[collection]
```
generates the following error, contained within `/var/log/mongodb/mongod.log`:
2017-06-20T08:22:01.782-0400 I ACCESS [conn8] SCRAM-SHA-1 authentication failed for authenticated on admin from client 172.18.0.6 ; UserNotFound: Could not find user authenticated@admin
We created a corresponding question on stackoverflow, and will proceed next by ensuring the `/var/run/mongod.pid` file is defined in the `/etc/mongod.conf` configuration.
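The `UserNotFound: Could not find user authenticated@admin` error above is also consistent with the user having been created in a different database than the `/admin` path (i.e. authSource) in our connection uri; the earlier shell dumps show `"_id" : "test.authenticated"`. A hedged sketch of that alternative fix, with placeholder credentials, would point the authSource at whichever database actually contains the user:

```python
# hypothetical sketch: placeholder credentials, and 'dataset' standing in for
# whichever database the user was actually created in
args = {'user': 'authenticated', 'pw': 'password', 'host': 'localhost', 'db': 'dataset'}
uri = 'mongodb://{user}:{pw}@{host}/{db}?authSource={db}'.format(**args)
# self.client = MongoClient(uri)  # then select database/collection as before
```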
Our manual unit tests now have a little more success:
root@trusty64:/vagrant/test# ./unit-tests
...
============================= test session starts ==============================
platform linux2 -- Python 2.7.6, pytest-3.1.2, py-1.4.34, pluggy-0.4.0
rootdir: /var/machine-learning/test/live_server, inifile: pytest.ini
plugins: flask-0.10.0, cov-2.4.0
collected 28 items
test/live_server/authentication/pytest_account_registration.py .
test/live_server/authentication/pytest_crypto.py .
test/live_server/authentication/pytest_user_login.py .
test/live_server/authentication/pytest_user_logout.py .
test/live_server/authentication/pytest_validate_password.py .
test/live_server/programmatic_interface/dataset_url/pytest_svm_dataset_url.py ..FF
test/live_server/programmatic_interface/dataset_url/pytest_svr_dataset_url.py ..FF
test/live_server/programmatic_interface/file_upload/pytest_svm_file_upload.py ..FF
test/live_server/programmatic_interface/file_upload/pytest_svr_file_upload.py FFFF
test/live_server/programmatic_interface/results/pytest_1_svm_prediction.py ...
test/live_server/programmatic_interface/results/pytest_2_svr_prediction.py ...
test/live_server/programmatic_interface/results/pytest_3_all_prediction_titles.py .
...
self = SocketInfo(<socket._socketobject object at 0x7f52d74a68a0>) CLOSED at 139993749029008
error = InvalidDocument("key '31.111' must not contain '.'",)
def _raise_connection_failure(self, error):
# Catch *all* exceptions from socket methods and close the socket. In
# regular Python, socket operations only raise socket.error, even if
# the underlying cause was a Ctrl-C: a signal raised during socket.recv
# is expressed as an EINTR error from poll. See internal_select_ex() in
# socketmodule.c. All error codes from poll become socket.error at
# first. Eventually in PyEval_EvalFrameEx the interpreter checks for
# signals and throws KeyboardInterrupt into the current frame on the
# main thread.
#
# But in Gevent and Eventlet, the polling mechanism (epoll, kqueue,
# ...) is called in Python code, which experiences the signal as a
# KeyboardInterrupt from the start, rather than as an initial
# socket.error, so we catch that, close the socket, and reraise it.
self.close()
if isinstance(error, socket.error):
_raise_connection_failure(self.address, error)
else:
> raise error
E InvalidDocument: key '31.111' must not contain '.'
/usr/local/lib/python2.7/dist-packages/pymongo/pool.py:552: InvalidDocument
However, the above traceback indicates that we'll need to either restructure our dataset(s), or create a mechanism allowing massaged data to be stored. Additionally, all cases of the `model_generate`, as well as the `model_predict` sessions, will need to be reworked.
Additionally, we have verified that our insert equivalent commands are storing data:
vagrant@trusty64:/vagrant/test$ sudo docker exec -it mongodb mongo admin --port 27017 -u authenticated -p password
MongoDB shell version: 3.2.14
connecting to: 127.0.0.1:27017/admin
> use dataset
switched to db dataset
> show collections
supervised.posts
> var collections = db.getCollectionNames();
> for (var i = 0; i< collections.length; i++) { print('Collection: ' + collections[i]); db.getCollection(collections[i]).find().forEach(printjson); }
Collection: supervised.posts
{
"_id" : ObjectId("595044e60a50bc00010645f6"),
"data" : {
"dataset" : {
"file_upload" : null,
"json_string" : [
"https://raw.githubusercontent.com/jeff1evesque/machine-learning/master/interface/static/data/json/web_interface/svm.json",
"https://raw.githubusercontent.com/jeff1evesque/machine-learning/master/interface/static/data/json/web_interface/svm-1.json"
],
"upload_quantity" : 1
},
"settings" : {
"model_type" : "svm",
"session_name" : "sample_svm_title",
"dataset_type" : "dataset_url",
"session_type" : "data_new"
}
},
"error" : null
}
{
"_id" : ObjectId("595044e80a50bc00010645f8"),
"data" : {
"dataset" : {
"file_upload" : null,
"json_string" : [
"https://raw.githubusercontent.com/jeff1evesque/machine-learning/master/interface/static/data/json/web_interface/svm.json",
"https://raw.githubusercontent.com/jeff1evesque/machine-learning/master/interface/static/data/json/web_interface/svm-1.json"
],
"upload_quantity" : 1
},
"settings" : {
"model_type" : "svm",
"dataset_type" : "dataset_url",
"session_id" : "1",
"session_type" : "data_append"
}
},
"error" : null
}
{
"_id" : ObjectId("595044f20a50bc00010645fa"),
"data" : {
"dataset" : {
"file_upload" : null,
"json_string" : [
"https://raw.githubusercontent.com/jeff1evesque/machine-learning/master/interface/static/data/json/web_interface/svr.json",
"https://raw.githubusercontent.com/jeff1evesque/machine-learning/master/interface/static/data/json/web_interface/svr-1.json"
],
"upload_quantity" : 1
},
"settings" : {
"model_type" : "svr",
"session_name" : "sample_svr_title",
"dataset_type" : "dataset_url",
"session_type" : "data_new"
}
},
"error" : null
}
{
"_id" : ObjectId("595044f40a50bc00010645fc"),
"data" : {
"dataset" : {
"file_upload" : null,
"json_string" : [
"https://raw.githubusercontent.com/jeff1evesque/machine-learning/master/interface/static/data/json/web_interface/svr.json",
"https://raw.githubusercontent.com/jeff1evesque/machine-learning/master/interface/static/data/json/web_interface/svr-1.json"
],
"upload_quantity" : 1
},
"settings" : {
"model_type" : "svr",
"dataset_type" : "dataset_url",
"session_id" : "2",
"session_type" : "data_append"
}
},
"error" : null
}
{
"_id" : ObjectId("595044fd0a50bc00010645fe"),
"data" : {
"dataset" : {
"file_upload" : null,
"json_string" : {
"dep-variable-5" : [
{
"indep-variable-6" : 0.001,
"indep-variable-7" : 27,
"indep-variable-4" : 295,
"indep-variable-5" : 55.83,
"indep-variable-2" : 95.03,
"indep-variable-3" : 0.488,
"indep-variable-1" : 23.27
},
{
"indep-variable-6" : 0.001,
"indep-variable-7" : 27,
"indep-variable-4" : 295,
"indep-variable-5" : 55.83,
"indep-variable-2" : 95.03,
"indep-variable-3" : 0.488,
"indep-variable-1" : 23.27
},
{
"indep-variable-6" : 0.001,
"indep-variable-7" : 29,
"indep-variable-4" : 303,
"indep-variable-5" : 58.88,
"indep-variable-2" : 97.78,
"indep-variable-3" : 0.638,
"indep-variable-1" : 19.99
}
],
"dep-variable-4" : [
{
"indep-variable-6" : 0.001,
"indep-variable-7" : 32,
"indep-variable-4" : 342,
"indep-variable-5" : 66.67,
"indep-variable-2" : 95.96,
"indep-variable-3" : 0.743,
"indep-variable-1" : 22.1
},
{
"indep-variable-6" : 0.001,
"indep-variable-7" : 30,
"indep-variable-4" : 342,
"indep-variable-5" : 75.67,
"indep-variable-2" : 99.33,
"indep-variable-3" : 0.648,
"indep-variable-1" : 20.71
}
],
"dep-variable-1" : [
{
"indep-variable-6" : 0.002,
"indep-variable-7" : 23,
"indep-variable-4" : 325,
"indep-variable-5" : 54.64,
"indep-variable-2" : 98.01,
"indep-variable-3" : 0.432,
"indep-variable-1" : 23.45
}
],
"dep-variable-3" : {
"indep-variable-6" : 0.002,
"indep-variable-7" : 26,
"indep-variable-4" : 427,
"indep-variable-5" : 75.45,
"indep-variable-2" : 101.21,
"indep-variable-3" : 0.832,
"indep-variable-1" : 22.67
}
},
"upload_quantity" : 1
},
"settings" : {
"model_type" : "svm",
"session_name" : "sample_svm_title",
"dataset_type" : "file_upload",
"session_type" : "data_new"
}
},
"error" : null
}
{
"_id" : ObjectId("595044ff0a50bc0001064600"),
"data" : {
"dataset" : {
"file_upload" : null,
"json_string" : {
"dep-variable-1" : [
{
"indep-variable-6" : 0.002,
"indep-variable-7" : 25,
"indep-variable-4" : 325,
"indep-variable-5" : 54.64,
"indep-variable-2" : 98.01,
"indep-variable-3" : 0.432,
"indep-variable-1" : 23.45
}
],
"dep-variable-3" : [
{
"indep-variable-6" : 0.002,
"indep-variable-7" : 24,
"indep-variable-4" : 427,
"indep-variable-5" : 75.45,
"indep-variable-2" : 101.21,
"indep-variable-3" : 0.832,
"indep-variable-1" : 22.67
}
],
"dep-variable-2" : [
{
"indep-variable-6" : 0.001,
"indep-variable-7" : 31,
"indep-variable-4" : 235,
"indep-variable-5" : 64.45,
"indep-variable-2" : 92.22,
"indep-variable-3" : 0.356,
"indep-variable-1" : 24.32
}
]
},
"upload_quantity" : 1
},
"settings" : {
"model_type" : "svm",
"dataset_type" : "file_upload",
"session_id" : "3",
"session_type" : "data_append"
}
},
"error" : null
}
We need to restructure all our sample dataset(s), by ensuring no keys contain actual values, as was the case with the above 31.111, on an svr data_new session. We'll need to make these adjustments for both classification and regression based calculations, and make the necessary adjustments to our documentation.
e179784: it is likely that the application will need to perform many database writes. So, it seems more logical to maintain a single connection for this purpose, instead of continuously opening and closing connections, which would be expensive on system resources.
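A minimal sketch of that single-connection idea, assuming the client is created once at module level and reused for every write (in production the factory would construct a pymongo MongoClient, which pools connections internally; `get_client` and the module layout here are hypothetical):

```python
# connection.py: hypothetical module holding one shared client,
# created lazily on first use and reused for every subsequent write.
_client = None

def get_client(factory):
    '''
    Return the shared database client, creating it on first call.

    @factory, zero-argument callable which builds the client; in
        production this would be, e.g., lambda: MongoClient(host, port).
    '''
    global _client
    if _client is None:
        _client = factory()
    return _client
```

Every save method would then call `get_client(...)` instead of constructing its own connection.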
Additionally, we'll need to reconsider the need for the data_append session. Since we are refactoring with nosql, we will likely enforce a particular key value in the json file, which binds all json files to be collectively used to generate a corresponding model. This seems like a better solution than relying on a single json file, which can be appended to an unlimited number of times. Specifically, mongodb by default has about a 16MB size limit per document. This means we'll likely take away the data_append session, since it will not be compatible with our nosql implementation, especially if it is later distributed.
We should readjust our flask variable implementation with flask's built-in database connection management:
Our earlier comment is not fully accurate. Specifically, each particular study will contain its own mongodb collection, within the mongodb database. This means if some sensor1 is responsible for collecting data on a determined interval, to be used for defined computation(s), then each successive time the device streams data, it will store the corresponding json documents in the same database collection. So, the mongodb collection(s) will partition each study, by grouping corresponding documents into collections. Additionally, all collections will be contained within the same overall database. This will allow one study to leverage json documents from another study, if permissions have been properly granted.
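Under that layout, regenerating a model for a study would mean reading every json document in the study's collection, and concatenating their observations. A minimal pure-python sketch of that merge (the document shape mirrors the logged documents elsewhere in this thread; the function name is an assumption):

```python
def merge_study_documents(documents):
    '''
    Combine the 'dataset' arrays from every json document stored in a
    study's mongodb collection into one list of observations.

    @documents, list of dicts, each shaped like
        {'properties': {...}, 'dataset': [{...observation...}, ...]}
    '''
    merged = []
    for document in documents:
        merged.extend(document.get('dataset', []))
    return merged
```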
ce76476: we need to define a corresponding form element, to capture the collection
information. This means, we'll need to adjust our corresponding jsx presentation.
We need to rework, and possibly reduce the following views.py
logic, for the web-interface:
...
# web-interface: get submitted form data
if request.form:
settings = request.form
sender = Settings(settings, files)
data_formatted = sender.restructure()
# send reformatted data to brain
loader = Load_Data(data_formatted)
...
Specifically, the argument supplied to the Load_Data class should have the same structure for both interfaces. This means we'll need to take note of the structure supplied by the programmatic-interface, and make the corresponding adjustments for the web-interface equivalent.
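A sketch of what that reshaping could look like, assuming the target structure is the properties/dataset document seen in the logs later in this thread (the function name and the exact key set are assumptions; the real contract still needs to be confirmed against the programmatic path):

```python
def restructure_web_payload(settings, dataset):
    '''
    Reshape the web-interface form submission into the same structure
    the programmatic-interface supplies to Load_Data: a 'properties'
    dict of session settings, plus the converted 'dataset' list.
    '''
    return {
        'properties': {
            'session_type': settings.get('session_type'),
            'model_type': settings.get('model_type'),
            'collection': settings.get('collection'),
        },
        'dataset': dataset,
    }
```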
85e6d2b: we were able to verify that our list comprehension is implemented as expected:
vagrant@trusty64:~$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> myDict = {'b': 3, 'a': 5, 'c': 1}
>>> [v for k, v in sorted(myDict.items())]
[5, 3, 1]
>>> [k for k, v in sorted(myDict.items())]
['a', 'b', 'c']
We need to determine how to properly adjust our label encoder from sv.py:
# generate svm model
if model == list_model_type[0]:
# convert observation labels to a unique integer representation
label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(dataset[:, 0])
encoded_labels = label_encoder.transform(observation_labels)
f0d3ed5: we implemented the LabelEncoder, using the following, to ensure unique label fitting:
vagrant@trusty64:~$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> t = [1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> t
[1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> list(set(t))
[1, 2, 3, 5, 6, 7, 8]
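The same unique-fitting behavior can be emulated without sklearn, which clarifies what LabelEncoder's fit/transform does under the hood (a sketch for reasoning only, not a replacement for the sklearn call):

```python
def fit_labels(labels):
    '''Map each unique label to an integer, in sorted order.'''
    return {label: index for index, label in enumerate(sorted(set(labels)))}

def transform_labels(labels, encoding):
    '''Convert labels to their integer representation.'''
    return [encoding[label] for label in labels]
```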
The following will verify whether the authenticated user can authenticate against the admin database:
mongo admin --port 27017 -u authenticated -p password
5581017: dataset['result'] will contain a <pymongo.cursor.Cursor object at 0x7feaf25b5390> object, within our sv.py. So, we'll need to consider using mongo's aggregation implementation. Later, if distributed clustering is something of interest, we could implement either aggregation pipelines, or consider integrating hadoop with mongodb.
The following sv.py
snippet:
...
# restructure dataset into arrays
observation_labels = []
grouped_features = []
sorted_labels = []
for dataset in datasets['result']:
logger = Logger(__name__, 'error', 'error')
logger.log('sv.py, dataset: ' + repr(dataset))
for observation in dataset['dataset']:
logger.log('sv.py, observation: ' + repr(observation))
observation_labels.append(observation['dependent-variable'])
indep_variables = observation['independent-variables']
logger.log('sv.py, indep_variables: ' + repr(indep_variables))
for features in indep_variables:
sorted_features = [v for k, v in sorted(features.items())]
grouped_features.append(sorted_features)
logger.log('sv.py, grouped_features: ' + repr(grouped_features))
if not sorted_labels:
sorted_labels = [k for k, v in sorted(features.items())]
logger.log('sv.py, sorted_labels: ' + repr(sorted_labels))
# generate svm model
...
generates an error.log
, for the web-interface:
[2017-07-19 08:16:39,037] {/vagrant/log/logger.py:165} DEBUG - brain.database.dataset: brain/database/dataset.py, collection: u'test-756'
[2017-07-19 08:16:39,037] {/vagrant/log/logger.py:165} DEBUG - brain.database.dataset: brain/database/dataset.py, operation: 'aggregate'
[2017-07-19 08:16:39,037] {/vagrant/log/logger.py:165} DEBUG - brain.database.dataset: brain/database/dataset.py, payload: [{'$project': {'dataset': 1}}]
[2017-07-19 08:16:39,152] {/vagrant/log/logger.py:165} DEBUG - brain.session.model.sv: sv.py, dataset: {u'_id': ObjectId('596f48df9bd56c083a84bec0'), u'dataset': [{u'dependent-variable': u'dep-variable-1', u'independent-variables': [{u'indep-variable-6': 0.002, u'indep-variable-7': 25, u'indep-variable-4': 325, u'indep-variable-5': 54.64, u'indep-variable-2': 98.01, u'indep-variable-3': 0.432, u'indep-variable-1': 23.45}]}, {u'dependent-variable': u'dep-variable-2', u'independent-variables': [{u'indep-variable-6': 0.001, u'indep-variable-7': 31, u'indep-variable-4': 235, u'indep-variable-5': 64.45, u'indep-variable-2': 92.22, u'indep-variable-3': 0.356, u'indep-variable-1': 24.32}]}, {u'dependent-variable': u'dep-variable-3', u'independent-variables': [{u'indep-variable-6': 0.002, u'indep-variable-7': 24, u'indep-variable-4': 427, u'indep-variable-5': 75.45, u'indep-variable-2': 101.21, u'indep-variable-3': 0.832, u'indep-variable-1': 22.67}]}]}
[2017-07-19 08:16:39,152] {/vagrant/log/logger.py:165} DEBUG - brain.session.model.sv: sv.py, observation: {u'dependent-variable': u'dep-variable-1', u'independent-variables': [{u'indep-variable-6': 0.002, u'indep-variable-7': 25, u'indep-variable-4': 325, u'indep-variable-5': 54.64, u'indep-variable-2': 98.01, u'indep-variable-3': 0.432, u'indep-variable-1': 23.45}]}
[2017-07-19 08:16:39,152] {/vagrant/log/logger.py:165} DEBUG - brain.session.model.sv: sv.py, indep_variables: [{u'indep-variable-6': 0.002, u'indep-variable-7': 25, u'indep-variable-4': 325, u'indep-variable-5': 54.64, u'indep-variable-2': 98.01, u'indep-variable-3': 0.432, u'indep-variable-1': 23.45}]
[2017-07-19 08:16:39,153] {/vagrant/log/logger.py:165} DEBUG - brain.session.model.sv: sv.py, grouped_features: [[23.45, 98.01, 0.432, 325, 54.64, 0.002, 25]]
[2017-07-19 08:16:39,153] {/vagrant/log/logger.py:165} DEBUG - brain.session.model.sv: sv.py, sorted_labels: [u'indep-variable-1', u'indep-variable-2', u'indep-variable-3', u'indep-variable-4', u'indep-variable-5', u'indep-variable-6', u'indep-variable-7']
[2017-07-19 08:16:39,153] {/vagrant/log/logger.py:165} DEBUG - brain.session.model.sv: sv.py, observation: {u'dependent-variable': u'dep-variable-2', u'independent-variables': [{u'indep-variable-6': 0.001, u'indep-variable-7': 31, u'indep-variable-4': 235, u'indep-variable-5': 64.45, u'indep-variable-2': 92.22, u'indep-variable-3': 0.356, u'indep-variable-1': 24.32}]}
[2017-07-19 08:16:39,154] {/vagrant/log/logger.py:165} DEBUG - brain.session.model.sv: sv.py, indep_variables: [{u'indep-variable-6': 0.001, u'indep-variable-7': 31, u'indep-variable-4': 235, u'indep-variable-5': 64.45, u'indep-variable-2': 92.22, u'indep-variable-3': 0.356, u'indep-variable-1': 24.32}]
[2017-07-19 08:16:39,154] {/vagrant/log/logger.py:165} DEBUG - brain.session.model.sv: sv.py, grouped_features: [[23.45, 98.01, 0.432, 325, 54.64, 0.002, 25], [24.32, 92.22, 0.356, 235, 64.45, 0.001, 31]]
[2017-07-19 08:16:39,154] {/vagrant/log/logger.py:165} DEBUG - brain.session.model.sv: sv.py, observation: {u'dependent-variable': u'dep-variable-3', u'independent-variables': [{u'indep-variable-6': 0.002, u'indep-variable-7': 24, u'indep-variable-4': 427, u'indep-variable-5': 75.45, u'indep-variable-2': 101.21, u'indep-variable-3': 0.832, u'indep-variable-1': 22.67}]}
[2017-07-19 08:16:39,155] {/vagrant/log/logger.py:165} DEBUG - brain.session.model.sv: sv.py, indep_variables: [{u'indep-variable-6': 0.002, u'indep-variable-7': 24, u'indep-variable-4': 427, u'indep-variable-5': 75.45, u'indep-variable-2': 101.21, u'indep-variable-3': 0.832, u'indep-variable-1': 22.67}]
[2017-07-19 08:16:39,156] {/vagrant/log/logger.py:165} DEBUG - brain.session.model.sv: sv.py, grouped_features: [[23.45, 98.01, 0.432, 325, 54.64, 0.002, 25], [24.32, 92.22, 0.356, 235, 64.45, 0.001, 31], [22.67, 101.21, 0.832, 427, 75.45, 0.002, 24]]
[2017-07-19 08:16:39,164] {/vagrant/log/logger.py:165} DEBUG - brain.load_data: load_data.py, response: {'status': 0, 'msg': 'Model properly generated', 'type': 'model-generate'}
Note: the above snippet implemented the svm model-type, during the data_new session.
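The restructuring logic from the sv.py snippet above can be reduced to a testable function (Logger calls removed, sorted_labels initialized locally; the function name is ours, not the project's):

```python
def flatten_datasets(datasets):
    '''
    Restructure aggregated mongodb documents into parallel arrays:
    observation labels, grouped feature rows, and sorted feature names.
    '''
    observation_labels = []
    grouped_features = []
    sorted_labels = []
    for dataset in datasets:
        for observation in dataset['dataset']:
            observation_labels.append(observation['dependent-variable'])
            for features in observation['independent-variables']:
                # sort by key, so every row has the same column order
                grouped_features.append(
                    [v for k, v in sorted(features.items())]
                )
                if not sorted_labels:
                    sorted_labels = [k for k, v in sorted(features.items())]
    return observation_labels, grouped_features, sorted_labels
```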
The following csv2dict.py
snippet:
...
logger = Logger(__name__, 'error', 'error')
# open temporary 'csvfile' reader object
dataset_reader = csv.reader(
raw_data,
delimiter=' ',
quotechar='|'
)
# first row of csvfile: get all columns, except first
for row in islice(dataset_reader, 0, 1):
indep_labels_list = row[0].split(',')[1:]
# all rows of csvfile: except first row
for dep_index, row in enumerate(islice(dataset_reader, 0, None)):
row_arr = row[0].split(',')
features_list = row_arr[1:]
features_dict = {k: v for k, v in zip(indep_labels_list, features_list)}
observation = {
'dependent-variable': row_arr[:1][0],
'independent-variables': [features_dict]
}
dataset.append(observation)
logger.log('/brain/converter/svm/csvtodict.py, dataset: ' + repr(dataset))
...
generates an error.log
, for the web-interface:
[2017-07-21 08:02:06,168] {/vagrant/log/logger.py:165} DEBUG - brain.converter.svm.csv2dict: /brain/converter/svm/csvtodict.py, dataset: [{'dependent-variable': 'dep-variable-1', 'independent-variables': [{'indep-variable-6': '0.002', 'indep-variable-7': '23', 'indep-variable-4': '325', 'indep-variable-5': '54.64', 'indep-variable-2': '98.01', 'indep-variable-3': '0.432', 'indep-variable-1': '23.45'}]}, {'dependent-variable': 'dep-variable-4', 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '32', 'indep-variable-4': '342', 'indep-variable-5': '66.67', 'indep-variable-2': '95.96', 'indep-variable-3': '0.743', 'indep-variable-1': '22.1'}]}, {'dependent-variable': 'dep-variable-5', 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '27', 'indep-variable-4': '295', 'indep-variable-5': '55.83', 'indep-variable-2': '95.03', 'indep-variable-3': '0.488', 'indep-variable-1': '23.27'}]}, {'dependent-variable': 'dep-variable-3', 'independent-variables': [{'indep-variable-6': '0.002', 'indep-variable-7': '26', 'indep-variable-4': '427', 'indep-variable-5': '75.45', 'indep-variable-2': '101.21', 'indep-variable-3': '0.832', 'indep-variable-1': '22.67'}]}, {'dependent-variable': 'dep-variable-5', 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '29', 'indep-variable-4': '303', 'indep-variable-5': '58.88', 'indep-variable-2': '97.78', 'indep-variable-3': '0.638', 'indep-variable-1': '19.99'}]}, {'dependent-variable': 'dep-variable-5', 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '27', 'indep-variable-4': '295', 'indep-variable-5': '55.83', 'indep-variable-2': '95.03', 'indep-variable-3': '0.488', 'indep-variable-1': '23.27'}]}, {'dependent-variable': 'dep-variable-4', 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '30', 'indep-variable-4': '342', 'indep-variable-5': '75.67', 'indep-variable-2': '99.33', 'indep-variable-3': '0.648', 
'indep-variable-1': '20.71'}]}]
Note: the above snippet implemented the svm model-type, during the data_new session.
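The csv conversion above can be exercised end-to-end with an in-memory file (Python 3 syntax here, while the production code runs under 2.7; `next(reader)` stands in for the original islice over the header row):

```python
import csv
from io import StringIO

def csv_to_dataset(raw_data):
    '''Convert a csv file object into the list-of-observations structure.'''
    dataset = []
    reader = csv.reader(raw_data, delimiter=' ', quotechar='|')
    # first row: all column labels, except the dependent-variable column
    indep_labels = next(reader)[0].split(',')[1:]
    # remaining rows: one observation each
    for row in reader:
        row_arr = row[0].split(',')
        features = dict(zip(indep_labels, row_arr[1:]))
        dataset.append({
            'dependent-variable': row_arr[0],
            'independent-variables': [features],
        })
    return dataset
```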
The following xml2dict.py
snippet:
# open temporary 'xmltodict' object
dataset = []
dataset_reader = xmltodict.parse(raw_data)
logger = Logger(__name__, 'error', 'error')
# build dataset
for observation in dataset_reader['dataset']['observation']:
features_dict = {}
dependent_variable = observation['dependent-variable']
for feature in observation['independent-variable']:
features_dict[feature['label']] = feature['value']
adjusted = {
'dependent-variable': dependent_variable,
'independent-variables': [features_dict]
}
dataset.append(adjusted)
logger.log('/brain/converter/format/xml2dict.py, dataset: ' + repr(dataset))
generates an error.log
, for the web-interface:
2017-07-22 15:58:34,624] {/vagrant/log/logger.py:165} DEBUG - brain.session.base_data: /brain/session/base_data.py, self.dataset: {'properties': {'stream': False, 'session_type': u'data_new', 'collection': u'collection-358', 'dataset_type': u'file_upload', 'model_type': u'svm', 'session_name': u'test'}, 'dataset': [{'dependent-variable': u'dep-variable-1', 'independent-variables': [{u'indep-variable-6': u'0.002', u'indep-variable-7': u'23', u'indep-variable-4': u'325', u'indep-variable-5': u'56.64', u'indep-variable-2': u'98.01', u'indep-variable-3': u'0.432', u'indep-variable-1': u'23.45'}]}, {'dependent-variable': u'dep-variable-4', 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'32', u'indep-variable-4': u'342', u'indep-variable-5': u'66.67', u'indep-variable-2': u'95.96', u'indep-variable-3': u'0.743', u'indep-variable-1': u'22.1'}]}, {'dependent-variable': u'dep-variable-5', 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'27', u'indep-variable-4': u'295', u'indep-variable-5': u'55.83', u'indep-variable-2': u'95.03', u'indep-variable-3': u'0.488', u'indep-variable-1': u'23.27'}]}, {'dependent-variable': u'dep-variable-3', 'independent-variables': [{u'indep-variable-6': u'0.002', u'indep-variable-7': u'26', u'indep-variable-4': u'427', u'indep-variable-5': u'75.45', u'indep-variable-2': u'101.21', u'indep-variable-3': u'0.832', u'indep-variable-1': u'22.67'}]}, {'dependent-variable': u'dep-variable-5', 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'29', u'indep-variable-4': u'303', u'indep-variable-5': u'58.88', u'indep-variable-2': u'97.78', u'indep-variable-3': u'0.638', u'indep-variable-1': u'19.99'}]}, {'dependent-variable': u'dep-variable-1', 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'27', u'indep-variable-4': u'295', u'indep-variable-5': u'55.83', u'indep-variable-2': u'95.03', u'indep-variable-3': u'0.488', 
u'indep-variable-1': u'23.27'}]}, {'dependent-variable': u'dep-variable-1', 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'30', u'indep-variable-4': u'342', u'indep-variable-5': u'75.67', u'indep-variable-2': u'99.33', u'indep-variable-3': u'0.648', u'indep-variable-1': u'20.71'}]}, {'dependent-variable': u'dep-variable-1', 'independent-variables': [{u'indep-variable-6': u'0.002', u'indep-variable-7': u'25', u'indep-variable-4': u'325', u'indep-variable-5': u'54.64', u'indep-variable-2': u'98.01', u'indep-variable-3': u'0.432', u'indep-variable-1': u'23.45'}]}, {'dependent-variable': u'dep-variable-2', 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'31', u'indep-variable-4': u'235', u'indep-variable-5': u'64.45', u'indep-variable-2': u'92.22', u'indep-variable-3': u'0.356', u'indep-variable-1': u'24.32'}]}, {'dependent-variable': u'dep-variable-3', 'independent-variables': [{u'indep-variable-6': u'0.002', u'indep-variable-7': u'24', u'indep-variable-4': u'427', u'indep-variable-5': u'75.45', u'indep-variable-2': u'101.21', u'indep-variable-3': u'0.832', u'indep-variable-1': u'22.67'}]}]}
Note: the above snippet implemented the svm model-type, during the data_new session.
Note: the above error.log snippet was considerably longer, since both the svm.xml, and svm-1.xml files were used during the data_new session.
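The same conversion can be sketched with the stdlib ElementTree instead of xmltodict (the dataset/observation/independent-variable element names follow the snippet above; the exact xml layout is an assumption, and this is not the production parser):

```python
import xml.etree.ElementTree as ET

def xml_to_dataset(raw_data):
    '''Convert an xml string into the list-of-observations structure.'''
    dataset = []
    root = ET.fromstring(raw_data)
    for observation in root.findall('observation'):
        features = {}
        for variable in observation.findall('independent-variable'):
            features[variable.find('label').text] = variable.find('value').text
        dataset.append({
            'dependent-variable': observation.find('dependent-variable').text,
            'independent-variables': [features],
        })
    return dataset
```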
0c2d225: our previous commit 4cfa9f0 (from another computer) had wiped out the last couple days of commits from history, by forcing a merge from master into feature-2844. So, on the original machine (used this weekend), we were able to recover history by simply pushing the most recent commit (i.e. 98d98d8) prior to the accidental merge, via git commit --amend -m "#2844: ...".
We have temporarily added the following snippet in our base_data.py:
...
def save_premodel_dataset(self):
'''
This method saves the entire dataset collection, as a json
document, into the nosql implementation.
'''
# save dataset
collection = self.premodel_data['properties']['collection']
collection_adjusted = collection.lower().replace(' ', '_')
cursor = Collection()
document = {'properties': self.premodel_data['properties'], 'dataset': self.dataset}
logger = Logger(__name__, 'error', 'error')
logger.log('/brain/session/base_data.py, self.dataset: ' + repr(document))
...
Upon a fresh data_new
session, with the following input datasets:
We noticed the following within our error.log
:
2017-07-24 18:21:30,397] {/vagrant/log/logger.py:165} DEBUG - brain.session.base_data: /brain/session/base_data.py, self.dataset: {'properties': {'stream': False, 'session_type': u'data_new', 'collection': u'collection-621', 'dataset_type': u'file_upload', 'model_type': u'svm', 'session_name': u'test'}, 'dataset': [{'dependent-variable': 'dep-variable-1', 'error': None, 'independent-variables': [{'indep-variable-6': '0.002', 'indep-variable-7': '23', 'indep-variable-4': '325', 'indep-variable-5': '54.64', 'indep-variable-2': '98.01', 'indep-variable-3': '0.432', 'indep-variable-1': '23.45'}]}, {'dependent-variable': 'dep-variable-4', 'error': None, 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '32', 'indep-variable-4': '342', 'indep-variable-5': '66.67', 'indep-variable-2': '95.96', 'indep-variable-3': '0.743', 'indep-variable-1': '22.1'}]}, {'dependent-variable': 'dep-variable-5', 'error': None, 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '27', 'indep-variable-4': '295', 'indep-variable-5': '55.83', 'indep-variable-2': '95.03', 'indep-variable-3': '0.488', 'indep-variable-1': '23.27'}]}, {'dependent-variable': 'dep-variable-3', 'error': None, 'independent-variables': [{'indep-variable-6': '0.002', 'indep-variable-7': '26', 'indep-variable-4': '427', 'indep-variable-5': '75.45', 'indep-variable-2': '101.21', 'indep-variable-3': '0.832', 'indep-variable-1': '22.67'}]}, {'dependent-variable': 'dep-variable-5', 'error': None, 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '29', 'indep-variable-4': '303', 'indep-variable-5': '58.88', 'indep-variable-2': '97.78', 'indep-variable-3': '0.638', 'indep-variable-1': '19.99'}]}, {'dependent-variable': 'dep-variable-5', 'error': None, 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '27', 'indep-variable-4': '295', 'indep-variable-5': '55.83', 'indep-variable-2': '95.03', 'indep-variable-3': '0.488', 
'indep-variable-1': '23.27'}]}, {'dependent-variable': 'dep-variable-4', 'error': None, 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '30', 'indep-variable-4': '342', 'indep-variable-5': '75.67', 'indep-variable-2': '99.33', 'indep-variable-3': '0.648', 'indep-variable-1': '20.71'}]}, {'dependent-variable': u'dep-variable-1', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.002', u'indep-variable-7': u'23', u'indep-variable-4': u'325', u'indep-variable-5': u'56.64', u'indep-variable-2': u'98.01', u'indep-variable-3': u'0.432', u'indep-variable-1': u'23.45'}]}, {'dependent-variable': u'dep-variable-4', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'32', u'indep-variable-4': u'342', u'indep-variable-5': u'66.67', u'indep-variable-2': u'95.96', u'indep-variable-3': u'0.743', u'indep-variable-1': u'22.1'}]}, {'dependent-variable': u'dep-variable-5', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'27', u'indep-variable-4': u'295', u'indep-variable-5': u'55.83', u'indep-variable-2': u'95.03', u'indep-variable-3': u'0.488', u'indep-variable-1': u'23.27'}]}, {'dependent-variable': u'dep-variable-3', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.002', u'indep-variable-7': u'26', u'indep-variable-4': u'427', u'indep-variable-5': u'75.45', u'indep-variable-2': u'101.21', u'indep-variable-3': u'0.832', u'indep-variable-1': u'22.67'}]}, {'dependent-variable': u'dep-variable-5', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'29', u'indep-variable-4': u'303', u'indep-variable-5': u'58.88', u'indep-variable-2': u'97.78', u'indep-variable-3': u'0.638', u'indep-variable-1': u'19.99'}]}, {'dependent-variable': u'dep-variable-1', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'27', u'indep-variable-4': u'295', 
u'indep-variable-5': u'55.83', u'indep-variable-2': u'95.03', u'indep-variable-3': u'0.488', u'indep-variable-1': u'23.27'}]}, {'dependent-variable': u'dep-variable-1', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'30', u'indep-variable-4': u'342', u'indep-variable-5': u'75.67', u'indep-variable-2': u'99.33', u'indep-variable-3': u'0.648', u'indep-variable-1': u'20.71'}]}, {u'dependent-variable': u'dep-variable-1', u'independent-variables': [{u'indep-variable-6': 0.002, u'indep-variable-7': 25, u'indep-variable-4': 325, u'indep-variable-5': 54.64, u'indep-variable-2': 98.01, u'indep-variable-3': 0.432, u'indep-variable-1': 23.45}]}, {u'dependent-variable': u'dep-variable-2', u'independent-variables': [{u'indep-variable-6': 0.001, u'indep-variable-7': 31, u'indep-variable-4': 235, u'indep-variable-5': 64.45, u'indep-variable-2': 92.22, u'indep-variable-3': 0.356, u'indep-variable-1': 24.32}]}, {u'dependent-variable': u'dep-variable-3', u'independent-variables': [{u'indep-variable-6': 0.002, u'indep-variable-7': 24, u'indep-variable-4': 427, u'indep-variable-5': 75.45, u'indep-variable-2': 101.21, u'indep-variable-3': 0.832, u'indep-variable-1': 22.67}]}, {u'dependent-variable': u'dep-variable-1', u'independent-variables': [{u'indep-variable-6': 0.002, u'indep-variable-7': 23, u'indep-variable-4': 325, u'indep-variable-5': 54.64, u'indep-variable-2': 98.01, u'indep-variable-3': 0.432, u'indep-variable-1': 23.45}]}, {u'dependent-variable': u'dep-variable-4', u'independent-variables': [{u'indep-variable-6': 0.001, u'indep-variable-7': 32, u'indep-variable-4': 342, u'indep-variable-5': 66.67, u'indep-variable-2': 95.96, u'indep-variable-3': 0.743, u'indep-variable-1': 22.1}, {u'indep-variable-6': 0.001, u'indep-variable-7': 30, u'indep-variable-4': 342, u'indep-variable-5': 75.67, u'indep-variable-2': 99.33, u'indep-variable-3': 0.648, u'indep-variable-1': 20.71}]}, {u'dependent-variable': u'dep-variable-5', 
u'independent-variables': [{u'indep-variable-6': 0.001, u'indep-variable-7': 27, u'indep-variable-4': 295, u'indep-variable-5': 55.83, u'indep-variable-2': 95.03, u'indep-variable-3': 0.488, u'indep-variable-1': 23.27}, {u'indep-variable-6': 0.001, u'indep-variable-7': 27, u'indep-variable-4': 295, u'indep-variable-5': 55.83, u'indep-variable-2': 95.03, u'indep-variable-3': 0.488, u'indep-variable-1': 23.27}, {u'indep-variable-6': 0.001, u'indep-variable-7': 29, u'indep-variable-4': 303, u'indep-variable-5': 58.88, u'indep-variable-2': 97.78, u'indep-variable-3': 0.638, u'indep-variable-1': 19.99}]}, {u'dependent-variable': u'dep-variable-3', u'independent-variables': [{u'indep-variable-6': 0.002, u'indep-variable-7': 26, u'indep-variable-4': 427, u'indep-variable-5': 75.45, u'indep-variable-2': 101.21, u'indep-variable-3': 0.832, u'indep-variable-1': 22.67}]}]}
Using the same temporary snippet in our base_data.py
, we were able to run a fresh data_append
session, with multiple input datasets:
We noticed the following within our error.log
:
[2017-07-24 18:41:16,875] {/vagrant/log/logger.py:165} DEBUG - brain.session.base_data: /brain/session/base_data.py, self.dataset: {'properties': {'model_type': u'svm', 'dataset_type': u'file_upload', 'collection': u'collection-621', 'stream': False, 'session_type': u'data_append'}, 'dataset': [{'dependent-variable': 'dep-variable-1', 'error': None, 'independent-variables': [{'indep-variable-6': '0.002', 'indep-variable-7': '25', 'indep-variable-4': '325', 'indep-variable-5': '54.64', 'indep-variable-2': '98.01', 'indep-variable-3': '0.432', 'indep-variable-1': '23.45'}]}, {'dependent-variable': 'dep-variable-2', 'error': None, 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '31', 'indep-variable-4': '235', 'indep-variable-5': '64.45', 'indep-variable-2': '92.22', 'indep-variable-3': '0.356', 'indep-variable-1': '24.32'}]}, {'dependent-variable': 'dep-variable-3', 'error': None, 'independent-variables': [{'indep-variable-6': '0.002', 'indep-variable-7': '24', 'indep-variable-4': '427', 'indep-variable-5': '75.45', 'indep-variable-2': '101.21', 'indep-variable-3': '0.832', 'indep-variable-1': '22.67'}]}, {'dependent-variable': 'dep-variable-1', 'error': None, 'independent-variables': [{'indep-variable-6': '0.002', 'indep-variable-7': '23', 'indep-variable-4': '325', 'indep-variable-5': '54.64', 'indep-variable-2': '98.01', 'indep-variable-3': '0.432', 'indep-variable-1': '23.45'}]}, {'dependent-variable': 'dep-variable-4', 'error': None, 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '32', 'indep-variable-4': '342', 'indep-variable-5': '66.67', 'indep-variable-2': '95.96', 'indep-variable-3': '0.743', 'indep-variable-1': '22.1'}]}, {'dependent-variable': 'dep-variable-5', 'error': None, 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '27', 'indep-variable-4': '295', 'indep-variable-5': '55.83', 'indep-variable-2': '95.03', 'indep-variable-3': '0.488', 'indep-variable-1': '23.27'}]}, 
{'dependent-variable': 'dep-variable-3', 'error': None, 'independent-variables': [{'indep-variable-6': '0.002', 'indep-variable-7': '26', 'indep-variable-4': '427', 'indep-variable-5': '75.45', 'indep-variable-2': '101.21', 'indep-variable-3': '0.832', 'indep-variable-1': '22.67'}]}, {'dependent-variable': 'dep-variable-5', 'error': None, 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '29', 'indep-variable-4': '303', 'indep-variable-5': '58.88', 'indep-variable-2': '97.78', 'indep-variable-3': '0.638', 'indep-variable-1': '19.99'}]}, {'dependent-variable': 'dep-variable-5', 'error': None, 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '27', 'indep-variable-4': '295', 'indep-variable-5': '55.83', 'indep-variable-2': '95.03', 'indep-variable-3': '0.488', 'indep-variable-1': '23.27'}]}, {'dependent-variable': 'dep-variable-4', 'error': None, 'independent-variables': [{'indep-variable-6': '0.001', 'indep-variable-7': '30', 'indep-variable-4': '342', 'indep-variable-5': '75.67', 'indep-variable-2': '99.33', 'indep-variable-3': '0.648', 'indep-variable-1': '20.71'}]}, {u'dependent-variable': u'dep-variable-1', u'independent-variables': [{u'indep-variable-6': 0.002, u'indep-variable-7': 23, u'indep-variable-4': 325, u'indep-variable-5': 54.64, u'indep-variable-2': 98.01, u'indep-variable-3': 0.432, u'indep-variable-1': 23.45}]}, {u'dependent-variable': u'dep-variable-4', u'independent-variables': [{u'indep-variable-6': 0.001, u'indep-variable-7': 32, u'indep-variable-4': 342, u'indep-variable-5': 66.67, u'indep-variable-2': 95.96, u'indep-variable-3': 0.743, u'indep-variable-1': 22.1}, {u'indep-variable-6': 0.001, u'indep-variable-7': 30, u'indep-variable-4': 342, u'indep-variable-5': 75.67, u'indep-variable-2': 99.33, u'indep-variable-3': 0.648, u'indep-variable-1': 20.71}]}, {u'dependent-variable': u'dep-variable-5', u'independent-variables': [{u'indep-variable-6': 0.001, u'indep-variable-7': 27, 
u'indep-variable-4': 295, u'indep-variable-5': 55.83, u'indep-variable-2': 95.03, u'indep-variable-3': 0.488, u'indep-variable-1': 23.27}, {u'indep-variable-6': 0.001, u'indep-variable-7': 27, u'indep-variable-4': 295, u'indep-variable-5': 55.83, u'indep-variable-2': 95.03, u'indep-variable-3': 0.488, u'indep-variable-1': 23.27}, {u'indep-variable-6': 0.001, u'indep-variable-7': 29, u'indep-variable-4': 303, u'indep-variable-5': 58.88, u'indep-variable-2': 97.78, u'indep-variable-3': 0.638, u'indep-variable-1': 19.99}]}, {u'dependent-variable': u'dep-variable-3', u'independent-variables': [{u'indep-variable-6': 0.002, u'indep-variable-7': 26, u'indep-variable-4': 427, u'indep-variable-5': 75.45, u'indep-variable-2': 101.21, u'indep-variable-3': 0.832, u'indep-variable-1': 22.67}]}, {'dependent-variable': u'dep-variable-1', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.002', u'indep-variable-7': u'23', u'indep-variable-4': u'325', u'indep-variable-5': u'56.64', u'indep-variable-2': u'98.01', u'indep-variable-3': u'0.432', u'indep-variable-1': u'23.45'}]}, {'dependent-variable': u'dep-variable-4', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'32', u'indep-variable-4': u'342', u'indep-variable-5': u'66.67', u'indep-variable-2': u'95.96', u'indep-variable-3': u'0.743', u'indep-variable-1': u'22.1'}]}, {'dependent-variable': u'dep-variable-5', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'27', u'indep-variable-4': u'295', u'indep-variable-5': u'55.83', u'indep-variable-2': u'95.03', u'indep-variable-3': u'0.488', u'indep-variable-1': u'23.27'}]}, {'dependent-variable': u'dep-variable-3', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.002', u'indep-variable-7': u'26', u'indep-variable-4': u'427', u'indep-variable-5': u'75.45', u'indep-variable-2': u'101.21', u'indep-variable-3': u'0.832', u'indep-variable-1': u'22.67'}]}, 
{'dependent-variable': u'dep-variable-5', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'29', u'indep-variable-4': u'303', u'indep-variable-5': u'58.88', u'indep-variable-2': u'97.78', u'indep-variable-3': u'0.638', u'indep-variable-1': u'19.99'}]}, {'dependent-variable': u'dep-variable-1', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'27', u'indep-variable-4': u'295', u'indep-variable-5': u'55.83', u'indep-variable-2': u'95.03', u'indep-variable-3': u'0.488', u'indep-variable-1': u'23.27'}]}, {'dependent-variable': u'dep-variable-1', 'error': None, 'independent-variables': [{u'indep-variable-6': u'0.001', u'indep-variable-7': u'30', u'indep-variable-4': u'342', u'indep-variable-5': u'75.67', u'indep-variable-2': u'99.33', u'indep-variable-3': u'0.648', u'indep-variable-1': u'20.71'}]}]}
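The document assembly in the base_data.py snippet above reduces to a small pure function, which makes the collection-name normalization easy to unit test (the function name is ours; the actual Collection() write is omitted):

```python
def build_premodel_document(properties, dataset):
    '''
    Assemble the json document to be saved into the nosql store, and
    normalize the target collection name (lowercase, spaces to underscores).
    '''
    collection = properties['collection'].lower().replace(' ', '_')
    document = {'properties': properties, 'dataset': dataset}
    return collection, document
```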
After submitting a model_generate session, our model_predict session was able to provide an option for the corresponding model. However, selecting the corresponding model for prediction always resulted in the form not updating to the chosen model, for prediction:
Therefore, we'll need to investigate the following scenarios:
model_generate session
64c4559: in the future, we could ensure the operating user (whether logged-in, or anonymous) does not save a collection under the name of an existing collection associated with their account. This will guarantee uniqueness, with respect to a corresponding model_predict.jsx session:
{/* array components require unique 'key' value */}
{options && options.map(function(value) {
return <option key={value.collection} value={value.collection}>
{value.collection}
</option>;
})}
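The uniqueness constraint described in 64c4559 could be enforced server-side before the insert. A sketch against a plain set of existing names (in production this would consult the user's collections in mongodb; the function name is hypothetical):

```python
def is_collection_available(name, existing_collections):
    '''
    Return True if the (normalized) collection name is not already
    taken by this user; normalization matches the save path.
    '''
    normalized = name.lower().replace(' ', '_')
    return normalized not in existing_collections
```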
After #2842 is resolved, we need to determine the corresponding nosql data structure, and implement it accordingly in our python backend logic.