mldbai / mldb

MLDB is the Machine Learning Database
http://mldb.ai
Apache License 2.0

Rare test failure in MLDB-2143-classifier-utf8.py (row not found in tabular dataset) #941

Closed by jeremybarnes 2 years ago

jeremybarnes commented 2 years ago
nice make -j6 -k sanitizers=undefined UBSAN_OPTIONS=color=always:print_stacktrace=1:use_sigaltstack=false MLDB-2143-classifier-utf8.py
reading configuration from file: 'mldb/container_files/mldb.conf'

MLDB ready

creating SYMLINK /var/folders/sb/zv7k23t13130wff2fkt0pq9c0000gn/T/bgzDWg/main.py -> /Users/jeremy/projects/mldb/mldb/testing/MLDB-2143-classifier-utf8.py
loading from: /var/folders/sb/zv7k23t13130wff2fkt0pq9c0000gn/T/bgzDWg/main.py
ImportTextProcedure [2021-08-23T06:13:26.219-4:00] info reading 5 columns ["sepal length","sepal width","petal length","petal width","class"]
ImportTextProcedure [2021-08-23T06:13:26.219-4:00] info writing 5 columns ["sepal length","sepal width","petal length","petal width","class"]
ImportTextProcedure [2021-08-23T06:13:26.220-4:00] info imported 150 in 0.00107598s at 0.139407M lines/second on 1.61594 CPUs
ImportTextProcedure [2021-08-23T06:13:26.220-4:00] info done 0.00455 megabytes at 4.11739 megabytes/sec
ImportTextProcedure [2021-08-23T06:13:26.220-4:00] info processed 150 lines
commiting 1 frozen chunks
0 possible collisions
rowIndex.memusage() = 5,020
TabularDataset [2021-08-23T06:13:26.221-4:00] info row index took elapsed: [0.00s cpu, 0.4801 mticks, 0.00s wall, 2.44 cores]
TabularDataset [2021-08-23T06:13:26.221-4:00] info row name usage is 248 bytes at 1.65333 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info timestamp usage is 449 bytes at 2.99333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info column sepal length used 875 bytes at 5.83333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info column sepal width used 735 bytes at 4.9 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info column petal length used 939 bytes at 6.26 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info column petal width used 734 bytes at 4.89333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info column class used 531 bytes at 3.54 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info row index usage is 5020 bytes at 33.4667 per row
TabularDataset [2021-08-23T06:13:26.221-4:00] info total mem usage is 9627 bytes for 150 rows and 5 columns for 64.18 bytes/row
TabularDataset [2021-08-23T06:13:26.221-4:00] info column memory is 3814
commiting 0 frozen chunks
TabularDataset [2021-08-23T06:13:26.221-4:00] info row name usage is 248 bytes at 1.65333 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info timestamp usage is 449 bytes at 2.99333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info column sepal length used 875 bytes at 5.83333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info column sepal width used 735 bytes at 4.9 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info column petal length used 939 bytes at 6.26 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info column petal width used 734 bytes at 4.89333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info column class used 531 bytes at 3.54 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.221-4:00] info row index usage is 5020 bytes at 33.4667 per row
TabularDataset [2021-08-23T06:13:26.221-4:00] info total mem usage is 9627 bytes for 150 rows and 5 columns for 64.18 bytes/row
TabularDataset [2021-08-23T06:13:26.221-4:00] info column memory is 3814
commiting 3 frozen chunks
0 possible collisions
rowIndex.memusage() = 5,020
TabularDataset [2021-08-23T06:13:26.225-4:00] info row index took elapsed: [0.00s cpu, 0.6203 mticks, 0.00s wall, 2.30 cores]
TabularDataset [2021-08-23T06:13:26.225-4:00] info row name usage is 784 bytes at 5.22667 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T06:13:26.225-4:00] info timestamp usage is 1347 bytes at 8.98 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.225-4:00] info column label used 1497 bytes at 9.98 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.225-4:00] info column petal length used 1610 bytes at 10.7333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.225-4:00] info column petal width used 1485 bytes at 9.9 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.225-4:00] info column sepal length used 1558 bytes at 10.3867 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.225-4:00] info column sepal width used 1486 bytes at 9.90667 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.225-4:00] info row index usage is 5020 bytes at 33.4667 per row
TabularDataset [2021-08-23T06:13:26.225-4:00] info total mem usage is 15075 bytes for 150 rows and 5 columns for 100.5 bytes/row
TabularDataset [2021-08-23T06:13:26.225-4:00] info column memory is 7636
commiting 4 frozen chunks
0 possible collisions
rowIndex.memusage() = 4,928
TabularDataset [2021-08-23T06:13:26.228-4:00] info row index took elapsed: [0.00s cpu, 0.4292 mticks, 0.00s wall, 3.16 cores]
TabularDataset [2021-08-23T06:13:26.228-4:00] info row name usage is 984 bytes at 6.56 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T06:13:26.228-4:00] info timestamp usage is 1796 bytes at 11.9733 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.228-4:00] info column label used 2080 bytes at 13.8667 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T06:13:26.228-4:00] info column petal length used 1936 bytes at 12.9067 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T06:13:26.228-4:00] info column petal width used 1936 bytes at 12.9067 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T06:13:26.228-4:00] info column sepal length used 1936 bytes at 12.9067 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T06:13:26.228-4:00] info column sepal width used 1936 bytes at 12.9067 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T06:13:26.228-4:00] info row index usage is 4928 bytes at 32.8533 per row
TabularDataset [2021-08-23T06:13:26.228-4:00] info total mem usage is 17916 bytes for 150 rows and 5 columns for 119.44 bytes/row
TabularDataset [2021-08-23T06:13:26.228-4:00] info column memory is 9824
ExperimentProcedure [2021-08-23T06:13:26.230-4:00] info  >>>>> Creating training procedure
ClassifierProcedure [2021-08-23T06:13:26.231-4:00] info initialized feature space in elapsed: [0.00s cpu, 0.4276 mticks, 0.00s wall, 1.22 cores]
ClassifierProcedure [2021-08-23T06:13:26.232-4:00] info extracted feature vectors in elapsed: [0.00s cpu, 3.3127 mticks, 0.00s wall, 1.38 cores]
ClassifierProcedure [2021-08-23T06:13:26.233-4:00] info merged feature vectors in elapsed: [0.00s cpu, 0.1042 mticks, 0.00s wall, 1.00 cores]
ClassifierProcedure [2021-08-23T06:13:26.233-4:00] info added feature vectors in elapsed: [0.00s cpu, 0.0824 mticks, 0.00s wall, 0.97 cores]
ClassifierProcedure [2021-08-23T06:13:26.233-4:00] info indexed training data in elapsed: [0.00s cpu, 0.0096 mticks, 0.00s wall, 0.67 cores]
ClassifierProcedure [2021-08-23T06:13:26.233-4:00] info Training with 6 features
ClassifierProcedure [2021-08-23T06:13:26.233-4:00] info equalization factor 1
ClassifierProcedure [2021-08-23T06:13:26.233-4:00] info weight for class 0 = 24
ClassifierProcedure [2021-08-23T06:13:26.233-4:00] info factor for class 0 = 0.0416667
ClassifierProcedure [2021-08-23T06:13:26.233-4:00] info weight for class 1 = 30
ClassifierProcedure [2021-08-23T06:13:26.233-4:00] info factor for class 1 = 0.0333333
ClassifierProcedure [2021-08-23T06:13:26.233-4:00] info weight for class 2 = 24
ClassifierProcedure [2021-08-23T06:13:26.233-4:00] info factor for class 2 = 0.0416667
Decision Tree: (weight = 78.00, cov = 100.00%)  0/0.333 1/0.333 2/0.333
 petal width >= 0.699999988079071 (z = 0.4444, weight = 54.00, cov = 69.23%)  1/0.500 2/0.500
     petal length >= 4.75 (z = 0.2344, weight = 28.00, cov = 35.90%)  1/0.118 2/0.882
         petal length >= 5.149999618530273 (z = 0.2442, weight = 17.00, cov = 21.79%)  2/1.000
         petal length  < 5.149999618530273 (z = 0.2442, weight = 11.00, cov = 14.10%)  1/0.314 2/0.686
             petal width >= 1.75 (z = 0.3107, weight = 8.00, cov = 10.26%)  1/0.103 2/0.897
                 petal length >= 4.850000381469727 (z = 0.1615, weight = 6.00, cov = 7.69%)  2/1.000
                 petal length  < 4.850000381469727 (z = 0.1615, weight = 2.00, cov = 2.56%)  1/0.444 2/0.556
                     sepal width >= 2.9499998092651367 (z = 0.0000, weight = 1.00, cov = 1.28%)  1/1.000
                     sepal width  < 2.9499998092651367 (z = 0.0000, weight = 1.00, cov = 1.28%)  2/1.000
             petal width  < 1.75 (z = 0.3107, weight = 3.00, cov = 3.85%)  1/1.000
     petal length  < 4.75 (z = 0.2344, weight = 26.00, cov = 33.33%)  1/1.000
 petal width  < 0.699999988079071 (z = 0.4444, weight = 24.00, cov = 30.77%)  0/1.000

ClassifierProcedure [2021-08-23T06:13:26.235-4:00] info trained classifier in elapsed: [0.00s cpu, 6.9065 mticks, 0.00s wall, 1.43 cores]
ClassifierProcedure [2021-08-23T06:13:26.236-4:00] info Saved classifier to file://tmp/iris_utf8.cls
ExperimentProcedure [2021-08-23T06:13:26.236-4:00] info  >>>>> Creating testing procedure
commiting 8 frozen chunks
0 possible collisions
rowIndex.memusage() = 4,784
TabularDataset [2021-08-23T06:13:26.240-4:00] info row index took elapsed: [0.00s cpu, 0.4985 mticks, 0.00s wall, 3.20 cores]
Direct memusage is 523 for 3 entries at 73 per entry
Direct memusage is 523 for 3 entries at 73 per entry
TabularDataset [2021-08-23T06:13:26.240-4:00] info row name usage is 1864 bytes at 25.8889 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T06:13:26.240-4:00] info timestamp usage is 2048 bytes at 28.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T06:13:26.240-4:00] info column score."""Iris-setosa""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T06:13:26.240-4:00] info column score."""Iris-versicolor""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T06:13:26.240-4:00] info column score."""Iris-virginica""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
Direct memusage is 523 for 3 entries at 73 per entry
TabularDataset [2021-08-23T06:13:26.240-4:00] info column maxLabel used 4016 bytes at 55.7778 per row with MLDB::DirectFrozenColumn
Direct memusage is 523 for 3 entries at 73 per entry
TabularDataset [2021-08-23T06:13:26.240-4:00] info column label used 4016 bytes at 55.7778 per row with MLDB::DirectFrozenColumn
TabularDataset [2021-08-23T06:13:26.240-4:00] info column weight used 1776 bytes at 24.6667 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T06:13:26.240-4:00] info row index usage is 4784 bytes at 66.4444 per row
TabularDataset [2021-08-23T06:13:26.240-4:00] info total mem usage is 24768 bytes for 72 rows and 6 columns for 344 bytes/row
TabularDataset [2021-08-23T06:13:26.240-4:00] info column memory is 15304
ExperimentProcedure [2021-08-23T06:13:26.240-4:00] info accuracy took elapsed: [0.01s cpu, 9.8827 mticks, 0.00s wall, 1.49 cores]
ExperimentProcedure [2021-08-23T06:13:26.242-4:00] info  >>>>> Creating training procedure
ClassifierProcedure [2021-08-23T06:13:26.243-4:00] info initialized feature space in elapsed: [0.00s cpu, 0.3651 mticks, 0.00s wall, 1.53 cores]
2021-08-23 06:13:26.250 stderr 
Stdout:
ds1 [["_rowName","label","petal length","petal width","sepal length","sepal width"],["1","Iris-setosa",1.4,0.2,5.1,3.5],["2","Iris-setosa",1.4,0.2,4.9,3],["3","Iris-setosa",1.3,0.2,4.7,3.2],["4","Iris-setosa",1.5,0.2,4.6,3.1],["5","Iris-setosa",1.4,0.2,5,3.6],["6","Iris-setosa",1.7,0.4,5.4,3.9],["7","Iris-setosa",1.4,0.3,4.6,3.4],["8","Iris-setosa",1.5,0.2,5,3.4],["9","Iris-setosa",1.4,0.2,4.4,2.9],["10","Iris-setosa",1.5,0.1,4.9,3.1]]
ds2 [["_rowName","label","petal length","petal width","sepal length","sepal width"],["1","Iris-setosa_éç",1.4,0.2,5.1,3.5],["5","Iris-setosa_éç",1.4,0.2,5,3.6],["6","Iris-setosa_éç",1.7,0.4,5.4,3.9],["7","Iris-setosa_éç",1.4,0.3,4.6,3.4],["9","Iris-setosa_éç",1.4,0.2,4.4,2.9],["10","Iris-setosa_éç",1.5,0.1,4.9,3.1],["17","Iris-setosa_éç",1.3,0.4,5.4,3.9],["21","Iris-setosa_éç",1.7,0.2,5.4,3.4],["26","Iris-setosa_éç",1.6,0.2,5,3],["30","Iris-setosa_éç",1.6,0.2,4.7,3.2]]

2021-08-23 06:13:26.250 script runner plugin test_utf8_category (__main__.MLDB2134classiferUtf8Test) ... ERROR

======================================================================
ERROR: test_utf8_category (__main__.MLDB2134classiferUtf8Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "file://mldb/testing/MLDB-2143-classifier-utf8.py", line 81, in test_utf8_category
  File "file://mldb/testing/MLDB-2143-classifier-utf8.py", line 55, in do_query
  File "/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py", line 169, in _post_put
    return self._perform(verb, url, [], data)
  File "/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py", line 125, in _perform
    raise mldb_wrapper.ResponseException(response)
mldb.mldb_wrapper.mldb_wrapper.ResponseException: Response status code: 400. Response text: {"details":{"entry":"cat_weights","runError":{"details":{"calc":["label","1.0"],"context":{"details":{"rowName":"4"},"error":"Row not found in tabular dataset: 4"},"from":{"columnCount":5,"rowCount":150},"limit":-1,"offset":0,"orderBy":"","select":"* EXCLUDING (label)","where":"rowHash() % 2 != 1"},"error":"Execution error: Row not found in tabular dataset: 4","httpCode":400}},"error":"failed to create the initial run","httpCode":400}

Stdout:
ds1 [["_rowName","label","petal length","petal width","sepal length","sepal width"],["1","Iris-setosa",1.4,0.2,5.1,3.5],["2","Iris-setosa",1.4,0.2,4.9,3],["3","Iris-setosa",1.3,0.2,4.7,3.2],["4","Iris-setosa",1.5,0.2,4.6,3.1],["5","Iris-setosa",1.4,0.2,5,3.6],["6","Iris-setosa",1.7,0.4,5.4,3.9],["7","Iris-setosa",1.4,0.3,4.6,3.4],["8","Iris-setosa",1.5,0.2,5,3.4],["9","Iris-setosa",1.4,0.2,4.4,2.9],["10","Iris-setosa",1.5,0.1,4.9,3.1]]
ds2 [["_rowName","label","petal length","petal width","sepal length","sepal width"],["1","Iris-setosa_éç",1.4,0.2,5.1,3.5],["5","Iris-setosa_éç",1.4,0.2,5,3.6],["6","Iris-setosa_éç",1.7,0.4,5.4,3.9],["7","Iris-setosa_éç",1.4,0.3,4.6,3.4],["9","Iris-setosa_éç",1.4,0.2,4.4,2.9],["10","Iris-setosa_éç",1.5,0.1,4.9,3.1],["17","Iris-setosa_éç",1.3,0.4,5.4,3.9],["21","Iris-setosa_éç",1.7,0.2,5.4,3.4],["26","Iris-setosa_éç",1.6,0.2,5,3],["30","Iris-setosa_éç",1.6,0.2,4.7,3.2]]

----------------------------------------------------------------------
Ran 1 test in 0.033s

FAILED (errors=1)

2021-08-23 06:13:26.251 loader 
{
    "context" : [ "Running python script" ],
    "lineNumber" : 204,
    "message" : "Test failed: <unittest.runner.TextTestResult run=1 errors=1 failures=0>",
    "scriptUri" : "/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py",
    "stack" : 
    [

        {
            "functionName" : "<module>",
            "lineNumber" : 98,
            "scriptUri" : "file://mldb/testing/MLDB-2143-classifier-utf8.py",
            "where" : "File \"file://mldb/testing/MLDB-2143-classifier-utf8.py\", line 98, in <module>"
        },

        {
            "functionName" : "run_tests",
            "lineNumber" : 204,
            "scriptUri" : "/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py",
            "where" : "File \"/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py\", line 204, in run_tests"
        }
    ],
    "type" : "mldb.mldb_wrapper.mldb_wrapper.TestSuiteFailureException",
    "where" : "File \"/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py\", line 204, in run_tests"
}

exception in accept: Operation canceled
exception in accept: Operation canceled
ServicePeer [2021-08-23T06:13:26.251-4:00] warning WARNING: peer mldb lost its own entry in discovery.  Letting it come back
peer mldb connection to mldb changed state to 3
peer mldb connection to mldb changed state to 3
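
The useful part of the failure above is buried in the nested JSON carried by the `ResponseException`. As a minimal sketch (not part of the test or of `mldb_wrapper`; `collect_errors` is a hypothetical helper), the response text can be walked to pull out the innermost error, the row name, and the `where` clause that selected the training rows:

```python
import json

# Response text copied verbatim from the traceback above.
RESPONSE_TEXT = '{"details":{"entry":"cat_weights","runError":{"details":{"calc":["label","1.0"],"context":{"details":{"rowName":"4"},"error":"Row not found in tabular dataset: 4"},"from":{"columnCount":5,"rowCount":150},"limit":-1,"offset":0,"orderBy":"","select":"* EXCLUDING (label)","where":"rowHash() % 2 != 1"},"error":"Execution error: Row not found in tabular dataset: 4","httpCode":400}},"error":"failed to create the initial run","httpCode":400}'


def collect_errors(node, path=()):
    """Return (path, message) for every string-valued "error" key in the tree.

    Hypothetical helper for reading this log; it is not an MLDB API.
    """
    found = []
    if isinstance(node, dict):
        if isinstance(node.get("error"), str):
            found.append((path, node["error"]))
        for key, value in node.items():
            found.extend(collect_errors(value, path + (key,)))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(collect_errors(value, path + (index,)))
    return found


payload = json.loads(RESPONSE_TEXT)
errors = collect_errors(payload)

# The most deeply nested error is the original cause; the outer ones wrap it.
deepest_path, deepest_message = max(errors, key=lambda item: len(item[0]))

query = payload["details"]["runError"]["details"]
missing_row = query["context"]["details"]["rowName"]

print(deepest_message)
print("row:", missing_row, "| where:", query["where"])
```

Read this way, the response says that a scan over a dataset reporting 150 rows, filtered by `rowHash() % 2 != 1`, produced row name `4` and the subsequent lookup of that name failed.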
jeremybarnes commented 2 years ago
reading configuration from file: 'mldb/container_files/mldb.conf'

MLDB ready

creating SYMLINK /var/folders/sb/zv7k23t13130wff2fkt0pq9c0000gn/T/8ea9OV/main.py -> /Users/jeremy/projects/mldb/mldb/testing/MLDB-2143-classifier-utf8.py
loading from: /var/folders/sb/zv7k23t13130wff2fkt0pq9c0000gn/T/8ea9OV/main.py
ImportTextProcedure [2021-08-23T17:40:40.162-4:00] info reading 5 columns ["sepal length","sepal width","petal length","petal width","class"]
ImportTextProcedure [2021-08-23T17:40:40.162-4:00] info writing 5 columns ["sepal length","sepal width","petal length","petal width","class"]
ImportTextProcedure [2021-08-23T17:40:40.163-4:00] info imported 150 in 0.00101018s at 0.148488M lines/second on 2.08323 CPUs
ImportTextProcedure [2021-08-23T17:40:40.163-4:00] info done 0.00455 megabytes at 4.17412 megabytes/sec
ImportTextProcedure [2021-08-23T17:40:40.163-4:00] info processed 150 lines
commiting 1 frozen chunks
0 possible collisions
rowIndex.memusage() = 5,020
TabularDataset [2021-08-23T17:40:40.163-4:00] info row index took elapsed: [0.00s cpu, 0.4631 mticks, 0.00s wall, 3.23 cores]
TabularDataset [2021-08-23T17:40:40.163-4:00] info row name usage is 248 bytes at 1.65333 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T17:40:40.163-4:00] info timestamp usage is 449 bytes at 2.99333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.163-4:00] info column sepal length used 875 bytes at 5.83333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.163-4:00] info column sepal width used 735 bytes at 4.9 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.164-4:00] info column petal length used 939 bytes at 6.26 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.164-4:00] info column petal width used 734 bytes at 4.89333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.164-4:00] info column class used 531 bytes at 3.54 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.164-4:00] info row index usage is 5020 bytes at 33.4667 per row
TabularDataset [2021-08-23T17:40:40.164-4:00] info total mem usage is 9627 bytes for 150 rows and 5 columns for 64.18 bytes/row
TabularDataset [2021-08-23T17:40:40.164-4:00] info column memory is 3814
commiting 0 frozen chunks
TabularDataset [2021-08-23T17:40:40.164-4:00] info row name usage is 248 bytes at 1.65333 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T17:40:40.164-4:00] info timestamp usage is 449 bytes at 2.99333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.164-4:00] info column sepal length used 875 bytes at 5.83333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.164-4:00] info column sepal width used 735 bytes at 4.9 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.164-4:00] info column petal length used 939 bytes at 6.26 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.164-4:00] info column petal width used 734 bytes at 4.89333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.164-4:00] info column class used 531 bytes at 3.54 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.164-4:00] info row index usage is 5020 bytes at 33.4667 per row
TabularDataset [2021-08-23T17:40:40.164-4:00] info total mem usage is 9627 bytes for 150 rows and 5 columns for 64.18 bytes/row
TabularDataset [2021-08-23T17:40:40.164-4:00] info column memory is 3814
commiting 1 frozen chunks
0 possible collisions
rowIndex.memusage() = 5,020
TabularDataset [2021-08-23T17:40:40.167-4:00] info row index took elapsed: [0.00s cpu, 0.4280 mticks, 0.00s wall, 2.61 cores]
TabularDataset [2021-08-23T17:40:40.167-4:00] info row name usage is 248 bytes at 1.65333 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T17:40:40.167-4:00] info timestamp usage is 449 bytes at 2.99333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.167-4:00] info column label used 531 bytes at 3.54 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.167-4:00] info column petal length used 939 bytes at 6.26 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.167-4:00] info column petal width used 734 bytes at 4.89333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.167-4:00] info column sepal length used 875 bytes at 5.83333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.167-4:00] info column sepal width used 735 bytes at 4.9 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.167-4:00] info row index usage is 5020 bytes at 33.4667 per row
TabularDataset [2021-08-23T17:40:40.167-4:00] info total mem usage is 9627 bytes for 150 rows and 5 columns for 64.18 bytes/row
TabularDataset [2021-08-23T17:40:40.167-4:00] info column memory is 3814
commiting 8 frozen chunks
0 possible collisions
rowIndex.memusage() = 5,020
TabularDataset [2021-08-23T17:40:40.169-4:00] info row index took elapsed: [0.00s cpu, 0.4281 mticks, 0.00s wall, 2.67 cores]
TabularDataset [2021-08-23T17:40:40.169-4:00] info row name usage is 1912 bytes at 12.7467 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T17:40:40.169-4:00] info timestamp usage is 2345 bytes at 15.6333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.169-4:00] info column label used 4009 bytes at 26.7267 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.169-4:00] info column petal length used 2592 bytes at 17.28 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.169-4:00] info column petal width used 2520 bytes at 16.8 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.169-4:00] info column sepal length used 2672 bytes at 17.8133 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T17:40:40.169-4:00] info column sepal width used 2573 bytes at 17.1533 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T17:40:40.169-4:00] info row index usage is 5020 bytes at 33.4667 per row
TabularDataset [2021-08-23T17:40:40.169-4:00] info total mem usage is 24411 bytes for 150 rows and 5 columns for 162.74 bytes/row
TabularDataset [2021-08-23T17:40:40.169-4:00] info column memory is 14366
ExperimentProcedure [2021-08-23T17:40:40.171-4:00] info  >>>>> Creating training procedure
ClassifierProcedure [2021-08-23T17:40:40.172-4:00] info initialized feature space in elapsed: [0.00s cpu, 0.2192 mticks, 0.00s wall, 1.16 cores]
ClassifierProcedure [2021-08-23T17:40:40.172-4:00] info extracted feature vectors in elapsed: [0.00s cpu, 1.9037 mticks, 0.00s wall, 1.28 cores]
ClassifierProcedure [2021-08-23T17:40:40.172-4:00] info merged feature vectors in elapsed: [0.00s cpu, 0.0222 mticks, 0.00s wall, 1.00 cores]
ClassifierProcedure [2021-08-23T17:40:40.173-4:00] info added feature vectors in elapsed: [0.00s cpu, 0.0668 mticks, 0.00s wall, 0.93 cores]
ClassifierProcedure [2021-08-23T17:40:40.173-4:00] info indexed training data in elapsed: [0.00s cpu, 0.0080 mticks, 0.00s wall, 0.80 cores]
ClassifierProcedure [2021-08-23T17:40:40.173-4:00] info Training with 6 features
ClassifierProcedure [2021-08-23T17:40:40.173-4:00] info equalization factor 1
ClassifierProcedure [2021-08-23T17:40:40.173-4:00] info weight for class 0 = 24
ClassifierProcedure [2021-08-23T17:40:40.173-4:00] info factor for class 0 = 0.0416667
ClassifierProcedure [2021-08-23T17:40:40.173-4:00] info weight for class 1 = 30
ClassifierProcedure [2021-08-23T17:40:40.173-4:00] info factor for class 1 = 0.0333333
ClassifierProcedure [2021-08-23T17:40:40.173-4:00] info weight for class 2 = 24
ClassifierProcedure [2021-08-23T17:40:40.173-4:00] info factor for class 2 = 0.0416667
Decision Tree: (weight = 78.00, cov = 100.00%)  0/0.333 1/0.333 2/0.333
 petal width >= 0.699999988079071 (z = 0.4444, weight = 54.00, cov = 69.23%)  1/0.500 2/0.500
     petal length >= 4.75 (z = 0.2344, weight = 28.00, cov = 35.90%)  1/0.118 2/0.882
         petal length >= 5.149999618530273 (z = 0.2442, weight = 17.00, cov = 21.79%)  2/1.000
         petal length  < 5.149999618530273 (z = 0.2442, weight = 11.00, cov = 14.10%)  1/0.314 2/0.686
             petal width >= 1.75 (z = 0.3107, weight = 8.00, cov = 10.26%)  1/0.103 2/0.897
                 petal length >= 4.850000381469727 (z = 0.1615, weight = 6.00, cov = 7.69%)  2/1.000
                 petal length  < 4.850000381469727 (z = 0.1615, weight = 2.00, cov = 2.56%)  1/0.444 2/0.556
                     sepal length >= 6.050000190734863 (z = 0.0000, weight = 1.00, cov = 1.28%)  2/1.000
                     sepal length  < 6.050000190734863 (z = 0.0000, weight = 1.00, cov = 1.28%)  1/1.000
             petal width  < 1.75 (z = 0.3107, weight = 3.00, cov = 3.85%)  1/1.000
     petal length  < 4.75 (z = 0.2344, weight = 26.00, cov = 33.33%)  1/1.000
 petal width  < 0.699999988079071 (z = 0.4444, weight = 24.00, cov = 30.77%)  0/1.000

ClassifierProcedure [2021-08-23T17:40:40.174-4:00] info trained classifier in elapsed: [0.00s cpu, 2.8278 mticks, 0.00s wall, 1.46 cores]
ClassifierProcedure [2021-08-23T17:40:40.174-4:00] info Saved classifier to file://tmp/iris_utf8.cls
ExperimentProcedure [2021-08-23T17:40:40.174-4:00] info  >>>>> Creating testing procedure
commiting 8 frozen chunks
0 possible collisions
rowIndex.memusage() = 4,784
TabularDataset [2021-08-23T17:40:40.177-4:00] info row index took elapsed: [0.00s cpu, 0.3427 mticks, 0.00s wall, 1.27 cores]
Direct memusage is 523 for 3 entries at 73 per entry
Direct memusage is 523 for 3 entries at 73 per entry
TabularDataset [2021-08-23T17:40:40.177-4:00] info row name usage is 1848 bytes at 25.6667 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T17:40:40.177-4:00] info timestamp usage is 2048 bytes at 28.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T17:40:40.177-4:00] info column score."""Iris-setosa""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T17:40:40.177-4:00] info column score."""Iris-versicolor""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T17:40:40.177-4:00] info column score."""Iris-virginica""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
Direct memusage is 523 for 3 entries at 73 per entry
TabularDataset [2021-08-23T17:40:40.177-4:00] info column maxLabel used 4016 bytes at 55.7778 per row with MLDB::DirectFrozenColumn
Direct memusage is 523 for 3 entries at 73 per entry
TabularDataset [2021-08-23T17:40:40.177-4:00] info column label used 4016 bytes at 55.7778 per row with MLDB::DirectFrozenColumn
TabularDataset [2021-08-23T17:40:40.177-4:00] info column weight used 1776 bytes at 24.6667 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T17:40:40.177-4:00] info row index usage is 4784 bytes at 66.4444 per row
TabularDataset [2021-08-23T17:40:40.177-4:00] info total mem usage is 24752 bytes for 72 rows and 6 columns for 343.778 bytes/row
TabularDataset [2021-08-23T17:40:40.177-4:00] info column memory is 15304
ExperimentProcedure [2021-08-23T17:40:40.177-4:00] info accuracy took elapsed: [0.00s cpu, 7.0587 mticks, 0.00s wall, 1.16 cores]
ExperimentProcedure [2021-08-23T17:40:40.179-4:00] info  >>>>> Creating training procedure
ClassifierProcedure [2021-08-23T17:40:40.179-4:00] info initialized feature space in elapsed: [0.00s cpu, 0.2391 mticks, 0.00s wall, 1.25 cores]
tryLookupRow 37: possible chunks { (0,4) (7,2) (1,8)  } of 3
  trying 0 4: 5 != 37
  trying 7 2: 88 != 37
  trying 1 8: 120 != 37
total rows 150
chunk 0 has 74 rows
*** found at index 36 in chunk 0
chunk 1 has 11 rows
chunk 2 has 8 rows
2021-08-23 17:40:40.182 stderr 
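
The `tryLookupRow` trace above appears to show the row index proposing three (chunk, offset) candidates for row 37, none of which hold that row, while a scan over the chunks finds it at index 36 in chunk 0. The sketch below illustrates that kind of cross-check on toy data. It is not MLDB's implementation: the function names and data layout are hypothetical, chunk 0 is assumed to hold rows 1 to 74 in order (consistent with the trace, but not confirmed by it), and unknown cells are `None` placeholders.

```python
def try_lookup_row(row_name, candidates, chunks):
    """Check only the (chunk, offset) pairs proposed by the index."""
    for chunk_index, offset in candidates:
        if chunks[chunk_index][offset] == row_name:
            return (chunk_index, offset)
    return None


def scan_for_row(row_name, chunks):
    """Exhaustive fallback: look at every row of every chunk."""
    for chunk_index, chunk in enumerate(chunks):
        for offset, name in enumerate(chunk):
            if name == row_name:
                return (chunk_index, offset)
    return None


# Toy data mirroring the values visible in the trace; everything else is a
# placeholder (None) or an assumption.
chunks = [[] for _ in range(8)]
chunks[0] = list(range(1, 75))            # assumed: rows 1..74 in order
chunks[1] = [None] * 8 + [120, None, None]  # 11 rows, offset 8 holds 120
chunks[2] = [None] * 8                    # 8 rows, contents unknown
chunks[7] = [None, None, 88]              # offset 2 holds 88

candidates = [(0, 4), (7, 2), (1, 8)]     # from "possible chunks" in the trace

index_hit = try_lookup_row(37, candidates, chunks)
scan_hit = scan_for_row(37, chunks)

print("index lookup:", index_hit)
print("linear scan: ", scan_hit)
```

A mismatch between the two results, as here, means the row is present in the data and the candidate list from the index is incomplete or wrong, which points at how the index was built rather than at the data.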
jeremybarnes commented 2 years ago
MLDB ready

creating SYMLINK /var/folders/sb/zv7k23t13130wff2fkt0pq9c0000gn/T/bh1F0y/main.py -> /Users/jeremy/projects/mldb/mldb/testing/MLDB-2143-classifier-utf8.py
loading from: /var/folders/sb/zv7k23t13130wff2fkt0pq9c0000gn/T/bh1F0y/main.py
ImportTextProcedure [2021-08-23T20:26:18.759-4:00] info reading 5 columns ["sepal length","sepal width","petal length","petal width","class"]
ImportTextProcedure [2021-08-23T20:26:18.759-4:00] info writing 5 columns ["sepal length","sepal width","petal length","petal width","class"]
ImportTextProcedure [2021-08-23T20:26:18.760-4:00] info imported 150 in 0.00120807s at 0.124165M lines/second on 1.71211 CPUs
ImportTextProcedure [2021-08-23T20:26:18.760-4:00] info done 0.00455 megabytes at 3.66931 megabytes/sec
ImportTextProcedure [2021-08-23T20:26:18.760-4:00] info processed 150 lines
commiting 1 frozen chunks
0 possible collisions
rowIndex.memusage() = 5,020
TabularDataset [2021-08-23T20:26:18.761-4:00] info row index took elapsed: [0.00s cpu, 0.5982 mticks, 0.00s wall, 2.02 cores]
TabularDataset [2021-08-23T20:26:18.761-4:00] info row name usage is 248 bytes at 1.65333 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info timestamp usage is 449 bytes at 2.99333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info column sepal length used 875 bytes at 5.83333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info column sepal width used 735 bytes at 4.9 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info column petal length used 939 bytes at 6.26 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info column petal width used 734 bytes at 4.89333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info column class used 531 bytes at 3.54 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info row index usage is 5020 bytes at 33.4667 per row
TabularDataset [2021-08-23T20:26:18.761-4:00] info total mem usage is 9627 bytes for 150 rows and 5 columns for 64.18 bytes/row
TabularDataset [2021-08-23T20:26:18.761-4:00] info column memory is 3814
commiting 0 frozen chunks
TabularDataset [2021-08-23T20:26:18.761-4:00] info row name usage is 248 bytes at 1.65333 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info timestamp usage is 449 bytes at 2.99333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info column sepal length used 875 bytes at 5.83333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info column sepal width used 735 bytes at 4.9 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info column petal length used 939 bytes at 6.26 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info column petal width used 734 bytes at 4.89333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info column class used 531 bytes at 3.54 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.761-4:00] info row index usage is 5020 bytes at 33.4667 per row
TabularDataset [2021-08-23T20:26:18.761-4:00] info total mem usage is 9627 bytes for 150 rows and 5 columns for 64.18 bytes/row
TabularDataset [2021-08-23T20:26:18.761-4:00] info column memory is 3814
commiting 8 frozen chunks
0 possible collisions
rowIndex.memusage() = 5,096
TabularDataset [2021-08-23T20:26:18.767-4:00] info row index took elapsed: [0.00s cpu, 0.9311 mticks, 0.00s wall, 2.49 cores]
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
TabularDataset [2021-08-23T20:26:18.767-4:00] info row name usage is 1712 bytes at 11.4133 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T20:26:18.767-4:00] info timestamp usage is 1873 bytes at 12.4867 per row with MLDB::TableFrozenColumn
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
TabularDataset [2021-08-23T20:26:18.767-4:00] info column label used 3844 bytes at 25.6267 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.767-4:00] info column petal length used 2320 bytes at 15.4667 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.767-4:00] info column petal width used 2150 bytes at 14.3333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.767-4:00] info column sepal length used 2283 bytes at 15.22 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.767-4:00] info column sepal width used 2151 bytes at 14.34 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.767-4:00] info row index usage is 5096 bytes at 33.9733 per row
TabularDataset [2021-08-23T20:26:18.767-4:00] info total mem usage is 22197 bytes for 150 rows and 5 columns for 147.98 bytes/row
TabularDataset [2021-08-23T20:26:18.767-4:00] info column memory is 12748
commiting 8 frozen chunks
0 possible collisions
rowIndex.memusage() = 5,084
TabularDataset [2021-08-23T20:26:18.771-4:00] info row index took elapsed: [0.00s cpu, 0.9354 mticks, 0.00s wall, 3.23 cores]
TabularDataset [2021-08-23T20:26:18.771-4:00] info row name usage is 1784 bytes at 11.8933 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T20:26:18.771-4:00] info timestamp usage is 1929 bytes at 12.86 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.771-4:00] info column label used 3758 bytes at 25.0533 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.771-4:00] info column petal length used 2359 bytes at 15.7267 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.771-4:00] info column petal width used 2198 bytes at 14.6533 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.771-4:00] info column sepal length used 2280 bytes at 15.2 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.771-4:00] info column sepal width used 2199 bytes at 14.66 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T20:26:18.771-4:00] info row index usage is 5084 bytes at 33.8933 per row
TabularDataset [2021-08-23T20:26:18.771-4:00] info total mem usage is 22359 bytes for 150 rows and 5 columns for 149.06 bytes/row
TabularDataset [2021-08-23T20:26:18.771-4:00] info column memory is 12794
ExperimentProcedure [2021-08-23T20:26:18.774-4:00] info  >>>>> Creating training procedure
ClassifierProcedure [2021-08-23T20:26:18.775-4:00] info initialized feature space in elapsed: [0.00s cpu, 0.3638 mticks, 0.00s wall, 1.18 cores]
ClassifierProcedure [2021-08-23T20:26:18.776-4:00] info extracted feature vectors in elapsed: [0.00s cpu, 2.7076 mticks, 0.00s wall, 1.16 cores]
ClassifierProcedure [2021-08-23T20:26:18.776-4:00] info merged feature vectors in elapsed: [0.00s cpu, 0.0353 mticks, 0.00s wall, 0.93 cores]
ClassifierProcedure [2021-08-23T20:26:18.776-4:00] info added feature vectors in elapsed: [0.00s cpu, 0.0935 mticks, 0.00s wall, 0.97 cores]
ClassifierProcedure [2021-08-23T20:26:18.776-4:00] info indexed training data in elapsed: [0.00s cpu, 0.0122 mticks, 0.00s wall, 0.91 cores]
ClassifierProcedure [2021-08-23T20:26:18.776-4:00] info Training with 6 features
ClassifierProcedure [2021-08-23T20:26:18.776-4:00] info equalization factor 1
ClassifierProcedure [2021-08-23T20:26:18.776-4:00] info weight for class 0 = 24
ClassifierProcedure [2021-08-23T20:26:18.776-4:00] info factor for class 0 = 0.0416667
ClassifierProcedure [2021-08-23T20:26:18.776-4:00] info weight for class 1 = 30
ClassifierProcedure [2021-08-23T20:26:18.776-4:00] info factor for class 1 = 0.0333333
ClassifierProcedure [2021-08-23T20:26:18.776-4:00] info weight for class 2 = 24
ClassifierProcedure [2021-08-23T20:26:18.776-4:00] info factor for class 2 = 0.0416667
Decision Tree: (weight = 78.00, cov = 100.00%)  0/0.333 1/0.333 2/0.333
 petal width >= 0.699999988079071 (z = 0.4444, weight = 54.00, cov = 69.23%)  1/0.500 2/0.500
     petal length >= 4.75 (z = 0.2344, weight = 28.00, cov = 35.90%)  1/0.118 2/0.882
         petal length >= 5.149999618530273 (z = 0.2442, weight = 17.00, cov = 21.79%)  2/1.000
         petal length  < 5.149999618530273 (z = 0.2442, weight = 11.00, cov = 14.10%)  1/0.314 2/0.686
             petal width >= 1.75 (z = 0.3107, weight = 8.00, cov = 10.26%)  1/0.103 2/0.897
                 petal length >= 4.850000381469727 (z = 0.1615, weight = 6.00, cov = 7.69%)  2/1.000
                 petal length  < 4.850000381469727 (z = 0.1615, weight = 2.00, cov = 2.56%)  1/0.444 2/0.556
                     sepal length >= 6.050000190734863 (z = 0.0000, weight = 1.00, cov = 1.28%)  2/1.000
                     sepal length  < 6.050000190734863 (z = 0.0000, weight = 1.00, cov = 1.28%)  1/1.000
             petal width  < 1.75 (z = 0.3107, weight = 3.00, cov = 3.85%)  1/1.000
     petal length  < 4.75 (z = 0.2344, weight = 26.00, cov = 33.33%)  1/1.000
 petal width  < 0.699999988079071 (z = 0.4444, weight = 24.00, cov = 30.77%)  0/1.000

ClassifierProcedure [2021-08-23T20:26:18.778-4:00] info trained classifier in elapsed: [0.00s cpu, 4.1913 mticks, 0.00s wall, 1.11 cores]
ClassifierProcedure [2021-08-23T20:26:18.778-4:00] info Saved classifier to file://tmp/iris_utf8.cls
ExperimentProcedure [2021-08-23T20:26:18.778-4:00] info  >>>>> Creating testing procedure
commiting 8 frozen chunks
0 possible collisions
rowIndex.memusage() = 4,784
TabularDataset [2021-08-23T20:26:18.782-4:00] info row index took elapsed: [0.00s cpu, 0.6681 mticks, 0.00s wall, 1.92 cores]
Direct memusage is 523 for 3 entries at 73 per entry
Direct memusage is 523 for 3 entries at 73 per entry
TabularDataset [2021-08-23T20:26:18.782-4:00] info row name usage is 1856 bytes at 25.7778 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T20:26:18.782-4:00] info timestamp usage is 2048 bytes at 28.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T20:26:18.783-4:00] info column score."""Iris-setosa""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T20:26:18.783-4:00] info column score."""Iris-versicolor""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T20:26:18.783-4:00] info column score."""Iris-virginica""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
Direct memusage is 523 for 3 entries at 73 per entry
TabularDataset [2021-08-23T20:26:18.783-4:00] info column maxLabel used 4016 bytes at 55.7778 per row with MLDB::DirectFrozenColumn
Direct memusage is 523 for 3 entries at 73 per entry
TabularDataset [2021-08-23T20:26:18.783-4:00] info column label used 4016 bytes at 55.7778 per row with MLDB::DirectFrozenColumn
TabularDataset [2021-08-23T20:26:18.783-4:00] info column weight used 1776 bytes at 24.6667 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T20:26:18.783-4:00] info row index usage is 4784 bytes at 66.4444 per row
TabularDataset [2021-08-23T20:26:18.783-4:00] info total mem usage is 24760 bytes for 72 rows and 6 columns for 343.889 bytes/row
TabularDataset [2021-08-23T20:26:18.783-4:00] info column memory is 15304
ExperimentProcedure [2021-08-23T20:26:18.783-4:00] info accuracy took elapsed: [0.01s cpu, 10.8150 mticks, 0.00s wall, 1.62 cores]
ExperimentProcedure [2021-08-23T20:26:18.785-4:00] info  >>>>> Creating training procedure
ClassifierProcedure [2021-08-23T20:26:18.786-4:00] info initialized feature space in elapsed: [0.00s cpu, 0.3453 mticks, 0.00s wall, 1.37 cores]
ClassifierProcedure [2021-08-23T20:26:18.787-4:00] info extracted feature vectors in elapsed: [0.00s cpu, 2.9396 mticks, 0.00s wall, 1.39 cores]
ClassifierProcedure [2021-08-23T20:26:18.787-4:00] info merged feature vectors in elapsed: [0.00s cpu, 0.0750 mticks, 0.00s wall, 0.94 cores]
ClassifierProcedure [2021-08-23T20:26:18.787-4:00] info added feature vectors in elapsed: [0.00s cpu, 0.0556 mticks, 0.00s wall, 3.75 cores]
ClassifierProcedure [2021-08-23T20:26:18.787-4:00] info indexed training data in elapsed: [0.00s cpu, 0.0040 mticks, 0.00s wall, 2.80 cores]
ClassifierProcedure [2021-08-23T20:26:18.787-4:00] info Training with 6 features
ClassifierProcedure [2021-08-23T20:26:18.787-4:00] info equalization factor 1
ClassifierProcedure [2021-08-23T20:26:18.787-4:00] info weight for class 0 = 24
ClassifierProcedure [2021-08-23T20:26:18.787-4:00] info factor for class 0 = 0.0416667
ClassifierProcedure [2021-08-23T20:26:18.787-4:00] info weight for class 1 = 30
ClassifierProcedure [2021-08-23T20:26:18.787-4:00] info factor for class 1 = 0.0333333
ClassifierProcedure [2021-08-23T20:26:18.787-4:00] info weight for class 2 = 24
ClassifierProcedure [2021-08-23T20:26:18.787-4:00] info factor for class 2 = 0.0416667
Decision Tree: (weight = 78.00, cov = 100.00%)  0/0.333 1/0.333 2/0.333
 petal width >= 0.699999988079071 (z = 0.4444, weight = 54.00, cov = 69.23%)  1/0.500 2/0.500
     petal length >= 4.75 (z = 0.2344, weight = 28.00, cov = 35.90%)  1/0.118 2/0.882
         petal length >= 5.149999618530273 (z = 0.2442, weight = 17.00, cov = 21.79%)  2/1.000
         petal length  < 5.149999618530273 (z = 0.2442, weight = 11.00, cov = 14.10%)  1/0.314 2/0.686
             petal width >= 1.75 (z = 0.3107, weight = 8.00, cov = 10.26%)  1/0.103 2/0.897
                 sepal width >= 3.1500000953674316 (z = 0.1615, weight = 2.00, cov = 2.56%)  1/0.444 2/0.556
                     petal width >= 1.8499999046325684 (z = 0.0000, weight = 1.00, cov = 1.28%)  2/1.000
                     petal width  < 1.8499999046325684 (z = 0.0000, weight = 1.00, cov = 1.28%)  1/1.000
                 sepal width  < 3.1500000953674316 (z = 0.1615, weight = 6.00, cov = 7.69%)  2/1.000
             petal width  < 1.75 (z = 0.3107, weight = 3.00, cov = 3.85%)  1/1.000
     petal length  < 4.75 (z = 0.2344, weight = 26.00, cov = 33.33%)  1/1.000
 petal width  < 0.699999988079071 (z = 0.4444, weight = 24.00, cov = 30.77%)  0/1.000

ClassifierProcedure [2021-08-23T20:26:18.789-4:00] info trained classifier in elapsed: [0.00s cpu, 3.8277 mticks, 0.00s wall, 1.35 cores]
ClassifierProcedure [2021-08-23T20:26:18.789-4:00] info Saved classifier to file://tmp/iris_utf8.cls
ExperimentProcedure [2021-08-23T20:26:18.790-4:00] info  >>>>> Creating testing procedure
commiting 8 frozen chunks
0 possible collisions
rowIndex.memusage() = 4,784
TabularDataset [2021-08-23T20:26:18.794-4:00] info row index took elapsed: [0.00s cpu, 0.5701 mticks, 0.00s wall, 2.66 cores]
Direct memusage is 538 for 3 entries at 78 per entry
Direct memusage is 538 for 3 entries at 78 per entry
TabularDataset [2021-08-23T20:26:18.794-4:00] info row name usage is 1864 bytes at 25.8889 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T20:26:18.794-4:00] info timestamp usage is 2048 bytes at 28.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T20:26:18.794-4:00] info column score."""Iris-setosa_éç""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T20:26:18.794-4:00] info column score."""Iris-versicolor_éç""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T20:26:18.794-4:00] info column score."""Iris-virginica_éç""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
Direct memusage is 538 for 3 entries at 78 per entry
TabularDataset [2021-08-23T20:26:18.794-4:00] info column maxLabel used 4136 bytes at 57.4444 per row with MLDB::DirectFrozenColumn
Direct memusage is 538 for 3 entries at 78 per entry
TabularDataset [2021-08-23T20:26:18.794-4:00] info column label used 4136 bytes at 57.4444 per row with MLDB::DirectFrozenColumn
TabularDataset [2021-08-23T20:26:18.794-4:00] info column weight used 1776 bytes at 24.6667 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T20:26:18.794-4:00] info row index usage is 4784 bytes at 66.4444 per row
TabularDataset [2021-08-23T20:26:18.794-4:00] info total mem usage is 25008 bytes for 72 rows and 6 columns for 347.333 bytes/row
TabularDataset [2021-08-23T20:26:18.794-4:00] info column memory is 15544
ExperimentProcedure [2021-08-23T20:26:18.794-4:00] info accuracy took elapsed: [0.01s cpu, 10.5084 mticks, 0.00s wall, 1.53 cores]
2021-08-23 20:26:18.797 stderr 
Stdout:
ds1 [["_rowName","label","petal length","petal width","sepal length","sepal width"],["1","Iris-setosa",1.4,0.2,5.1,3.5],["2","Iris-setosa",1.4,0.2,4.9,3],["3","Iris-setosa",1.3,0.2,4.7,3.2],["4","Iris-setosa",1.5,0.2,4.6,3.1],["5","Iris-setosa",1.4,0.2,5,3.6],["6","Iris-setosa",1.7,0.4,5.4,3.9],["7","Iris-setosa",1.4,0.3,4.6,3.4],["8","Iris-setosa",1.5,0.2,5,3.4],["9","Iris-setosa",1.4,0.2,4.4,2.9],["10","Iris-setosa",1.5,0.1,4.9,3.1]]
ds2 [["_rowName","label","petal length","petal width","sepal length","sepal width"],["1","Iris-setosa_éç",1.4,0.2,5.1,3.5],["2","Iris-setosa_éç",1.4,0.2,4.9,3],["3","Iris-setosa_éç",1.3,0.2,4.7,3.2],["4","Iris-setosa_éç",1.5,0.2,4.6,3.1],["5","Iris-setosa_éç",1.4,0.2,5,3.6],["6","Iris-setosa_éç",1.7,0.4,5.4,3.9],["7","Iris-setosa_éç",1.4,0.3,4.6,3.4],["8","Iris-setosa_éç",1.5,0.2,5,3.4],["9","Iris-setosa_éç",1.4,0.2,4.4,2.9],["10","Iris-setosa_éç",1.5,0.1,4.9,3.1]]

2021-08-23 20:26:18.797 script runner plugin test_utf8_category (__main__.MLDB2134classiferUtf8Test) ... FAIL

======================================================================
FAIL: test_utf8_category (__main__.MLDB2134classiferUtf8Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "file://mldb/testing/MLDB-2143-classifier-utf8.py", line 86, in test_utf8_category
AssertionError: {'accuracy': 0.9645061728395061, 'f1Score': 0.94465488215[81 chars]72.0} != {'accuracy': 0.9733796296296298, 'f1Score': 0.95851131150[80 chars]72.0}
- {'accuracy': 0.9645061728395061,
-  'f1Score': 0.9446548821548821,
-  'precision': 0.9537037037037037,
-  'recall': 0.9444444444444444,
+ {'accuracy': 0.9733796296296298,
+  'f1Score': 0.9585113115013447,
+  'precision': 0.963768115942029,
+  'recall': 0.9583333333333334,
   'support': 72.0}

Stdout:
ds1 [["_rowName","label","petal length","petal width","sepal length","sepal width"],["1","Iris-setosa",1.4,0.2,5.1,3.5],["2","Iris-setosa",1.4,0.2,4.9,3],["3","Iris-setosa",1.3,0.2,4.7,3.2],["4","Iris-setosa",1.5,0.2,4.6,3.1],["5","Iris-setosa",1.4,0.2,5,3.6],["6","Iris-setosa",1.7,0.4,5.4,3.9],["7","Iris-setosa",1.4,0.3,4.6,3.4],["8","Iris-setosa",1.5,0.2,5,3.4],["9","Iris-setosa",1.4,0.2,4.4,2.9],["10","Iris-setosa",1.5,0.1,4.9,3.1]]
ds2 [["_rowName","label","petal length","petal width","sepal length","sepal width"],["1","Iris-setosa_éç",1.4,0.2,5.1,3.5],["2","Iris-setosa_éç",1.4,0.2,4.9,3],["3","Iris-setosa_éç",1.3,0.2,4.7,3.2],["4","Iris-setosa_éç",1.5,0.2,4.6,3.1],["5","Iris-setosa_éç",1.4,0.2,5,3.6],["6","Iris-setosa_éç",1.7,0.4,5.4,3.9],["7","Iris-setosa_éç",1.4,0.3,4.6,3.4],["8","Iris-setosa_éç",1.5,0.2,5,3.4],["9","Iris-setosa_éç",1.4,0.2,4.4,2.9],["10","Iris-setosa_éç",1.5,0.1,4.9,3.1]]

----------------------------------------------------------------------
Ran 1 test in 0.040s

FAILED (failures=1)

2021-08-23 20:26:18.797 loader 
{
    "context" : [ "Running python script" ],
    "lineNumber" : 204,
    "message" : "Test failed: <unittest.runner.TextTestResult run=1 errors=0 failures=1>",
    "scriptUri" : "/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py",
    "stack" : 
    [

        {
            "functionName" : "<module>",
            "lineNumber" : 98,
            "scriptUri" : "file://mldb/testing/MLDB-2143-classifier-utf8.py",
            "where" : "File \"file://mldb/testing/MLDB-2143-classifier-utf8.py\", line 98, in <module>"
        },

        {
            "functionName" : "run_tests",
            "lineNumber" : 204,
            "scriptUri" : "/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py",
            "where" : "File \"/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py\", line 204, in run_tests"
        }
    ],
    "type" : "mldb.mldb_wrapper.mldb_wrapper.TestSuiteFailureException",
    "where" : "File \"/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py\", line 204, in run_tests"
}

exception in accept: Operation canceled
exception in accept: Operation canceled
ServicePeer [2021-08-23T20:26:18.798-4:00] warning WARNING: peer mldb lost its own entry in discovery.  Letting it come back
peer mldb connection to mldb changed state to 3
peer mldb connection to mldb changed state to 3
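
Note on the run above: the two decision trees are identical except for one node. Under `petal width >= 1.75`, the first tree splits on `petal length >= 4.850000381469727` and the second on `sepal width >= 3.1500000953674316`. Both report `z = 0.1615` and the same child weights (6.00 and 2.00). That is consistent with two equally scored candidate splits being chosen in an order-dependent way, though the log alone does not confirm it. The sketch below only illustrates that idea. `Split`, `pick_first_best` and `pick_deterministic` are hypothetical names, not MLDB code, and treating a lower `z` as better is an assumption.

```python
from typing import List, NamedTuple


class Split(NamedTuple):
    """Hypothetical candidate split; not an MLDB type."""
    feature: str
    threshold: float
    z: float  # split score; "lower is better" is an assumption of this sketch


def pick_first_best(candidates: List[Split]) -> Split:
    """Order-dependent: on a tie in z, whichever candidate arrives first wins."""
    best = candidates[0]
    for cand in candidates[1:]:
        if cand.z < best.z:  # strict comparison, so ties keep the earlier one
            best = cand
    return best


def pick_deterministic(candidates: List[Split]) -> Split:
    """Order-independent: ties in z are broken by feature name, then threshold."""
    return min(candidates, key=lambda c: (c.z, c.feature, c.threshold))


# The two tied candidates, with values copied from the trees in the log.
a = Split("petal length", 4.850000381469727, 0.1615)
b = Split("sepal width", 3.1500000953674316, 0.1615)

# Arrival order changes the result of the naive picker...
print(pick_first_best([a, b]).feature, "|", pick_first_best([b, a]).feature)
# ...but not the result of the picker with an explicit tie-break.
print(pick_deterministic([a, b]).feature, "|", pick_deterministic([b, a]).feature)
```

If candidate splits are gathered from parallel workers, a total order like the one in `pick_deterministic` would make the chosen tree independent of arrival order, and the two datasets would then be expected to produce matching metrics.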

jeremybarnes commented 2 years ago
reading configuration from file: 'mldb/container_files/mldb.conf'

MLDB ready

creating SYMLINK /var/folders/sb/zv7k23t13130wff2fkt0pq9c0000gn/T/XdXBHJ/main.py -> /Users/jeremy/projects/mldb/mldb/testing/MLDB-2143-classifier-utf8.py
loading from: /var/folders/sb/zv7k23t13130wff2fkt0pq9c0000gn/T/XdXBHJ/main.py
ImportTextProcedure [2021-08-23T22:31:40.440-4:00] info reading 5 columns ["sepal length","sepal width","petal length","petal width","class"]
ImportTextProcedure [2021-08-23T22:31:40.440-4:00] info writing 5 columns ["sepal length","sepal width","petal length","petal width","class"]
ImportTextProcedure [2021-08-23T22:31:40.442-4:00] info imported 150 in 0.0016911s at 0.0886995M lines/second on 1.31419 CPUs
ImportTextProcedure [2021-08-23T22:31:40.442-4:00] info done 0.00455 megabytes at 2.63301 megabytes/sec
ImportTextProcedure [2021-08-23T22:31:40.442-4:00] info processed 150 lines
commiting 1 frozen chunks with 0 rows
freeze: maxChunkNumber 0 maxChunkIndex 149
0 possible collisions
rowIndex.memusage() = 5,020
TabularDataset [2021-08-23T22:31:40.443-4:00] info row index took elapsed: [0.00s cpu, 0.6081 mticks, 0.00s wall, 2.07 cores]
TabularDataset [2021-08-23T22:31:40.443-4:00] info row name usage is 248 bytes at 1.65333 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info timestamp usage is 449 bytes at 2.99333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info column sepal length used 875 bytes at 5.83333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info column sepal width used 735 bytes at 4.9 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info column petal length used 939 bytes at 6.26 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info column petal width used 734 bytes at 4.89333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info column class used 531 bytes at 3.54 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info row index usage is 5020 bytes at 33.4667 per row
TabularDataset [2021-08-23T22:31:40.443-4:00] info total mem usage is 9627 bytes for 150 rows and 5 columns for 64.18 bytes/row
TabularDataset [2021-08-23T22:31:40.443-4:00] info column memory is 3814
commiting 0 frozen chunks with 150 rows
TabularDataset [2021-08-23T22:31:40.443-4:00] info row name usage is 248 bytes at 1.65333 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info timestamp usage is 449 bytes at 2.99333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info column sepal length used 875 bytes at 5.83333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info column sepal width used 735 bytes at 4.9 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info column petal length used 939 bytes at 6.26 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info column petal width used 734 bytes at 4.89333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info column class used 531 bytes at 3.54 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.443-4:00] info row index usage is 5020 bytes at 33.4667 per row
TabularDataset [2021-08-23T22:31:40.443-4:00] info total mem usage is 9627 bytes for 150 rows and 5 columns for 64.18 bytes/row
TabularDataset [2021-08-23T22:31:40.443-4:00] info column memory is 3814
commiting 11 frozen chunks with 0 rows
freeze: maxChunkNumber 10 maxChunkIndex 136
0 possible collisions
rowIndex.memusage() = 5,096
TabularDataset [2021-08-23T22:31:40.448-4:00] info row index took elapsed: [0.00s cpu, 0.5525 mticks, 0.00s wall, 2.40 cores]
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
TabularDataset [2021-08-23T22:31:40.448-4:00] info row name usage is 2240 bytes at 14.9333 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T22:31:40.448-4:00] info timestamp usage is 2393 bytes at 15.9533 per row with MLDB::TableFrozenColumn
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
Direct memusage is 487 for 1 entries at 183 per entry
TabularDataset [2021-08-23T22:31:40.448-4:00] info column label used 5337 bytes at 35.58 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.448-4:00] info column petal length used 2858 bytes at 19.0533 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.448-4:00] info column petal width used 2670 bytes at 17.8 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.448-4:00] info column sepal length used 2803 bytes at 18.6867 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.448-4:00] info column sepal width used 2671 bytes at 17.8067 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.448-4:00] info row index usage is 5096 bytes at 33.9733 per row
TabularDataset [2021-08-23T22:31:40.448-4:00] info total mem usage is 27124 bytes for 150 rows and 5 columns for 180.827 bytes/row
TabularDataset [2021-08-23T22:31:40.448-4:00] info column memory is 16339
commiting 9 frozen chunks with 0 rows
freeze: maxChunkNumber 8 maxChunkIndex 126
0 possible collisions
rowIndex.memusage() = 5,084
TabularDataset [2021-08-23T22:31:40.452-4:00] info row index took elapsed: [0.00s cpu, 0.6047 mticks, 0.00s wall, 1.84 cores]
TabularDataset [2021-08-23T22:31:40.452-4:00] info row name usage is 1960 bytes at 13.0667 per row with MLDB::IntegerFrozenColumn
TabularDataset [2021-08-23T22:31:40.452-4:00] info timestamp usage is 2105 bytes at 14.0333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.452-4:00] info column label used 4218 bytes at 28.12 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.452-4:00] info column petal length used 2535 bytes at 16.9 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.452-4:00] info column petal width used 2374 bytes at 15.8267 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.452-4:00] info column sepal length used 2498 bytes at 16.6533 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.452-4:00] info column sepal width used 2375 bytes at 15.8333 per row with MLDB::TableFrozenColumn
TabularDataset [2021-08-23T22:31:40.452-4:00] info row index usage is 5084 bytes at 33.8933 per row
TabularDataset [2021-08-23T22:31:40.452-4:00] info total mem usage is 24013 bytes for 150 rows and 5 columns for 160.087 bytes/row
TabularDataset [2021-08-23T22:31:40.452-4:00] info column memory is 14000
ExperimentProcedure [2021-08-23T22:31:40.455-4:00] info  >>>>> Creating training procedure
ClassifierProcedure [2021-08-23T22:31:40.456-4:00] info initialized feature space in elapsed: [0.00s cpu, 0.6078 mticks, 0.00s wall, 1.20 cores]
ClassifierProcedure [2021-08-23T22:31:40.458-4:00] info extracted feature vectors in elapsed: [0.00s cpu, 2.9405 mticks, 0.00s wall, 1.69 cores]
ClassifierProcedure [2021-08-23T22:31:40.458-4:00] info merged feature vectors in elapsed: [0.00s cpu, 0.1080 mticks, 0.00s wall, 0.96 cores]
ClassifierProcedure [2021-08-23T22:31:40.458-4:00] info added feature vectors in elapsed: [0.00s cpu, 0.1202 mticks, 0.00s wall, 0.96 cores]
ClassifierProcedure [2021-08-23T22:31:40.458-4:00] info indexed training data in elapsed: [0.00s cpu, 0.0132 mticks, 0.00s wall, 0.84 cores]
ClassifierProcedure [2021-08-23T22:31:40.458-4:00] info Training with 6 features
ClassifierProcedure [2021-08-23T22:31:40.458-4:00] info equalization factor 1
ClassifierProcedure [2021-08-23T22:31:40.458-4:00] info weight for class 0 = 24
ClassifierProcedure [2021-08-23T22:31:40.458-4:00] info factor for class 0 = 0.0416667
ClassifierProcedure [2021-08-23T22:31:40.458-4:00] info weight for class 1 = 30
ClassifierProcedure [2021-08-23T22:31:40.458-4:00] info factor for class 1 = 0.0333333
ClassifierProcedure [2021-08-23T22:31:40.458-4:00] info weight for class 2 = 24
ClassifierProcedure [2021-08-23T22:31:40.458-4:00] info factor for class 2 = 0.0416667
Decision Tree: (weight = 78.00, cov = 100.00%)  0/0.333 1/0.333 2/0.333
 petal width >= 0.699999988079071 (z = 0.4444, weight = 54.00, cov = 69.23%)  1/0.500 2/0.500
     petal length >= 4.75 (z = 0.2344, weight = 28.00, cov = 35.90%)  1/0.118 2/0.882
         petal length >= 5.149999618530273 (z = 0.2442, weight = 17.00, cov = 21.79%)  2/1.000
         petal length  < 5.149999618530273 (z = 0.2442, weight = 11.00, cov = 14.10%)  1/0.314 2/0.686
             petal width >= 1.75 (z = 0.3107, weight = 8.00, cov = 10.26%)  1/0.103 2/0.897
                 sepal width >= 3.1500000953674316 (z = 0.1615, weight = 2.00, cov = 2.56%)  1/0.444 2/0.556
                     petal width >= 1.8499999046325684 (z = 0.0000, weight = 1.00, cov = 1.28%)  2/1.000
                     petal width  < 1.8499999046325684 (z = 0.0000, weight = 1.00, cov = 1.28%)  1/1.000
                 sepal width  < 3.1500000953674316 (z = 0.1615, weight = 6.00, cov = 7.69%)  2/1.000
             petal width  < 1.75 (z = 0.3107, weight = 3.00, cov = 3.85%)  1/1.000
     petal length  < 4.75 (z = 0.2344, weight = 26.00, cov = 33.33%)  1/1.000
 petal width  < 0.699999988079071 (z = 0.4444, weight = 24.00, cov = 30.77%)  0/1.000

ClassifierProcedure [2021-08-23T22:31:40.459-4:00] info trained classifier in elapsed: [0.00s cpu, 3.7302 mticks, 0.00s wall, 1.64 cores]
ClassifierProcedure [2021-08-23T22:31:40.460-4:00] info Saved classifier to file://tmp/iris_utf8.cls
ExperimentProcedure [2021-08-23T22:31:40.460-4:00] info  >>>>> Creating testing procedure
commiting 8 frozen chunks with 0 rows
freeze: maxChunkNumber 7 maxChunkIndex 12
0 possible collisions
rowIndex.memusage() = 4,784
TabularDataset [2021-08-23T22:31:40.464-4:00] info row index took elapsed: [0.00s cpu, 0.4362 mticks, 0.00s wall, 1.87 cores]
Direct memusage is 523 for 3 entries at 73 per entry
Direct memusage is 523 for 3 entries at 73 per entry
TabularDataset [2021-08-23T22:31:40.464-4:00] info row name usage is 1856 bytes at 25.7778 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T22:31:40.464-4:00] info timestamp usage is 2048 bytes at 28.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T22:31:40.464-4:00] info column score."""Iris-setosa""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T22:31:40.464-4:00] info column score."""Iris-versicolor""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T22:31:40.464-4:00] info column score."""Iris-virginica""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
Direct memusage is 523 for 3 entries at 73 per entry
TabularDataset [2021-08-23T22:31:40.464-4:00] info column maxLabel used 4016 bytes at 55.7778 per row with MLDB::DirectFrozenColumn
Direct memusage is 523 for 3 entries at 73 per entry
TabularDataset [2021-08-23T22:31:40.464-4:00] info column label used 4016 bytes at 55.7778 per row with MLDB::DirectFrozenColumn
TabularDataset [2021-08-23T22:31:40.464-4:00] info column weight used 1776 bytes at 24.6667 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T22:31:40.464-4:00] info row index usage is 4784 bytes at 66.4444 per row
TabularDataset [2021-08-23T22:31:40.464-4:00] info total mem usage is 24760 bytes for 72 rows and 6 columns for 343.889 bytes/row
TabularDataset [2021-08-23T22:31:40.464-4:00] info column memory is 15304
ExperimentProcedure [2021-08-23T22:31:40.464-4:00] info accuracy took elapsed: [0.01s cpu, 10.0916 mticks, 0.00s wall, 1.41 cores]
ExperimentProcedure [2021-08-23T22:31:40.466-4:00] info  >>>>> Creating training procedure
ClassifierProcedure [2021-08-23T22:31:40.467-4:00] info initialized feature space in elapsed: [0.00s cpu, 0.2507 mticks, 0.00s wall, 1.47 cores]
ClassifierProcedure [2021-08-23T22:31:40.468-4:00] info extracted feature vectors in elapsed: [0.00s cpu, 2.5313 mticks, 0.00s wall, 1.11 cores]
ClassifierProcedure [2021-08-23T22:31:40.468-4:00] info merged feature vectors in elapsed: [0.00s cpu, 0.0267 mticks, 0.00s wall, 0.92 cores]
ClassifierProcedure [2021-08-23T22:31:40.468-4:00] info added feature vectors in elapsed: [0.00s cpu, 0.0384 mticks, 0.00s wall, 0.99 cores]
ClassifierProcedure [2021-08-23T22:31:40.468-4:00] info indexed training data in elapsed: [0.00s cpu, 0.0032 mticks, 0.00s wall, 0.70 cores]
ClassifierProcedure [2021-08-23T22:31:40.468-4:00] info Training with 6 features
ClassifierProcedure [2021-08-23T22:31:40.468-4:00] info equalization factor 1
ClassifierProcedure [2021-08-23T22:31:40.468-4:00] info weight for class 0 = 24
ClassifierProcedure [2021-08-23T22:31:40.468-4:00] info factor for class 0 = 0.0416667
ClassifierProcedure [2021-08-23T22:31:40.468-4:00] info weight for class 1 = 30
ClassifierProcedure [2021-08-23T22:31:40.468-4:00] info factor for class 1 = 0.0333333
ClassifierProcedure [2021-08-23T22:31:40.468-4:00] info weight for class 2 = 24
ClassifierProcedure [2021-08-23T22:31:40.468-4:00] info factor for class 2 = 0.0416667
Decision Tree: (weight = 78.00, cov = 100.00%)  0/0.333 1/0.333 2/0.333
 petal width >= 0.699999988079071 (z = 0.4444, weight = 54.00, cov = 69.23%)  1/0.500 2/0.500
     petal length >= 4.75 (z = 0.2344, weight = 28.00, cov = 35.90%)  1/0.118 2/0.882
         petal length >= 5.149999618530273 (z = 0.2442, weight = 17.00, cov = 21.79%)  2/1.000
         petal length  < 5.149999618530273 (z = 0.2442, weight = 11.00, cov = 14.10%)  1/0.314 2/0.686
             petal width >= 1.75 (z = 0.3107, weight = 8.00, cov = 10.26%)  1/0.103 2/0.897
                 petal length >= 4.850000381469727 (z = 0.1615, weight = 6.00, cov = 7.69%)  2/1.000
                 petal length  < 4.850000381469727 (z = 0.1615, weight = 2.00, cov = 2.56%)  1/0.444 2/0.556
                     sepal length >= 6.050000190734863 (z = 0.0000, weight = 1.00, cov = 1.28%)  2/1.000
                     sepal length  < 6.050000190734863 (z = 0.0000, weight = 1.00, cov = 1.28%)  1/1.000
             petal width  < 1.75 (z = 0.3107, weight = 3.00, cov = 3.85%)  1/1.000
     petal length  < 4.75 (z = 0.2344, weight = 26.00, cov = 33.33%)  1/1.000
 petal width  < 0.699999988079071 (z = 0.4444, weight = 24.00, cov = 30.77%)  0/1.000

ClassifierProcedure [2021-08-23T22:31:40.470-4:00] info trained classifier in elapsed: [0.00s cpu, 3.8807 mticks, 0.00s wall, 1.61 cores]
ClassifierProcedure [2021-08-23T22:31:40.470-4:00] info Saved classifier to file://tmp/iris_utf8.cls
ExperimentProcedure [2021-08-23T22:31:40.471-4:00] info  >>>>> Creating testing procedure
commiting 8 frozen chunks with 0 rows
freeze: maxChunkNumber 7 maxChunkIndex 12
0 possible collisions
rowIndex.memusage() = 4,784
TabularDataset [2021-08-23T22:31:40.474-4:00] info row index took elapsed: [0.00s cpu, 0.4133 mticks, 0.00s wall, 1.30 cores]
Direct memusage is 538 for 3 entries at 78 per entry
Direct memusage is 538 for 3 entries at 78 per entry
TabularDataset [2021-08-23T22:31:40.474-4:00] info row name usage is 1848 bytes at 25.6667 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T22:31:40.474-4:00] info timestamp usage is 2048 bytes at 28.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T22:31:40.474-4:00] info column score."""Iris-setosa_éç""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T22:31:40.474-4:00] info column score."""Iris-versicolor_éç""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T22:31:40.474-4:00] info column score."""Iris-virginica_éç""" used 1832 bytes at 25.4444 per row with MLDB::DoubleFrozenColumn
Direct memusage is 538 for 3 entries at 78 per entry
TabularDataset [2021-08-23T22:31:40.474-4:00] info column maxLabel used 4136 bytes at 57.4444 per row with MLDB::DirectFrozenColumn
Direct memusage is 538 for 3 entries at 78 per entry
TabularDataset [2021-08-23T22:31:40.474-4:00] info column label used 4136 bytes at 57.4444 per row with MLDB::DirectFrozenColumn
TabularDataset [2021-08-23T22:31:40.474-4:00] info column weight used 1776 bytes at 24.6667 per row with MLDB::DoubleFrozenColumn
TabularDataset [2021-08-23T22:31:40.474-4:00] info row index usage is 4784 bytes at 66.4444 per row
TabularDataset [2021-08-23T22:31:40.474-4:00] info total mem usage is 24992 bytes for 72 rows and 6 columns for 347.111 bytes/row
TabularDataset [2021-08-23T22:31:40.474-4:00] info column memory is 15544
ExperimentProcedure [2021-08-23T22:31:40.474-4:00] info accuracy took elapsed: [0.00s cpu, 8.7955 mticks, 0.00s wall, 1.24 cores]
2021-08-23 22:31:40.478 stderr 
Stdout:
ds1 [["_rowName","label","petal length","petal width","sepal length","sepal width"],["1","Iris-setosa",1.4,0.2,5.1,3.5],["2","Iris-setosa",1.4,0.2,4.9,3],["3","Iris-setosa",1.3,0.2,4.7,3.2],["4","Iris-setosa",1.5,0.2,4.6,3.1],["5","Iris-setosa",1.4,0.2,5,3.6],["6","Iris-setosa",1.7,0.4,5.4,3.9],["7","Iris-setosa",1.4,0.3,4.6,3.4],["8","Iris-setosa",1.5,0.2,5,3.4],["9","Iris-setosa",1.4,0.2,4.4,2.9],["10","Iris-setosa",1.5,0.1,4.9,3.1]]
ds2 [["_rowName","label","petal length","petal width","sepal length","sepal width"],["1","Iris-setosa_éç",1.4,0.2,5.1,3.5],["2","Iris-setosa_éç",1.4,0.2,4.9,3],["3","Iris-setosa_éç",1.3,0.2,4.7,3.2],["4","Iris-setosa_éç",1.5,0.2,4.6,3.1],["5","Iris-setosa_éç",1.4,0.2,5,3.6],["6","Iris-setosa_éç",1.7,0.4,5.4,3.9],["7","Iris-setosa_éç",1.4,0.3,4.6,3.4],["8","Iris-setosa_éç",1.5,0.2,5,3.4],["9","Iris-setosa_éç",1.4,0.2,4.4,2.9],["10","Iris-setosa_éç",1.5,0.1,4.9,3.1]]

2021-08-23 22:31:40.478 script runner plugin test_utf8_category (__main__.MLDB2134classiferUtf8Test) ... FAIL

======================================================================
FAIL: test_utf8_category (__main__.MLDB2134classiferUtf8Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "file://mldb/testing/MLDB-2143-classifier-utf8.py", line 86, in test_utf8_category
AssertionError: {'accuracy': 0.9733796296296298, 'f1Score': 0.95851131150[80 chars]72.0} != {'accuracy': 0.9645061728395061, 'f1Score': 0.94465488215[81 chars]72.0}
- {'accuracy': 0.9733796296296298,
-  'f1Score': 0.9585113115013447,
-  'precision': 0.963768115942029,
-  'recall': 0.9583333333333334,
+ {'accuracy': 0.9645061728395061,
+  'f1Score': 0.9446548821548821,
+  'precision': 0.9537037037037037,
+  'recall': 0.9444444444444444,
   'support': 72.0}

Stdout:
ds1 [["_rowName","label","petal length","petal width","sepal length","sepal width"],["1","Iris-setosa",1.4,0.2,5.1,3.5],["2","Iris-setosa",1.4,0.2,4.9,3],["3","Iris-setosa",1.3,0.2,4.7,3.2],["4","Iris-setosa",1.5,0.2,4.6,3.1],["5","Iris-setosa",1.4,0.2,5,3.6],["6","Iris-setosa",1.7,0.4,5.4,3.9],["7","Iris-setosa",1.4,0.3,4.6,3.4],["8","Iris-setosa",1.5,0.2,5,3.4],["9","Iris-setosa",1.4,0.2,4.4,2.9],["10","Iris-setosa",1.5,0.1,4.9,3.1]]
ds2 [["_rowName","label","petal length","petal width","sepal length","sepal width"],["1","Iris-setosa_éç",1.4,0.2,5.1,3.5],["2","Iris-setosa_éç",1.4,0.2,4.9,3],["3","Iris-setosa_éç",1.3,0.2,4.7,3.2],["4","Iris-setosa_éç",1.5,0.2,4.6,3.1],["5","Iris-setosa_éç",1.4,0.2,5,3.6],["6","Iris-setosa_éç",1.7,0.4,5.4,3.9],["7","Iris-setosa_éç",1.4,0.3,4.6,3.4],["8","Iris-setosa_éç",1.5,0.2,5,3.4],["9","Iris-setosa_éç",1.4,0.2,4.4,2.9],["10","Iris-setosa_éç",1.5,0.1,4.9,3.1]]

----------------------------------------------------------------------
Ran 1 test in 0.039s

FAILED (failures=1)

2021-08-23 22:31:40.479 loader 
{
    "context" : [ "Running python script" ],
    "lineNumber" : 204,
    "message" : "Test failed: <unittest.runner.TextTestResult run=1 errors=0 failures=1>",
    "scriptUri" : "/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py",
    "stack" : 
    [

        {
            "functionName" : "<module>",
            "lineNumber" : 98,
            "scriptUri" : "file://mldb/testing/MLDB-2143-classifier-utf8.py",
            "where" : "File \"file://mldb/testing/MLDB-2143-classifier-utf8.py\", line 98, in <module>"
        },

        {
            "functionName" : "run_tests",
            "lineNumber" : 204,
            "scriptUri" : "/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py",
            "where" : "File \"/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py\", line 204, in run_tests"
        }
    ],
    "type" : "mldb.mldb_wrapper.mldb_wrapper.TestSuiteFailureException",
    "where" : "File \"/Users/jeremy/projects/mldb/build/x86_64/bin/mldb/mldb_wrapper.py\", line 204, in run_tests"
}

exception in accept: Operation canceled
exception in accept: Operation canceled
ServicePeer [2021-08-23T22:31:40.479-4:00] warning WARNING: peer mldb lost its own entry in discovery.  Letting it come back
peer mldb connection to mldb changed state to 3
peer mldb connection to mldb changed state to 3