modelop / hadrian

Implementations of the Portable Format for Analytics (PFA)
Apache License 2.0

pfachain incompatible with recursively defined types #38

Closed bmwilly closed 7 years ago

bmwilly commented 7 years ago

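This occurs because pfachain prepends Step{}_Engine_{}_ to type names. In the traceback below, the schema's record has been renamed to Step2_Engine_2_TreeNode, but the cell's initial data still tags its union branches with the original name, TreeNode, so the data no longer matches the schema.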

Minimal reproducible example: download this gist and run

# taken from the tutorial https://github.com/opendatagroup/hadrian/wiki/Basic-random-forest
python ex.py  

# modified random forest example using pfachain
python ex_chain.py

The second script fails with

➜  python ex_chain.py
Traceback (most recent call last):
  File "ex_chain.py", line 158, in <module>
    engine, = PFAEngine.fromJson(pfaDocument)
  File "/Users/brandon/miniconda2/envs/arena/lib/python2.7/site-packages/titus/genpy.py", line 1565, in fromJson
    return PFAEngine.fromAst(titus.reader.jsonToAst(src), options, version, sharedState, multiplicity, style, debug)
  File "/Users/brandon/miniconda2/envs/arena/lib/python2.7/site-packages/titus/genpy.py", line 1509, in fromAst
    value = titus.datatype.jsonDecoder(cellConfig.avroType, cellConfig.initJsonNode)
  File "/Users/brandon/miniconda2/envs/arena/lib/python2.7/site-packages/titus/datatype.py", line 925, in jsonDecoder
    return [jsonDecoder(avroType.items, x) for x in value]
  File "/Users/brandon/miniconda2/envs/arena/lib/python2.7/site-packages/titus/datatype.py", line 934, in jsonDecoder
    out[field.name] = jsonDecoder(field.avroType, value[field.name])
  File "/Users/brandon/miniconda2/envs/arena/lib/python2.7/site-packages/titus/datatype.py", line 953, in jsonDecoder
    raise titus.errors.AvroException("{0} does not match schema {1}".format(json.dumps(value), ts(avroType)))
titus.errors.AvroException: {"TreeNode": {"operator": "<", "field": "petal_length", "fail": {"TreeNode": {"operator": "<", "field": "sepal_length", "fail": {"TreeNode": {"operator": "<", "field": "petal_width", "fail": {"TreeNode": {"operator": "<", "field": "petal_width", "fail": {"TreeNode": {"operator": "<", "field": "sepal_length", "fail": {"string": "versicolor"}, "value": 0.0, "pass": {"string": "virginica"}}}, "value": 1.45, "pass": {"TreeNode": {"operator": "<", "field": "sepal_length", "fail": {"string": "versicolor"}, "value": 0.0, "pass": {"string": "virginica"}}}}}, "value": 1.35, "pass": {"TreeNode": {"operator": "<", "field": "sepal_width", "fail": {"TreeNode": {"operator": "<", "field": "petal_width", "fail": {"string": "versicolor"}, "value": 0.15000000000000002, "pass": {"string": "setosa"}}}, "value": 2.45, "pass": {"TreeNode": {"operator": "<", "field": "petal_width", "fail": {"string": "versicolor"}, "value": 1.05, "pass": {"string": "versicolor"}}}}}}}, "value": 5.05, "pass": {"TreeNode": {"operator": "<", "field": "petal_length", "fail": {"TreeNode": {"operator": "<", "field": "sepal_width", "fail": {"TreeNode": {"operator": "<", "field": "sepal_width", "fail": {"string": "setosa"}, "value": 3.45, "pass": {"string": "setosa"}}}, "value": 3.3, "pass": {"TreeNode": {"operator": "<", "field": "petal_length", "fail": {"string": "versicolor"}, "value": 3.4, "pass": {"string": "setosa"}}}}}, "value": 1.55, "pass": {"TreeNode": {"operator": "<", "field": "petal_width", "fail": {"TreeNode": {"operator": "<", "field": "sepal_width", "fail": {"string": "setosa"}, "value": 3.25, "pass": {"string": "setosa"}}}, "value": 0.15000000000000002, "pass": {"TreeNode": {"operator": "<", "field": "sepal_width", "fail": {"string": "setosa"}, "value": 0.0, "pass": {"string": "virginica"}}}}}}}}}, "value": 1.45, "pass": {"TreeNode": {"operator": "<", "field": "sepal_length", "fail": {"TreeNode": {"operator": "<", "field": "petal_length", "fail": {"TreeNode": 
{"operator": "<", "field": "sepal_length", "fail": {"TreeNode": {"operator": "<", "field": "petal_width", "fail": {"string": "setosa"}, "value": 0.15000000000000002, "pass": {"string": "setosa"}}}, "value": 4.45, "pass": {"TreeNode": {"operator": "<", "field": "sepal_length", "fail": {"string": "setosa"}, "value": 0.0, "pass": {"string": "virginica"}}}}}, "value": 1.1, "pass": {"TreeNode": {"operator": "<", "field": "petal_length", "fail": {"TreeNode": {"operator": "<", "field": "sepal_width", "fail": {"string": "setosa"}, "value": 0.0, "pass": {"string": "virginica"}}}, "value": 0.0, "pass": {"TreeNode": {"operator": "<", "field": "sepal_length", "fail": {"string": "virginica"}, "value": 0.0, "pass": {"string": "virginica"}}}}}}}, "value": 4.35, "pass": {"TreeNode": {"operator": "<", "field": "petal_length", "fail": {"TreeNode": {"operator": "<", "field": "petal_length", "fail": {"TreeNode": {"operator": "<", "field": "sepal_length", "fail": {"string": "setosa"}, "value": 0.0, "pass": {"string": "virginica"}}}, "value": 0.0, "pass": {"TreeNode": {"operator": "<", "field": "petal_length", "fail": {"string": "virginica"}, "value": 0.0, "pass": {"string": "virginica"}}}}}, "value": 0.0, "pass": {"TreeNode": {"operator": "<", "field": "sepal_width", "fail": {"TreeNode": {"operator": "<", "field": "sepal_width", "fail": {"string": "virginica"}, "value": 0.0, "pass": {"string": "virginica"}}}, "value": 0.0, "pass": {"TreeNode": {"operator": "<", "field": "sepal_length", "fail": {"string": "virginica"}, "value": 0.0, "pass": {"string": "virginica"}}}}}}}}}}} does not match schema
    union(string,
          record(Step2_Engine_2_TreeNode,
                 field: enum([sepal_length, sepal_width, petal_length, petal_width], Step2_Engine_2_Enum_1),
                 operator: string,
                 value: double,
                 pass: union(string,
                             Step2_Engine_2_TreeNode),
                 fail: union(string,
                             Step2_Engine_2_TreeNode)))
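
To make the mismatch concrete, here is a minimal sketch in plain Python. It does not use titus: `matches`, `tree_schema`, and `retag` are hypothetical helpers written for this illustration, and the record is simplified (no `operator` field, and `field` is a string instead of an enum). It only assumes the Avro JSON convention visible in the error above, where a union value is written as {"<branch name>": payload}.

```python
# Toy stand-in for an Avro-style JSON decoder; NOT the titus implementation.

def matches(schema, value, named=None):
    """Return True if `value` (Avro JSON encoding) fits `schema`."""
    named = {} if named is None else named
    if schema == "string":
        return isinstance(value, str)
    if schema == "double":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if isinstance(schema, str):
        # Reference to a named type defined earlier (this is the recursion).
        return schema in named and matches(named[schema], value, named)
    if isinstance(schema, list):
        # Union: the value must be a one-key dict tagged with a branch name.
        if not (isinstance(value, dict) and len(value) == 1):
            return False
        (tag, payload), = value.items()
        for branch in schema:
            name = branch if isinstance(branch, str) else branch["name"]
            if name == tag:
                return matches(branch, payload, named)
        return False
    if schema["type"] == "record":
        named[schema["name"]] = schema  # register before checking fields
        return isinstance(value, dict) and all(
            f["name"] in value and matches(f["type"], value[f["name"]], named)
            for f in schema["fields"])
    raise ValueError("unsupported schema in this toy decoder")


def tree_schema(name):
    """Simplified recursive tree type: union(string, record(name, ...))."""
    return ["string", {"type": "record", "name": name, "fields": [
        {"name": "field", "type": "string"},
        {"name": "value", "type": "double"},
        {"name": "pass", "type": ["string", name]},
        {"name": "fail", "type": ["string", name]},
    ]}]


def retag(value, old, new):
    """Rename union tags `old` -> `new` everywhere in the data.
    Toy only: it would also rename a record field that happened to be `old`."""
    if isinstance(value, dict):
        return {(new if k == old else k): retag(v, old, new)
                for k, v in value.items()}
    if isinstance(value, list):
        return [retag(v, old, new) for v in value]
    return value


# Cell data written against the original, unprefixed type name.
init = {"TreeNode": {
    "field": "petal_width", "value": 1.05,
    "pass": {"string": "versicolor"},
    "fail": {"TreeNode": {
        "field": "sepal_width", "value": 3.3,
        "pass": {"string": "setosa"},
        "fail": {"string": "virginica"}}}}}

prefixed = "Step2_Engine_2_TreeNode"
print(matches(tree_schema("TreeNode"), init))                            # True
print(matches(tree_schema(prefixed), init))                              # False
print(matches(tree_schema(prefixed), retag(init, "TreeNode", prefixed)))  # True
```

The second call fails for the same reason as the traceback: no union branch is named TreeNode once the record has been renamed. The third call shows that the data matches again if its tags are renamed together with the schema. Whether pfachain does anything like `retag` internally is not established in this thread.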

Related issue: #30

bmwilly commented 7 years ago

This isn't a problem if you initialize cells before chaining.