Closed hinxx closed 1 year ago
Technically, yes, uniontype is supported, but with limitations.
The module tries to cast the field value to one of the container types (in your example int and double), in the order they are declared. If a cast fails, it tries the next type, until one succeeds or no candidate type remains (in which case it raises an exception).
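To make the fallback order concrete, here is a minimal pure-Python sketch of the idea. It is not pyorc's actual implementation: pick_tag, cast_int, cast_double and CASTERS are hypothetical names, and the rule that the int cast rejects floats while the double cast accepts ints is an assumption inferred from the behaviour described in this thread.

```python
def cast_int(value):
    # Assumption: the int cast rejects floats instead of truncating them.
    if not isinstance(value, int):
        raise TypeError(f"{value!r} is not an int")
    return value


def cast_double(value):
    # Assumption: the double cast accepts both ints and floats.
    if not isinstance(value, (int, float)):
        raise TypeError(f"{value!r} is not a number")
    return float(value)


# Hypothetical lookup table from ORC type name to cast function.
CASTERS = {"int": cast_int, "double": cast_double}


def pick_tag(value, container_types):
    """Return (tag, converted_value) for the first container type that accepts value."""
    for tag, type_name in enumerate(container_types):
        try:
            return tag, CASTERS[type_name](value)
        except TypeError:
            continue  # this type rejected the value, try the next one
    raise TypeError(f"{value!r} fits none of {container_types}")


print(pick_tag(0, ["int", "double"]))    # (0, 0)
print(pick_tag(1.0, ["int", "double"]))  # (1, 1.0)
print(pick_tag(33, ["double", "int"]))   # (0, 33.0)
```

The last call shows why the declaration order matters: with double listed first, an integer never reaches the int cast.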
Because you only write integers in your example, every value will be an int (tag: 0, the first container type). If you use float values explicitly:
>>> import pyorc
>>> fp = open("./new_data-6.orc", "wb")
>>> writer1 = pyorc.Writer(fp, "struct<col1:uniontype<int,double>>")
>>> writer1.write((0,))
>>> writer1.write((1.0,))
>>> writer1.write((22.0,))
>>> writer1.write((33,))
>>> writer1.write((0,))
>>> writer1.write((1,))
>>> writer1.close()
>>> fp.close()
Then you can see values with tag: 1:
$ orc-contents ./new_data-6.orc
{"col1": {"tag": 0, "value": 0}}
{"col1": {"tag": 1, "value": 1}}
{"col1": {"tag": 1, "value": 22}}
{"col1": {"tag": 0, "value": 33}}
{"col1": {"tag": 0, "value": 0}}
{"col1": {"tag": 0, "value": 1}}
(Side note: you pass tuples with more than one item to the write method in your example, but because your schema only has one column, the rest of the items in the tuple will be thrown away.)
One particular downside of this dynamic casting mechanism is that it depends on the order in which the container types are defined. For example, if you wrote this schema:
>>> fp = open("./new_data-6.orc", "wb")
>>> writer1 = pyorc.Writer(fp, "struct<col1:uniontype<double,int>>")
>>> writer1.write((0,))
>>> writer1.write((1.0,))
>>> writer1.write((22.0,))
>>> writer1.write((33,))
>>> writer1.close()
>>> fp.close()
You wouldn't be able to write anything as an int (tag: 1), because every Python integer can also be converted to a float, so the cast to double (tag: 0) always succeeds first:
$ orc-contents ./new_data-6.orc
{"col1": {"tag": 0, "value": 0}}
{"col1": {"tag": 0, "value": 1}}
{"col1": {"tag": 0, "value": 22}}
{"col1": {"tag": 0, "value": 33}}
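As a quick check of that point in plain Python (no pyorc involved): an int is not literally a float instance, but it converts to one without error, which is all a double-first union needs in order to claim the value.

```python
value = 33

# An int is a distinct type from float...
is_float_instance = isinstance(value, float)

# ...but converting it to float succeeds and keeps the same numeric value.
as_float = float(value)

print(is_float_instance)      # False
print(as_float)               # 33.0
print(as_float == value)      # True
```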
Thanks for explaining!
I'm trying to use the uniontype type. Is this supported? I can write data, but the contents of the file look wrong when inspected with orc-contents: only tag 0 has values and tag 1 is always empty. The schema looks OK: