Open ZJONSSON opened 6 years ago
The problem seems to be with how the reader reconstructs the schema from the parquet file.
If I log the original fields from the schema (i.e. console.log(schema.fields)), I get:
{
  "a": {
    "name": "a",
    "path": [
      "a"
    ],
    "repetitionType": "REQUIRED",
    "rLevelMax": 0,
    "dLevelMax": 0,
    "isNested": true,
    "fieldCount": 1,
    "fields": {
      "b": {
        "name": "b",
        "path": [
          "a",
          "b"
        ],
        "repetitionType": "REQUIRED",
        "rLevelMax": 0,
        "dLevelMax": 0,
        "isNested": true,
        "fieldCount": 1,
        "fields": {
          "c": {
            "name": "c",
            "path": [
              "a",
              "b",
              "c"
            ],
            "repetitionType": "REQUIRED",
            "rLevelMax": 0,
            "dLevelMax": 0,
            "isNested": true,
            "fieldCount": 1,
            "fields": {
              "d": {
                "name": "d",
                "primitiveType": "BYTE_ARRAY",
                "originalType": "UTF8",
                "path": [
                  "a",
                  "b",
                  "c",
                  "d"
                ],
                "repetitionType": "REQUIRED",
                "encoding": "PLAIN",
                "compression": "UNCOMPRESSED",
                "rLevelMax": 0,
                "dLevelMax": 0
...
However, if I look at the schema created by the reader (i.e. console.log(reader.schema.fields)), I get:
{
  "a": {
    "name": "a",
    "path": [
      "a"
    ],
    "repetitionType": "REQUIRED",
    "rLevelMax": 0,
    "dLevelMax": 0,
    "isNested": true,
    "fieldCount": 1,
    "fields": {
      "b": {
        "name": "b",
        "path": [
          "a",
          "b"
        ],
        "repetitionType": "REQUIRED",
        "rLevelMax": 0,
        "dLevelMax": 0,
        "isNested": true,
        "fieldCount": 0,
        "fields": {}
      }
    }
  },
  "c": {
    "name": "c",
    "path": [
      "c"
    ],
    "repetitionType": "REQUIRED",
    "rLevelMax": 0,
    "dLevelMax": 0,
    "isNested": true,
    "fieldCount": 1,
    "fields": {
      "d": {
        "name": "d",
        "primitiveType": "BYTE_ARRAY",
        "originalType": "UTF8",
        "path": [
          "c",
          "d"
        ],
        "repetitionType": "REQUIRED",
        "encoding": "PLAIN",
        "compression": "UNCOMPRESSED",
        "rLevelMax": 0,
        "dLevelMax": 0
      }
    }
  }
}
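To make the difference between the two dumps concrete, here is a minimal sketch of how a nested schema can be rebuilt from a flat list. The Parquet file metadata stores the schema as a flat, depth-first list of elements, each with a `num_children` count, and the reader output above looks like what you would get if the decoder did not skip past the grandchildren before reading the next sibling. This is not the library's actual decoder: `flat`, `buildFields` and `decodeSchema` are hypothetical names used only for illustration, and the element objects are simplified.

```javascript
// Hypothetical flat schema list in depth-first order (simplified elements).
// The first element is the root; leaves have no num_children.
const flat = [
  { name: 'root', num_children: 1 },
  { name: 'a', num_children: 1 },
  { name: 'b', num_children: 1 },
  { name: 'c', num_children: 1 },
  { name: 'd', type: 'BYTE_ARRAY' },
];

// Read `count` sibling fields starting at index `start`.
// Returns the fields plus the index of the first element NOT consumed,
// so the caller can continue after the whole subtree.
function buildFields(elements, start, count, parentPath) {
  const fields = {};
  let next = start;
  for (let i = 0; i < count; i++) {
    const el = elements[next];
    next += 1;
    const path = parentPath.concat(el.name);
    const numChildren = el.num_children || 0;
    if (numChildren > 0) {
      const sub = buildFields(elements, next, numChildren, path);
      fields[el.name] = {
        name: el.name,
        path,
        isNested: true,
        fieldCount: numChildren,
        fields: sub.fields,
      };
      // Skip every descendant, not just the direct children.
      next = sub.next;
    } else {
      fields[el.name] = { name: el.name, path, primitiveType: el.type };
    }
  }
  return { fields, next };
}

// Rebuild the nested fields object, skipping the root element.
function decodeSchema(elements) {
  return buildFields(elements, 1, elements[0].num_children, []).fields;
}

const rebuilt = decodeSchema(flat);
console.log(rebuilt.a.fields.b.fields.c.fields.d.path);
```

The important part is that the recursive call returns how far it advanced; advancing only by the number of direct children would leave deeper elements to be picked up at the wrong level.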
Same here for me.
As a note, this error only happens when you read the whole row:
cursor.next()
If you pass in the columns you want, this error doesn't happen.
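For the workaround of passing in columns explicitly, a small helper can collect the leaf column paths from a schema's fields object. This is only a sketch: `leafPaths` and `sampleFields` are hypothetical, the field shape is copied from the dumps earlier in this thread, and the exact format the cursor expects for its column list is an assumption you should check against the library.

```javascript
// Hypothetical, trimmed-down fields object in the shape logged above.
const sampleFields = {
  a: {
    name: 'a', path: ['a'], isNested: true, fields: {
      b: {
        name: 'b', path: ['a', 'b'], isNested: true, fields: {
          c: {
            name: 'c', path: ['a', 'b', 'c'], isNested: true, fields: {
              d: { name: 'd', path: ['a', 'b', 'c', 'd'], primitiveType: 'BYTE_ARRAY' },
            },
          },
        },
      },
    },
  },
};

// Walk the nested fields and return the path of every non-nested (leaf) field.
function leafPaths(fields) {
  const paths = [];
  for (const field of Object.values(fields)) {
    if (field.isNested) {
      paths.push(...leafPaths(field.fields));
    } else {
      paths.push(field.path);
    }
  }
  return paths;
}

const columns = leafPaths(sampleFields);
console.log(columns);
```

Running this against the original schema rather than the one reconstructed by the reader matters here, since the reconstructed one has the wrong paths.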
Here is an example of a schema that is three levels deep. Shredding and materializing a single record works fine; however, writing a parquet file and reading it back results in an error:
Output is: