ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0

Parquet file import error #6826

Open SloNN opened 2 months ago

SloNN commented 2 months ago

I run the import command below and immediately receive an error:

ydb -e grpc://c0fbrdjnem92d2p2q5jn.cluster.testing.ydb.yandex.net:2135 -d /olap-testing-vla-common2/kikimr/apkobzev/test import file parquet ACCOUNT_TYPE.parquet -p gb/ACCOUNT_TYPE

Status: GENERIC_ERROR
Issues:
<main>: Error: Cannot write data into shard 72075186224037987 in longTx ydb://long-tx/01j32z26tadef79adwwgejn4yd?node_id=50007

I am trying to load the file ACCOUNT_TYPE.parquet, which is attached to this issue. The table structure is:

(image: table structure) (attachment: account_type.tar.gz)

iddqdex commented 1 month ago

The Parquet file does not match the table schema: all of its fields are nullable, but ACCOUNT_TYPE_ID_INT must not be:

[
  {
    "PrimitiveType": {
      "field_info": {
        "name": "ACCOUNT_TYPE_ID_INT",
        "repetition": "Optional",
        "id": null
      },
      "logical_type": null,
      "converted_type": null,
      "physical_type": "Int32"
    }
  },
  {
    "PrimitiveType": {
      "field_info": {
        "name": "ACCOUNT_TYPE_ID_CHAR",
        "repetition": "Optional",
        "id": null
      },
      "logical_type": "String",
      "converted_type": "Utf8",
      "physical_type": "ByteArray"
    }
  },
  {
    "PrimitiveType": {
      "field_info": {
        "name": "NAME",
        "repetition": "Optional",
        "id": null
      },
      "logical_type": "String",
      "converted_type": "Utf8",
      "physical_type": "ByteArray"
    }
  }
]

@SloNN
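The check described above can be sketched in a few lines of Python. This is only an illustration, not part of YDB or its CLI: the helper names `optional_columns` and `nullability_mismatches` are hypothetical, the input is a trimmed copy of the schema dump from this comment, and the set of NOT NULL columns is an assumption taken from the comment itself (the table definition was posted as an image).

```python
import json

# Trimmed copy of the schema dump shown above (only the fields the check needs).
SCHEMA_DUMP = """
[
  {"PrimitiveType": {"field_info": {"name": "ACCOUNT_TYPE_ID_INT", "repetition": "Optional"},
                     "physical_type": "Int32"}},
  {"PrimitiveType": {"field_info": {"name": "ACCOUNT_TYPE_ID_CHAR", "repetition": "Optional"},
                     "physical_type": "ByteArray"}},
  {"PrimitiveType": {"field_info": {"name": "NAME", "repetition": "Optional"},
                     "physical_type": "ByteArray"}}
]
"""


def optional_columns(schema_dump):
    """Return names of columns whose Parquet repetition is "Optional" (nullable)."""
    fields = json.loads(schema_dump)
    return [
        field["PrimitiveType"]["field_info"]["name"]
        for field in fields
        if field["PrimitiveType"]["field_info"]["repetition"] == "Optional"
    ]


def nullability_mismatches(schema_dump, not_null_columns):
    """Return columns the table declares NOT NULL but the file declares nullable."""
    return sorted(set(optional_columns(schema_dump)) & set(not_null_columns))


# Assumption: only ACCOUNT_TYPE_ID_INT is NOT NULL in the target table.
print(nullability_mismatches(SCHEMA_DUMP, {"ACCOUNT_TYPE_ID_INT"}))
# prints ['ACCOUNT_TYPE_ID_INT']
```

Any column name in the result is one where the file and the table disagree on nullability.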

SloNN commented 1 month ago

I recreated the Parquet file; the new one has all fields marked as not null:

 pqi schema ACCOUNT_TYPE.parquet
ACCOUNT_TYPE_ID_INT: int32 not null
ACCOUNT_TYPE_ID_CHAR: string not null
NAME: string not null
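For reference, the same nullability check can be run against this text output. The sketch below assumes each line has the form `name: type [not null]`, which is inferred only from the three lines above and not from any tool documentation; `nullable_from_schema_text` is a hypothetical helper.

```python
# Schema listing copied from the output above.
SCHEMA_TEXT = """\
ACCOUNT_TYPE_ID_INT: int32 not null
ACCOUNT_TYPE_ID_CHAR: string not null
NAME: string not null
"""


def nullable_from_schema_text(schema_text):
    """Return names of columns whose line does not end with 'not null'."""
    nullable = []
    for line in schema_text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or unrecognized lines
        name, type_spec = line.split(":", 1)
        if not type_spec.strip().endswith("not null"):
            nullable.append(name.strip())
    return nullable


print(nullable_from_schema_text(SCHEMA_TEXT))
# prints []
```

An empty list only rules out the nullability mismatch; it says nothing about other differences between the file and the table.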

But I still receive the same error:

ydb -e grpc://c0fbrdjnem92d2p2q5jn.cluster.testing.ydb.yandex.net:2135 -d /olap-testing-vla-common2/kikimr/apkobzev/test import file parquet ACCOUNT_TYPE.parquet -p gb/ACCOUNT_TYPE

Status: GENERIC_ERROR
Issues:
<main>: Error: Cannot write data into shard 72075186224037975 in longTx ydb://long-tx/01j3fgfsx43fy49nnn861yyyq0?node_id=50001

ACCOUNT_TYPE-2.parquet.gz

@iddqdex