manojkarthick / pqrs

Command line tool for inspecting Parquet files
Apache License 2.0

ParquetError(ArrowError("creating ListArrayReader with type FixedSizeList ... should be unreachable #39

Open AlJohri opened 1 year ago

AlJohri commented 1 year ago

Hi there, I ran into a new error today. I'm guessing it might have to do with the inner list field being called item instead of element?

❯ pqrs head --json output.parquet
Error: ParquetError(ArrowError("creating ListArrayReader with type FixedSizeList(Field { name: \"item\", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, 256) should be unreachable"))

❯ pqrs schema output.parquet 
Metadata for file: output.parquet

version: 1
num of rows: 30
created by: parquet-cpp-arrow version 6.0.1
metadata:
  ARROW:schema: /////4gBAAAQAAAAAAAKAAwABgAFAAgACgAAAAABBAAMAAAACAAIAAAABAAIAAAABAAAAAUAAAAoAQAA7AAAAMAAAACAAAAABAAAAPz+//8AAAAQFAAAACQAAAAEAAAAAQAAADAAAAAJAAAAZW1iZWRkaW5nAAYACAAEAAYAAAAAAQAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEDEAAAABwAAAAEAAAAAAAAAAQAAABpdGVtAAAGAAgABgAGAAAAAAABAHT///8AAAACEAAAACQAAAAEAAAAAAAAAAgAAABzaGFyZF9pZAAAAAAIAAwACAAHAAgAAAAAAAABIAAAALD///8AAAAFEAAAABgAAAAEAAAAAAAAAAQAAAB0ZXh0AAAAAJz////Y////AAAABRAAAAAYAAAABAAAAAAAAAAFAAAAdGl0bGUAAADE////EAAUAAgAAAAHAAwAAAAQABAAAAAAAAAFEAAAACAAAAAEAAAAAAAAAAoAAABwYXNzYWdlX2lkAAAEAAQABAAAAA==
message schema {
  REQUIRED BYTE_ARRAY passage_id (STRING);
  REQUIRED BYTE_ARRAY title (STRING);
  REQUIRED BYTE_ARRAY text (STRING);
  REQUIRED INT32 shard_id;
  REQUIRED group embedding (LIST) {
    REPEATED group list {
      OPTIONAL FLOAT item;
    }
  }
}

EDIT: I tried changing the name from item to element via pyarrow's use_compliant_nested_type=True, but that didn't fix the issue, so it might be something else.
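
A side note on where FixedSizeList comes from: the Parquet schema above only shows a plain LIST group, so the fixed-size type in the error most likely comes from the ARROW:schema metadata entry, which pyarrow writes as a base64-encoded serialized Arrow schema. The sketch below is a rough way to peek at that value using only the Python standard library. The helper name ascii_runs and the 4-character minimum are assumptions made for illustration, and this is a crude string scan, not a real Arrow schema parser: it can surface field names but not their types.

```python
import base64
import re


def ascii_runs(b64_value, min_len=4):
    """Decode a base64 string and return runs of name-like ASCII bytes.

    Hypothetical helper for illustration only. Field names are stored as
    plain UTF-8 inside the serialized schema, so runs of letters, digits
    and underscores are likely (not guaranteed) to be field names.
    """
    raw = base64.b64decode(b64_value)
    pattern = rb"[A-Za-z0-9_]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, raw)]


if __name__ == "__main__":
    # Stand-in blob so the sketch runs on its own; replace it with the
    # ARROW:schema value printed by `pqrs schema` to inspect a real file.
    sample = base64.b64encode(b"\x00\x00item\x00\x00embedding\x00").decode()
    print(ascii_runs(sample))
```

Pasting the ARROW:schema string from the metadata above in place of the sample should list the names stored in the Arrow schema, which is a quick way to check whether the inner field is recorded as item or element there.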

AlJohri commented 1 year ago

I figured out the issue! It has to do with the --json flag: the same command works without it.

❯ pqrs head -n1 output.parquet  
{passage_id: "asdjfaslkdfasf", title: "adsfasdf", text: "asdfasdfadf", shard_id: 0, embedding: [1,2,3,4]}

❯ pqrs head -n1 --json output.parquet
Error: ParquetError(ArrowError("creating ListArrayReader with type FixedSizeList(Field { name: \"item\", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, 256) should be unreachable"))