manojkarthick / pqrs

Command line tool for inspecting Parquet files
Apache License 2.0

Cannot show parquet file #38

Open Hoeze opened 1 year ago

Hoeze commented 1 year ago

Hi @manojkarthick, I am trying to open the attached file, but pqrs cat panics with the following error:

# pqrs --version
pqrs 0.2.2
# pqrs cat test.parquet

##################
File: test.parquet
##################

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: General("insufficient values read from column - expected: 1024, got: 0")', /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-12.0.0/src/record/reader.rs:578:36
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The same file can be read without problems using pyarrow:

import pyarrow as pa
import pyarrow.parquet as pq

pq.read_table("test.parquet")
Out[3]: 
pyarrow.Table
chromosome: string not null
position: int32 not null
identifier: list<id: string not null> not null
  child 0, id: string not null
reference: string not null
alternate: list<alternate: string not null> not null
  child 0, alternate: string not null
quality: float
filter: list<filter: string not null> not null
  child 0, filter: string not null
info_END: int32 not null
info_SVTYPE: string not null
----
chromosome: [["chr1","chr1","chr1","chr1","chr1","chr1","chr1","chr1","chr1","chr1",...,"chr1","chr1","chr1","chr1","chr1","chr1","chr1","chr1","chr1","chr1"]]
position: [[10427,10427,10439,10440,13459,14397,15219,16766,16871,29231,...,6094404,6094410,6094858,6095109,6095224,6095265,6095278,6095299,6095300,6095491]]
identifier: [[["chr1:10426:10429:ACC>A"],["chr1:10426:10429:ACC>*"],["chr1:10438:10440:AC>*"],["chr1:10439:10440:C>*"],["chr1:13458:13462:CAGA>C"],["chr1:14396:14399:CTG>C"],["chr1:15218:15230:GAGCCACCTCCC>G"],["chr1:16765:16766:C>CT"],["chr1:16870:16872:GC>G"],["chr1:29230:29231:G>T"],...,["chr1:6094403:6094404:C>A"],["chr1:6094409:6094410:C>T"],["chr1:6094857:6094858:C>T"],["chr1:6095108:6095109:G>A"],["chr1:6095223:6095224:C>T"],["chr1:6095264:6095265:G>A"],["chr1:6095277:6095278:C>T"],["chr1:6095298:6095299:C>T"],["chr1:6095299:6095300:G>A"],["chr1:6095490:6095491:C>G"]]]
reference: [["ACC","ACC","AC","C","CAGA","CTG","GAGCCACCTCCC","C","GC","G",...,"C","C","C","G","C","G","C","C","G","C"]]
alternate: [[["A"],["*"],["*"],["*"],["C"],["C"],["G"],["CT"],["G"],["T"],...,["A"],["T"],["T"],["A"],["T"],["A"],["T"],["T"],["A"],["G"]]]
quality: [[null,null,null,null,null,null,null,null,null,null,...,null,null,null,null,null,null,null,null,null,null]]
filter: [[[""],[""],[""],[""],[""],[""],[""],[""],[""],[""],...,[""],[""],[""],[""],[""],[""],[""],[""],[""],[""]]]
info_END: [[10429,10429,10440,10440,13462,14399,15230,16766,16872,29231,...,6094404,6094410,6094858,6095109,6095224,6095265,6095278,6095299,6095300,6095491]]
info_SVTYPE: [["","","","","","","","","","",...,"","","","","","","","","",""]]

Would you mind having a look to find out why?

test.parquet.zip