Open CalderWhite opened 1 year ago
There is an issue when running pqrs cat --csv [infile] without timestamp objects
Do you mean with timestamps?
For me they are all set to 1970-01-01. But when I cat with json, it is fine. parquet-tools can convert it to csv fine.
Could you provide the data that causes this bug? Specifically, the type of the timestamp field and its value.
I tried to reproduce this issue by creating a parquet file with the following code:
use datafusion::{
    arrow::{
        array::TimestampSecondArray,
        datatypes::{DataType, Field, Schema, TimeUnit},
        record_batch::RecordBatch,
    },
    parquet::arrow::ArrowWriter,
};
use std::{fs::OpenOptions, sync::Arc};

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let schema = Arc::new(Schema::new(vec![Field::new(
        "timestamp",
        DataType::Timestamp(TimeUnit::Second, None),
        true,
    )]));
    let timestamp_column = Arc::new(TimestampSecondArray::from(vec![1709090622]));
    let batch = RecordBatch::try_new(Arc::clone(&schema), vec![timestamp_column]).unwrap();
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .open("test.parquet")
        .unwrap();
    let mut writer = ArrowWriter::try_new(file, Arc::clone(&schema), None).unwrap();
    writer.write(&batch).unwrap();
    writer.close().unwrap();
}
But as you can see, the timestamp was successfully printed:
$ cargo r -q
$ l test.parquet
Permissions Links Size User Group Date Modified Name
.rw-r--r--@ 1 580 steve steve 28 Feb 11:24 test.parquet
$ pqrs --version
pqrs 0.3.1
$ pqrs cat test.parquet
##################
File: test.parquet
##################
{timestamp: 1709090622}
$ pqrs cat --csv test.parquet
##################
File: test.parquet
##################
timestamp
2024-02-28T03:23:42.000000000
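As a sanity check that the CSV value matches the raw integer from the default output, here is a minimal std-only sketch that converts Unix seconds to a UTC date-time string. The function names `format_unix_seconds` and `civil_from_days` are made up for this illustration and are not part of pqrs or arrow; the date math is the widely used days-to-civil conversion for the proleptic Gregorian calendar.

```rust
/// Hypothetical helper: format a Unix timestamp (whole seconds, UTC)
/// as `YYYY-MM-DDTHH:MM:SS` using only the standard library.
fn format_unix_seconds(secs: i64) -> String {
    // Split into whole days since 1970-01-01 and seconds within the day.
    let days = secs.div_euclid(86_400);
    let sod = secs.rem_euclid(86_400);
    let (y, m, d) = civil_from_days(days);
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}",
        y,
        m,
        d,
        sod / 3_600,
        (sod % 3_600) / 60,
        sod % 60
    )
}

/// Hypothetical helper: convert days since 1970-01-01 to (year, month, day).
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    // Shift the epoch to 0000-03-01 so leap days fall at the end of a "year".
    let z = days + 719_468;
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month index, 0..=11
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + if m <= 2 { 1 } else { 0 };
    (y, m, d)
}
```

Calling `format_unix_seconds(1709090622)` gives `2024-02-28T03:23:42`, which agrees with the CSV row above apart from the fractional-second digits.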
There is an issue when running pqrs cat --csv [infile] without timestamp objects. For me they are all set to 1970-01-01. But when I cat with json, it is fine. parquet-tools can convert it to csv fine. I suspect something with an integer overflow?
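One way to probe the overflow suspicion in isolation is to check whether scaling a timestamp to nanoseconds still fits in an `i64`. This is only a sketch: it assumes, without confirmation from the pqrs source, that the CSV path widens values to nanosecond resolution (the CSV output above prints nine fractional digits). The helper name `seconds_to_nanos` is hypothetical.

```rust
/// Hypothetical helper: scale a seconds-resolution timestamp to nanoseconds.
/// Returns `None` instead of silently wrapping when the result exceeds i64.
fn seconds_to_nanos(secs: i64) -> Option<i64> {
    secs.checked_mul(1_000_000_000)
}
```

For the value used in the reproduction, `seconds_to_nanos(1709090622)` returns `Some(1709090622000000000)`, so a plain seconds-to-nanoseconds conversion would not overflow for present-day dates. If the affected file stores timestamps in a different unit or physical type, that detail would help narrow things down.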