sunchao / parquet-rs

Apache Parquet implementation in Rust
Apache License 2.0
149 stars 20 forks source link

A column can no longer be closed after get_typed_column_writer is used #198

Open xrl opened 5 years ago

xrl commented 5 years ago

I was trying to implement ColumnWriter variant-agnostic code for #197 and I could not close the column with RowGroupWriter. Looks like once you "upcast" the ColumnWriter -> ColumnWriterImpl you can't close it, so perhaps the get_typed_column_writer is only useful for tests?

Here's a snippet of the failing code:

// assume you have a row_group_writer
let mut column_writer = row_group_writer.next_column().unwrap().unwrap();
let mut typed_writer : parquet::column::writer::ColumnWriterImpl<#typed_writer_types> = parquet::column::writer::get_typed_column_writer(column_writer);
typed_writer.write_batch(&vals[..], None, None).unwrap();
typed_writer.close().unwrap(); // doesn't stop row_group_writer from complaining a column is still open
row_group_writer.close_column(typed_writer).unwrap(); // doesn't compile
sadikovi commented 5 years ago

What do you mean by upcast? Those are completely different structs. Could you elaborate on this comment?

typed_writer.close().unwrap(); // doesn't stop row_group_writer from complaining a column is still open

Unfortunately, row group writer only works with column writer, not column writer impl. Closing the latter would not be seen by the former, because there is no way to close column writer impl. We can add a method to close column writer impl, or, as an alternative, we can mark get_typed_column_writer private - it is safer to use pattern match.

xrl commented 5 years ago

Those are good points. I vote for turning get_typed_column_writer private.

xrl commented 5 years ago

Also, the comment for get_typed_column_writer is from the docs for the function:

/// Gets a typed column writer for the specific type `T`, by "up-casting" `col_writer` of
/// non-generic type to a generic column writer type `ColumnWriterImpl`.
///
/// NOTE: the caller MUST guarantee that the actual enum value for `col_writer` matches
/// the type `T`. Otherwise, disastrous consequence could happen.
sadikovi commented 5 years ago

If your code needs to have that method, we can keep it public and fix the code around it. The easiest would be making it private with it's counterpart in column reader.