Open ozgrakkurt opened 1 year ago
Hey! @jorgecarleitao can you give guidance on this? I started working on it. What I came up with is something like this:
/// Creates a bloom filter from the bitset and writes it into the `writer`.
pub fn write<W: Write + Seek>(
    column_metadata: &mut ColumnChunkMetaData,
    mut writer: &mut W,
    bitset: &[u8],
) -> Result<(), Error> {
    // create bloom filter header
    // create TCompactOutputProtocol to serialize the bloom filter
    // write the offset to column_metadata
    // write the bloom filter to the writer
    todo!()
}
Does it look correct?
Edit: actually, I found that it should be something like this:
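For reference, the flow in the comments above could be sketched with only the standard library, roughly as follows. This is just a sketch under assumptions: `write_bloom_filter` and `header_bytes` are hypothetical names, not parquet2 API, the header is assumed to be already Thrift-serialized, and the returned offset is the value that would then be stored in the column metadata.

```rust
use std::io::{Seek, Write};

/// Hypothetical helper (not part of parquet2): writes an already-serialized
/// bloom filter header followed by the raw bitset, and returns the offset at
/// which the header starts so the caller can record it in the column metadata.
pub fn write_bloom_filter<W: Write + Seek>(
    writer: &mut W,
    header_bytes: &[u8], // assumed: Thrift compact-encoded BloomFilterHeader
    bitset: &[u8],
) -> std::io::Result<u64> {
    // remember where the bloom filter starts in the file
    let offset = writer.stream_position()?;
    // header first, then the bitset bytes right after it
    writer.write_all(header_bytes)?;
    writer.write_all(bitset)?;
    Ok(offset)
}
```

Returning the offset instead of mutating the metadata keeps the helper independent of the metadata types; whether that fits the library's design is an open question.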
/// Creates a bloom filter from the bitset and writes it into the `protocol`.
pub fn write<W: Write>(
    protocol: &mut TCompactOutputProtocol<W>,
    bitset: &[u8],
) -> Result<(), Error> {
    // create bloom filter header
    // serialize the bloom filter header with the protocol
    // write the offset to column_metadata
    // write the bloom filter to the protocol
    todo!()
}
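To make the "create bloom filter header" step more concrete, here is a rough sketch of what the serialized header bytes could look like. It assumes the `BloomFilterHeader` layout from `parquet.thrift` (an i32 `numBytes` followed by the algorithm, hash and compression unions, each with a single empty-struct member) and hand-rolls the Thrift compact encoding. `encode_bloom_filter_header` is a hypothetical name; in the library the generated Thrift code would presumably do this instead.

```rust
/// Zigzag + varint encoding of an i32, as used by the Thrift compact protocol.
fn write_zigzag_varint(out: &mut Vec<u8>, value: i32) {
    let mut n = ((value << 1) ^ (value >> 31)) as u32;
    while n >= 0x80 {
        // low 7 bits with the continuation bit set
        out.push((n as u8 & 0x7F) | 0x80);
        n >>= 7;
    }
    out.push(n as u8);
}

/// Hypothetical helper: hand-encodes a header with BLOCK algorithm,
/// XXHASH hash and UNCOMPRESSED compression (assumed layout, see above).
pub fn encode_bloom_filter_header(num_bytes: i32) -> Vec<u8> {
    const I32_FIELD: u8 = 0x15; // field id delta 1, compact type 5 (i32)
    const STRUCT_FIELD: u8 = 0x1C; // field id delta 1, compact type 12 (struct)
    const STOP: u8 = 0x00;

    // field 1: numBytes
    let mut out = vec![I32_FIELD];
    write_zigzag_varint(&mut out, num_bytes);
    // fields 2..=4: each is a union whose member (field 1) is an empty struct,
    // so: outer field header, inner field header, inner stop, union stop
    for _ in 0..3 {
        out.extend_from_slice(&[STRUCT_FIELD, STRUCT_FIELD, STOP, STOP]);
    }
    // end of BloomFilterHeader
    out.push(STOP);
    out
}
```

The bitset itself is not part of the Thrift struct; it would be written as raw bytes directly after these header bytes.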
This PR mentions that this requires big changes: https://github.com/jorgecarleitao/parquet2/pull/99. But this seems like an important feature to implement for performance. How doable is it in the current state of the library? I would like to work on it if possible.