Closed: LucieContamin closed this issue 2 weeks ago.
This was discussed at the retreat, and we decided that it should be supported. The parquet compression options available in `arrow::write_parquet()` are: "snappy", "gzip", "brotli", "zstd", "lz4", "lzo" and "bz2". See https://arrow.apache.org/docs/r/reference/write_parquet.html
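Supporting this means a validator has to tell a compression segment apart from the rest of a model output file name. The sketch below is a minimal base-R illustration, not how `validate_submission()` is actually implemented: `split_output_filename()` is a hypothetical helper, and it assumes that a codec, when present, appears as a second extension directly before `.parquet` (e.g. `.gzip.parquet`).

```r
# Compression codecs listed in the arrow::write_parquet() documentation.
parquet_codecs <- c("snappy", "gzip", "brotli", "zstd", "lz4", "lzo", "bz2")

# Hypothetical helper: split a file name into its stem, an optional
# compression codec and the final file extension.
split_output_filename <- function(path, codecs = parquet_codecs) {
  file_name <- basename(path)
  ext <- tools::file_ext(file_name)              # e.g. "parquet"
  stem <- tools::file_path_sans_ext(file_name)   # drops the final extension
  # Treat a second extension as a codec only if it is a known codec name.
  codec <- tools::file_ext(stem)
  if (nzchar(codec) && codec %in% codecs) {
    stem <- tools::file_path_sans_ext(stem)
  } else {
    codec <- NA_character_
  }
  list(stem = stem, compression = codec, ext = ext)
}

# Both naming styles from this issue resolve to the same stem.
with_codec <- split_output_filename(
  "model-output/JHU_UNC-flepiMoP/2024-04-28-JHU_UNC-flepiMoP.gzip.parquet"
)
without_codec <- split_output_filename(
  "model-output/JHU_UNC-flepiMoP/2024-04-28-JHU_UNC-flepiMoP.parquet"
)
```

Matching against the known codec list, rather than stripping any second extension, means an arbitrary dot-separated suffix is left in the stem instead of being mistaken for a codec.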
It seems that when `arrow` writes a compressed parquet file, the compression codec may or may not be included in the file name; both work. For example,

arrow::write_parquet(df, "model-output/JHU_UNC-flepiMoP/2024-04-28-JHU_UNC-flepiMoP.gzip.parquet", compression = "gzip", compression_level = 9)

and

arrow::write_parquet(df, "model-output/JHU_UNC-flepiMoP/2024-04-28-JHU_UNC-flepiMoP.parquet", compression = "gzip", compression_level = 9)

both produce the same file with the same content, and both files can be read with the same `arrow` function call. However, when `validate_submission()` is run on a file whose name includes the compression information, it returns an error and the file is not validated: