Closed. hombit closed this 1 year ago
Example of how to convert the initial CSV file to Parquet:
import pandas as pd
from utilits import save_to_parquet

save_to_parquet(
    path_to_file='lc_1M.csv',
    columns_list=['mjd', 'mag', 'magerr'],
    path_parquet_file='lc_1M.parquet',
)

# read the converted file with pandas
data = pd.read_parquet('lc_1M.parquet')
Converted files:
parquet negative class file: https://disk.yandex.ru/d/jdCkjWjn8YV2og
parquet negative class file: https://disk.yandex.ru/d/SAncgojR-ehb2g
Thank you! Parquet supports per-column compression, so I suggest storing the Parquet files as is, without additional zipping.
Ah, gzip compression is applied to the columns, not to the whole file; sorry for the misunderstanding. Is it better than the default zstd? Could you please use the .parquet extension for new files?
I found that the 'gzip' compression method turns the np.array values into strings again. The default compression method works correctly: an np.array is saved as an np.array. So I switched the compression method to the default and corrected the file extension.
Use parquet as a format for all input and output files