journey-wang opened this issue 2 years ago
I tried reproducing the issue in this PR: https://github.com/uber/petastorm/pull/749
I got:
Parquet store size: 89105.625 KB
PNG file size: 88.3056640625 KB
Size per Parquet row: 89.105625 KB
That is, the size of the Parquet store matches expectations: each row is only about 0.8 KB (roughly 0.9%) larger than the PNG it holds, so no significant overhead is observed.
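The "no significant overhead" conclusion comes from simple arithmetic on the figures above, redone in the short sketch below. One assumption: the store holds 1000 rows. The reproduction does not state this. It is inferred by dividing the total store size by the per-row size.

```python
from typing import Tuple

# Figures reported in the reproduction above, in KB.
PARQUET_TOTAL_KB = 89105.625
PNG_FILE_KB = 88.3056640625

# Assumption: 1000 rows, inferred from 89105.625 KB / 89.105625 KB per row.
NUM_ROWS = 1000


def per_row_overhead(total_kb: float, num_rows: int, payload_kb: float) -> Tuple[float, float]:
    """Return (extra KB per row, extra KB as a fraction of the payload size)."""
    per_row_kb = total_kb / num_rows
    extra_kb = per_row_kb - payload_kb
    return extra_kb, extra_kb / payload_kb


extra_kb, extra_frac = per_row_overhead(PARQUET_TOTAL_KB, NUM_ROWS, PNG_FILE_KB)
print(f"per-row overhead: {extra_kb:.2f} KB ({extra_frac:.1%})")
```

This prints a per-row overhead of 0.80 KB (0.9%). That is small next to the image payload.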
Hi everyone,
I stored 899 images (about 48 MB) in a Petastorm Parquet store, but the resulting Parquet files take up almost 240 MB. Could you help me figure out why the Parquet files are so large and how to reduce their size?
I used the code from https://github.com/uber/petastorm/issues/497.
```
root@br1609hpc30:~# find flower_photos/dandelion/|wc -l
899
root@br1609hpc30:~# du -sh flower_photos/dandelion/
48M	flower_photos/dandelion/
root@br1609hpc30:~# du -sh /tmp/petastorm_ingest_test/
240M	/tmp/petastorm_ingest_test/
```
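To compare the two directories more precisely, a small stdlib-only Python sketch can count only regular files and sum their byte lengths. Two details of the commands above motivate this:

* `find flower_photos/dandelion/ | wc -l` also prints the directory itself, so 899 lines correspond to 898 entries below it.
* `du -sh` reports allocated disk blocks rather than file lengths. GNU `du --apparent-size` reports the lengths.

The helpers `summarize_dir` and `size_ratio` are hypothetical and not part of petastorm. The paths are copied from the session above.

```python
import os
from typing import Tuple


def summarize_dir(path: str) -> Tuple[int, int]:
    """Count regular files under `path` and sum their apparent sizes in bytes.

    Directories themselves are not counted (unlike `find path/ | wc -l`), and
    sizes are file lengths rather than allocated disk blocks (unlike plain `du`).
    """
    count = 0
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not double-counted.
            if os.path.isfile(full) and not os.path.islink(full):
                count += 1
                total += os.path.getsize(full)
    return count, total


def size_ratio(images_dir: str, store_dir: str) -> float:
    """Return how many times larger the store is than the source images."""
    _, image_bytes = summarize_dir(images_dir)
    _, store_bytes = summarize_dir(store_dir)
    if image_bytes == 0:
        raise ValueError(f"no image data found under {images_dir!r}")
    return store_bytes / image_bytes


if __name__ == "__main__":
    # Paths taken from the shell session above; adjust them for your machine.
    images_dir = "flower_photos/dandelion"
    store_dir = "/tmp/petastorm_ingest_test"
    if os.path.isdir(images_dir) and os.path.isdir(store_dir):
        n_images, image_bytes = summarize_dir(images_dir)
        if n_images and image_bytes:
            print(f"{n_images} images, {image_bytes / 1024 / n_images:.1f} KiB each on average")
            print(f"store is {size_ratio(images_dir, store_dir):.2f}x the size of the source images")
```

The average per-image size and the store-to-source ratio can then be compared with the roughly 89 KB per row measured in the reproduction.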
Best regards.