libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Different resulting file sizes when using DatasetWriter on the same dataset #216

Open netw0rkf10w opened 2 years ago

netw0rkf10w commented 2 years ago

Today I encountered some strange behavior in FFCV and I would like to ask whether it is actually normal.

I launched the following command from ffcv-imagenet on two different machines:

bash write_imagenet.sh 500 1.0 90

where in write_imagenet.sh I set --cfg.write_mode=jpg.

Surprisingly, I obtained two train_500_1.0_90.ffcv files with different sizes: one is reported as 65GB and the other as 64GB (more precisely, 68725058920 bytes and 68708281704 bytes, a difference of 16777216 bytes). Why is that?
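
To put the two byte counts in perspective, here is a quick arithmetic check in Python. It uses only the two sizes quoted above; the variable names are just for illustration. It shows that both files are roughly 64 GiB, so the 65GB vs 64GB figures are presumably a rounding effect of the tool that listed them, and that the real gap is a small fraction of the total.

```python
# File sizes in bytes, as quoted above for the two train_500_1.0_90.ffcv files.
size_a = 68_725_058_920
size_b = 68_708_281_704

GIB = 1024 ** 3  # bytes per gibibyte
MIB = 1024 ** 2  # bytes per mebibyte

diff = size_a - size_b

print(f"file A: {size_a / GIB:.3f} GiB")
print(f"file B: {size_b / GIB:.3f} GiB")
print(f"difference: {diff} bytes = {diff / MIB:.2f} MiB")
print(f"relative difference: {diff / size_a:.4%}")
print(f"difference is a power of two: {diff & (diff - 1) == 0}")
```

The difference comes out to exactly 16 MiB (2^24 bytes), about 0.02% of the file. A gap that is a clean power of two may hint at padding or block-granular allocation rather than different image content, but that is only a guess and is not confirmed by anything in the logs below.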

For your information, the two machines differ in memory and processors (one with 3 CPUs and the other with 10). The output logs are appended below.

Thanks in advance for your reply!

Writing ImageNet train dataset to /home/data/imagenet_ffcv_jpg_p13/train_500_1.0_90.ffcv
┌ Arguments defined────────┬─────────────────────────────────────────────────────────────────────────────┐
│ Parameter                │ Value                                                                       │
├──────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ cfg.dataset              │ imagenet                                                                    │
│ cfg.split                │ train                                                                       │
│ cfg.data_dir             │ /home/data/imagenet/train                                                   │
│ cfg.write_path           │ /home/data/imagenet_ffcv_jpg_p13/train_500_1.0_90.ffcv                      │
│ cfg.write_mode           │ jpg                                                                         │
│ cfg.max_resolution       │ 500                                                                         │
│ cfg.num_workers          │ 10                                                                          │
│ cfg.chunk_size           │ 100                                                                         │
│ cfg.jpeg_quality         │ 90.0                                                                        │
│ cfg.subset               │ -1                                                                          │
│ cfg.compress_probability │ 1.0                                                                         │
└──────────────────────────┴─────────────────────────────────────────────────────────────────────────────┘
100%|██████████| 1281167/1281167 [6:25:38<00:00, 55.37it/s]  
Writing ImageNet val dataset to /home/data/imagenet_ffcv_jpg_p13/val_500_1.0_90.ffcv
┌ Arguments defined────────┬───────────────────────────────────────────────────────────────────────────┐
│ Parameter                │ Value                                                                     │
├──────────────────────────┼───────────────────────────────────────────────────────────────────────────┤
│ cfg.dataset              │ imagenet                                                                  │
│ cfg.split                │ val                                                                       │
│ cfg.data_dir             │ /home/data/imagenet/val                                                   │
│ cfg.write_path           │ /home/data/imagenet_ffcv_jpg_p13/val_500_1.0_90.ffcv                      │
│ cfg.write_mode           │ jpg                                                                       │
│ cfg.max_resolution       │ 500                                                                       │
│ cfg.num_workers          │ 10                                                                        │
│ cfg.chunk_size           │ 100                                                                       │
│ cfg.jpeg_quality         │ 90.0                                                                      │
│ cfg.subset               │ -1                                                                        │
│ cfg.compress_probability │ 1.0                                                                       │
└──────────────────────────┴───────────────────────────────────────────────────────────────────────────┘
100%|██████████| 50000/50000 [16:34<00:00, 50.26it/s] 

Writing ImageNet train dataset to /home/data/imagenet_ffcv_jpg/train_500_1.0_90.ffcv
┌ Arguments defined────────┬─────────────────────────────────────────────────────────────────────────┐
│ Parameter                │ Value                                                                   │
├──────────────────────────┼─────────────────────────────────────────────────────────────────────────┤
│ cfg.dataset              │ imagenet                                                                │
│ cfg.split                │ train                                                                   │
│ cfg.data_dir             │ /home/data/imagenet/train                                               │
│ cfg.write_path           │ /home/data/imagenet_ffcv_jpg/train_500_1.0_90.ffcv                      │
│ cfg.write_mode           │ jpg                                                                     │
│ cfg.max_resolution       │ 500                                                                     │
│ cfg.num_workers          │ 3                                                                       │
│ cfg.chunk_size           │ 100                                                                     │
│ cfg.jpeg_quality         │ 90.0                                                                    │
│ cfg.subset               │ -1                                                                      │
│ cfg.compress_probability │ 1.0                                                                     │
└──────────────────────────┴─────────────────────────────────────────────────────────────────────────┘
100%|██████████| 1281167/1281167 [5:17:42<00:00, 67.21it/s]  
Writing ImageNet val dataset to /home/data/imagenet_ffcv_jpg/val_500_1.0_90.ffcv
┌ Arguments defined────────┬───────────────────────────────────────────────────────────────────────┐
│ Parameter                │ Value                                                                 │
├──────────────────────────┼───────────────────────────────────────────────────────────────────────┤
│ cfg.dataset              │ imagenet                                                              │
│ cfg.split                │ val                                                                   │
│ cfg.data_dir             │ /home/data/imagenet/val                                               │
│ cfg.write_path           │ /home/data/imagenet_ffcv_jpg/val_500_1.0_90.ffcv                      │
│ cfg.write_mode           │ jpg                                                                   │
│ cfg.max_resolution       │ 500                                                                   │
│ cfg.num_workers          │ 10                                                                    │
│ cfg.chunk_size           │ 100                                                                   │
│ cfg.jpeg_quality         │ 90.0                                                                  │
│ cfg.subset               │ -1                                                                    │
│ cfg.compress_probability │ 1.0                                                                   │
└──────────────────────────┴───────────────────────────────────────────────────────────────────────┘
100%|██████████| 50000/50000 [13:13<00:00, 63.00it/s]