junxnone / aiwiki

AI Wiki
https://junxnone.github.io/aiwiki

Tools Pytorch Data #359

Open junxnone opened 3 years ago


Reference

Brief

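| Image layout | Method |
| --- | --- |
| Built-in dataset | `torchvision.datasets`, downloaded automatically |
| One folder of images per class | `torchvision.datasets.ImageFolder`, read from a directory |
| All images in one folder, labels in a CSV file | Subclass `torch.utils.data.Dataset` to create a custom dataset |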

torchvision.datasets.ImageFolder

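```python
import os

import torch
import torchvision

# `data_dir` is the dataset root containing `train/` and `val/` sub-folders;
# `data_transforms` is the dict of per-split transforms (see Augmentation).
idatasets = {x: torchvision.datasets.ImageFolder(
                    os.path.join(data_dir, x),
                    data_transforms[x])
             for x in ['train', 'val']}

idataloders = {x: torch.utils.data.DataLoader(idatasets[x],
                                              batch_size=4,
                                              shuffle=True,
                                              num_workers=4)
               for x in ['train', 'val']}
```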

torch.utils.data.Dataset
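
For the case where all images sit in one folder and the labels live in a CSV file, a custom dataset might look roughly like the sketch below. The class name `CsvImageDataset` and the CSV format (`filename,label` rows without a header) are assumptions for illustration, not something the original notes specify.

```python
import csv
import os
import tempfile

from PIL import Image
from torch.utils.data import Dataset


class CsvImageDataset(Dataset):
    """Hypothetical dataset: images in one folder, labels in a CSV file.

    Assumes each CSV row is `filename,label` with no header row.
    """

    def __init__(self, image_dir, csv_path, transform=None):
        self.image_dir = image_dir
        self.transform = transform
        with open(csv_path, newline="") as f:
            self.samples = [(name, int(label)) for name, label in csv.reader(f)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        name, label = self.samples[index]
        image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, label


# Usage with throwaway data: three blank images and a matching CSV file.
image_dir = tempfile.mkdtemp()
csv_path = os.path.join(image_dir, "labels.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    for i in range(3):
        name = f"{i}.png"
        Image.new("RGB", (16, 16)).save(os.path.join(image_dir, name))
        writer.writerow([name, i % 2])

dataset = CsvImageDataset(image_dir, csv_path)
image, label = dataset[2]
print(len(dataset), image.size, label)  # 3 (16, 16) 0
```

Only `__len__` and `__getitem__` need to be implemented; the resulting object can then be passed to `torch.utils.data.DataLoader` like any built-in dataset.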

- **Augmentation**
```python
from torchvision import transforms

# Per-split preprocessing; the mean/std values are the ImageNet statistics.
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),  # formerly RandomSizedCrop
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),  # formerly Scale
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
```

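## Ways to speed up data loading
- More workers: raise `num_workers` so the dataloader uses more worker processes
- Run `data reading and preprocessing` in a separate process from `training`
- LMDB
- `opencv` > `PIL`: decoding images with `opencv` is generally faster than with `PIL`
- `bmp`: storing images as uncompressed `bmp` reduces decode time
- TFRecord/RecordIO/hdf5/pth/n5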
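
As a rough illustration of the `pth` option in the list above, the sketch below runs the preprocessing once, caches the resulting tensors with `torch.save`, and serves later runs from the cache. The random tensors stand in for real decoded images, and the file name `cache.pth` and the dictionary keys are assumptions made for this example.

```python
import os
import tempfile

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for an expensive decode + preprocess pass (random data here).
images = torch.rand(8, 3, 32, 32)
labels = torch.arange(8) % 2

# Run the preprocessing once and cache the result as a .pth file.
cache_path = os.path.join(tempfile.mkdtemp(), "cache.pth")
torch.save({"images": images, "labels": labels}, cache_path)

# Later runs load the tensors directly and skip image decoding entirely.
cached = torch.load(cache_path)
dataset = TensorDataset(cached["images"], cached["labels"])

# num_workers=0 keeps this sketch portable; extra workers mainly help
# when decoding and augmentation happen on the fly.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)
for batch_images, batch_labels in loader:
    print(tuple(batch_images.shape), batch_labels.tolist())
```

This trades disk space and memory for decode time, so it suits datasets small enough to hold as tensors; formats such as LMDB or hdf5 follow the same idea for larger collections.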