yizhiwang96 / deepvecfont

[SIGGRAPH Asia 2021] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning
MIT License

train the image super-resolution model error #19

Closed: graphl closed this issue 1 year ago

graphl commented 2 years ago

Steps

Step 1: Build the dataset from the ttf/otf files.

Commands used:

> cd data_utils

> fontforge -lang=py -script convert_ttf_to_sfd_mp.py --split train  
> fontforge -lang=py -script convert_ttf_to_sfd_mp.py --split test

> python write_glyph_imgs.py --split train  
> python write_glyph_imgs.py --split test  

> python write_data_to_dirs.py --split train
> python write_data_to_dirs.py --split test
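
After these commands finish, it is worth confirming that the output directories actually contain data before moving on (this turns out to matter below). A minimal check, assuming the dataset was written to `./data/vecfont_dataset_dirs_/` with `train/` and `test/` subfolders (adjust the path to whatever your run actually used):

```python
# sanity_check_dataset.py -- minimal check that the generated dataset is non-empty.
# The directory layout below is an assumption (default-style paths); adjust as needed.
import os

data_root = './data/vecfont_dataset_dirs_/'

for split in ('train', 'test'):
    split_dir = os.path.join(data_root, split)
    if not os.path.isdir(split_dir):
        print(f'{split_dir} does not exist -- the dataset scripts did not write here')
        continue
    entries = os.listdir(split_dir)
    print(f'{split}: {len(entries)} entries')
    if not entries:
        print(f'WARNING: {split_dir} is empty -- downstream DataLoaders will see 0 samples')
```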

Step 2: Train the neural rasterizer.

Command used: `python train_nr.py --mode train --experiment_name dvf --model_name neural_raster`

The following error occurred:

```
Training on experiment dvf_neural_raster...
Finished loading train paths
Traceback (most recent call last):
  File "/home/ubuntu/deepvecfontnew/train_nr.py", line 231, in <module>
    main()
  File "/home/ubuntu/deepvecfontnew/train_nr.py", line 223, in main
    train(opts)
  File "/home/ubuntu/deepvecfontnew/train_nr.py", line 191, in train
    train_nr_model(opts)
  File "/home/ubuntu/deepvecfontnew/train_nr.py", line 29, in train_nr_model
    train_loader = get_loader(opts.data_root, opts.image_size, opts.char_categories, opts.max_seq_len, opts.seq_feature_dim, opts.batch_size, opts.read_mode, opts.mode)
  File "/home/ubuntu/deepvecfontnew/dataloader.py", line 69, in get_loader
    dataloader = data.DataLoader(dataset, batch_size, shuffle=(mode == 'train'), num_workers=batch_size, drop_last=True)
  File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 347, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
  File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/sampler.py", line 107, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
```

Code change

In dataloader.py line 69, I changed `shuffle=(mode == 'train')` to `shuffle=False`, then continued with the command: `python train_nr.py --mode train --experiment_name dvf --model_name neural_raster`
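
Note that switching to `shuffle=False` only bypasses the RandomSampler check; it does not address why there are no samples. A more defensive option would be to fail fast when the dataset is empty, for example with a small wrapper like the sketch below (a hypothetical helper, not repo code; the DataLoader arguments mirror dataloader.py line 69 as quoted in the traceback):

```python
# Hypothetical helper: build the DataLoader only after checking the dataset is
# non-empty, instead of silently disabling shuffle. Arguments mirror dataloader.py line 69.
from torch.utils import data

def make_loader(dataset, batch_size, mode):
    if len(dataset) == 0:
        raise RuntimeError(
            f'Dataset for split "{mode}" is empty; '
            'regenerate the data before training instead of changing shuffle.')
    return data.DataLoader(dataset, batch_size,
                           shuffle=(mode == 'train'),
                           num_workers=batch_size,
                           drop_last=True)
```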

I also added `print(len(train_loader))` at line 57 of train_nr.py.

It produced the following log:
```
Training on experiment dvf_neural_raster...
Finished loading train paths
/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 64 worker processes in total. Our suggested max number of worker in current system is 12, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Finished loading test paths
/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
  warnings.warn(
/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=VGG19_Weights.IMAGENET1K_V1`. You can also use `weights=VGG19_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)

0

0
0
0
0
0
0
0
```
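
As a side note, the UserWarning at the top of this log comes from `num_workers=batch_size` on dataloader.py line 69, which asks for 64 worker processes on a 12-core machine. Capping the worker count is a one-line change; a sketch, keeping the rest of the call as in the traceback (`dataset`, `batch_size`, and `mode` are the locals already present in `get_loader`):

```python
# Sketch: cap DataLoader workers at the CPU count rather than one per batch element.
import os
from torch.utils import data

num_workers = min(batch_size, os.cpu_count() or 1)
dataloader = data.DataLoader(dataset, batch_size, shuffle=(mode == 'train'),
                             num_workers=num_workers, drop_last=True)
```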

It shows that the value of len(train_loader) is 0. Is that because the generated data is broken, or is it caused by the code change I made?

Step 3: Train the image super-resolution model.

Command used: `python train_sr.py --mode train --name image_sr`

The following error occurred:

```
----------------- Options ---------------
               batch_size: 16
                    beta1: 0.0
          char_categories: 52
          checkpoints_dir: ./experiments
           continue_train: False
                crop_size: 256
                 dataroot: ./data/vecfont_dataset_dirs_sr/
             dataset_mode: aligned
                direction: BtoA
              display_env: main
             display_freq: 400
               display_id: 1
            display_ncols: 4
             display_port: 8097
           display_server: http://172.31.222.102
          display_winsize: 256
                    epoch: latest
              epoch_count: 1
          experiment_name: dvf
                 gan_mode: lsgan
        gauss_temperature: 0
                  gpu_ids: 0
               image_size: 256
                   img_hr: 256
                   img_lr: 128
                init_gain: 0.02
                init_type: normal
                 input_nc: 1
                  isTrain: True                                 [default: None]
                lambda_L1: 1.0
                load_iter: 0                                    [default: 0]
                load_size: 256
                       lr: 0.002
           lr_decay_iters: 50
                lr_policy: linear
         max_dataset_size: inf
          mix_temperature: 0.0001
                     mode: train
               model_name: main_model
                 n_epochs: 500
           n_epochs_decay: 500
               n_layers_D: 3
                     name: image_sr
                      ndf: 64
                     netD: basic
                     netG: unet_256
                      ngf: 64
               no_dropout: False
                  no_flip: True
                  no_html: False
                     norm: instance
              num_threads: 4
                output_nc: 1
                    phase: train
                pool_size: 50
               preprocess: none
               print_freq: 100
                read_mode: dirs
             save_by_iter: False
          save_epoch_freq: 25
         save_latest_freq: 5000
           serial_batches: False
                   suffix:
               test_epoch: 125
         update_html_freq: 1000
                  verbose: False
----------------- End -------------------
Traceback (most recent call last):
  File "/home/ubuntu/deepvecfontnew/train_sr.py", line 12, in <module>
    dataset_train, dataset_test = create_dataset(opt)  # create a dataset given opt.dataset_mode and other options
  File "/home/ubuntu/deepvecfontnew/models/imgsr/modules.py", line 244, in create_dataset
    dataset_train = get_loader(opt.dataroot, opt.char_categories, opt.batch_size, opt.read_mode, opt.img_lr, opt.img_hr, 'train')
  File "/home/ubuntu/deepvecfontnew/dataloader_sr.py", line 63, in get_loader
    dataloader = data.DataLoader(dataset, batch_size, shuffle=(mode == 'train'), num_workers=batch_size)
  File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 347, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
  File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/sampler.py", line 107, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
```

Code change to work around the error

In dataloader_sr.py line 63, I changed `shuffle=(mode == 'train')` to `shuffle=False`.

Running step 3 again produced the following error:

```
/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:131: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
    warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
    learning rate 0.0020000 -> 0.0020000
    End of epoch 1 / 1000    Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 2 / 1000    Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 3 / 1000    Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 4 / 1000    Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 5 / 1000    Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 6 / 1000    Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 7 / 1000    Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 8 / 1000    Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 9 / 1000    Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 10 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 11 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 12 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 13 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 14 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 15 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 16 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 17 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 18 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 19 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 20 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 21 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 22 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 23 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    End of epoch 24 / 1000   Time Taken: 0 sec
    learning rate 0.0020000 -> 0.0020000
    Traceback (most recent call last):
      File "/home/ubuntu/deepvecfontnew/train_sr.py", line 45, in <module>
        writer.add_scalar('TRAIN/l1_loss', criterionL1(model.real_B, model.fake_B), epoch)
    AttributeError: 'Pix2PixModel' object has no attribute 'real_B'
```
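
This 'real_B' error looks like a downstream symptom of the same empty dataset: real_B and fake_B only get set on the model when a batch actually passes through it, and with zero batches the body of the training loop never runs. A standalone illustration of the mechanism (not repo code):

```python
# Standalone illustration: the body of a loop over an empty DataLoader never runs,
# so anything assigned inside it (analogous to model.real_B) is never set.
import torch
from torch.utils.data import DataLoader, TensorDataset

loader = DataLoader(TensorDataset(torch.empty(0, 1)), batch_size=16, drop_last=True)

class Dummy:
    pass

model = Dummy()
for batch in loader:             # zero iterations
    model.real_B = batch         # would normally happen when a batch is fed to the model
print(hasattr(model, 'real_B'))  # False -> AttributeError when the logging code uses it
```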

Question

yizhiwang96 commented 2 years ago

Your training dataset is empty (len(train_loader) is 0), so all of the subsequent training was meaningless. Please check that all files were correctly generated in ./data/vecfont_dataset_dirs_/ and then train the model.
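
For reference, with `drop_last=True` the loader length is floor(len(dataset) / batch_size), so len(train_loader) reads 0 whenever the dataset holds fewer samples than one batch, and in particular when it is empty; the shuffle change has no effect on this. A minimal demonstration (standalone, not repo code):

```python
# Minimal demonstration: with drop_last=True, an empty (or smaller-than-batch)
# dataset always yields a DataLoader of length 0, regardless of shuffle.
import torch
from torch.utils.data import DataLoader, TensorDataset

empty_ds = TensorDataset(torch.empty(0, 3))    # 0 samples
small_ds = TensorDataset(torch.zeros(10, 3))   # 10 samples

print(len(DataLoader(empty_ds, batch_size=64, shuffle=False, drop_last=True)))  # 0
print(len(DataLoader(small_ds, batch_size=64, shuffle=False, drop_last=True)))  # 0
print(len(DataLoader(small_ds, batch_size=4,  shuffle=False, drop_last=True)))  # 2
```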