ncsoft / avocodo

Official implementation of "Avocodo: Generative Adversarial Network for Artifact-Free Vocoder" (AAAI2023)

Feature matching loss increases #7

Closed by LEECHOONGHO 1 year ago

LEECHOONGHO commented 1 year ago

Hello, I'm training an Avocodo model on my own dataset, which consists of multiple datasets.

I changed some of the generator's parameters to change the input and target sample rates: the model generates a 32 kHz waveform from a 24 kHz mel spectrogram. The hop size is 400.

When I train my Avocodo model, the feature matching loss keeps increasing even after the discriminator loss stops decreasing. Strangely enough, the mel loss keeps decreasing, and the quality of the audio output is pretty good.

Is this normal when training a vocoder? Will the feature matching loss ever stop increasing?
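
For context on what this loss measures, here is a minimal sketch of a feature matching loss in the style commonly used by GAN vocoders: the mean absolute difference between discriminator feature maps of real and generated audio, summed over layers and sub-discriminators. The function name, the flattened-list representation of feature maps, and the toy numbers are assumptions for illustration, not the Avocodo repository's code. The sketch only shows one property of such a loss: if the discriminator's feature magnitudes grow during training, the loss grows by the same factor even when the relative mismatch is unchanged. Whether that explains the curve in this issue is not established here.

```python
def feature_matching_loss(real_fmaps, fake_fmaps):
    """Sum, over sub-discriminators and layers, of the mean absolute
    difference between real and generated feature maps.

    Each feature map is represented as a flat list of floats (an
    assumption made to keep the sketch dependency-free).
    """
    loss = 0.0
    for disc_real, disc_fake in zip(real_fmaps, fake_fmaps):      # per sub-discriminator
        for layer_real, layer_fake in zip(disc_real, disc_fake):  # per layer
            diffs = [abs(r - f) for r, f in zip(layer_real, layer_fake)]
            loss += sum(diffs) / len(diffs)
    return loss


def scale(fmaps, factor):
    """Multiply every feature value by `factor` (hypothetical helper)."""
    return [[[v * factor for v in layer] for layer in disc] for disc in fmaps]


# Toy feature maps: one sub-discriminator with two layers.
real = [[[1.0, 2.0], [0.0, 4.0]]]
fake = [[[1.5, 2.0], [1.0, 2.0]]]

base_loss = feature_matching_loss(real, fake)
# Same relative mismatch, but features twice as large.
scaled_loss = feature_matching_loss(scale(real, 2.0), scale(fake, 2.0))
print(base_loss, scaled_loss)
```

Because the loss scales with the feature magnitudes, its absolute value is hard to compare across training steps while the discriminator is still changing.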

[Image: avocodo training]

We'd love to hear about your experiences.

Thank you.

HYPER PARAMETERS
model:
  upsample_rates: '[[5], [5], [4], [4]]'
  upsample_kernel_sizes: '[[11], [11], [8], [8]]'
  upsample_initial_channel: 384
  resblock_kernel_sizes: '[3,7,11]'
  resblock_dilation_sizes: '[[1,3,5], [1,3,5], [1,3,5]]'
  projection_filters: '[0, 1, 1, 1]'
  projection_kernels: '[0, 5, 7, 11]'
  combd_h_u: '[[16, 64, 256, 1024, 1024, 1024], [16, 64, 256, 1024, 1024, 1024], [16,
    64, 256, 1024, 1024, 1024]]'
  combd_d_k: '[[7, 11, 11, 11, 11, 5], [11, 21, 21, 21, 21, 5], [15, 41, 41, 41, 41,
    5]]'
  combd_d_s: '[[1, 1, 4, 4, 4, 1], [1, 1, 4, 4, 4, 1], [1, 1, 4, 4, 4, 1]]'
  combd_d_d: '[[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]'
  combd_d_g: '[[1, 4, 16, 64, 256, 1], [1, 4, 16, 64, 256, 1], [1, 4, 16, 64, 256,
    1]]'
  combd_d_p: '[[3, 5, 5, 5, 5, 2], [5, 10, 10, 10, 10, 2], [7, 20, 20, 20, 20, 2]]'
  combd_op_f: '[1, 1, 1]'
  combd_op_k: '[3, 3, 3]'
  combd_op_g: '[1, 1, 1]'
  sbd_filters: '[[64, 128, 256, 256, 256],[64, 128, 256, 256, 256],[64, 128, 256,
    256, 256],[32, 64, 128, 128, 128]]'
  sbd_strides: '[[1, 1, 3, 3, 1], [1, 1, 3, 3, 1], [1, 1, 3, 3, 1], [1, 1, 3, 3, 1]]'
  sbd_kernel_sizes: '[        [[7, 7, 7],[7, 7, 7],[7, 7, 7],[7, 7, 7],[7, 7, 7]],        [[5,
    5, 5],[5, 5, 5],[5, 5, 5],[5, 5, 5],[5, 5, 5]],        [[3, 3, 3],[3, 3, 3],[3,
    3, 3],[3, 3, 3],[3, 3, 3]],        [[5, 5, 5],[5, 5, 5],[5, 5, 5],[5, 5, 5],[5,
    5, 5]]    ]'
  sbd_dilations: '[        [[5, 7, 11], [5, 7, 11], [5, 7, 11], [5, 7, 11], [5, 7,
    11]],        [[3, 5, 7], [3, 5, 7], [3, 5, 7], [3, 5, 7], [3, 5, 7]],        [[1,
    2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]],        [[1, 2, 3], [1, 2,
    3], [1, 2, 3], [2, 3, 5], [2, 3, 5]]    ]'
  sbd_band_ranges: '[[0, 6], [0, 11], [0, 16], [0, 64]]'
  sbd_transpose: '[False, False, False, True]'
  model_pqmf_config: '{        ''sbd'': [16, 256, 0.03, 10.0],        ''fsbd'': [64,
    256, 0.1, 9.0]    }'
  segment_size: 32000
  pqmf_config: '{        ''lv1'': [4, 192, 0.25, 10.0],        ''lv2'': [16, 256,
    0.03, 10.0]    }'
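
A quick consistency check of the numbers above can be sketched as follows. It parses the stringified `upsample_rates` value as it appears in the dump and compares the total upsampling factor with the stated hop size of 400. It assumes the hop size is counted in output (32 kHz) samples; the variable names and the derived 24 kHz hop length are illustrative and do not come from the repository.

```python
import ast
from math import prod

# Values copied from the issue text.
upsample_rates = ast.literal_eval("[[5], [5], [4], [4]]")
hop_size = 400          # stated in the issue
segment_size = 32000    # from the hyperparameters
target_sr = 32000       # 32 kHz output
source_sr = 24000       # 24 kHz mel side

# The generator upsamples each mel frame by the product of all stage rates.
total_upsampling = prod(rate for stage in upsample_rates for rate in stage)

# Number of mel frames needed to produce one training segment.
frames_per_segment = segment_size // total_upsampling

# Frame rate implied by the hop size at the output sample rate (assumption).
frames_per_second = target_sr // hop_size

# Hop length on the 24 kHz side that yields the same frame rate (derived, hypothetical).
mel_hop_at_source_sr = source_sr // frames_per_second

print(total_upsampling, frames_per_segment, frames_per_second, mel_hop_at_source_sr)
```

Under these assumptions, the upsampling factor matches the hop size, so each mel frame maps to exactly 400 output samples.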