I previously wrote a Python program that calls `git grep` to search for config keys that are never used elsewhere in the code: search_config2.txt (renamed to .txt because GitHub does not support uploading .py files). You can use it to clean up unused config keys after refactoring.
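The core idea fits in a few lines. The following is a hypothetical sketch, not the attached script; it assumes PyYAML and a flat, single-level config file:

```python
# Hypothetical sketch of the idea (not the attached search_config2.py):
# read every key from a YAML config, then ask `git grep` whether the key
# is referenced anywhere in the tracked Python sources.
import subprocess

import yaml


def find_unused_keys(config_path: str) -> list[str]:
    with open(config_path, 'r', encoding='utf-8') as f:
        config = yaml.safe_load(f)
    unused = []
    for key in config:
        # `git grep -q` prints nothing and exits with status 0 on a match,
        # 1 when the key is not found anywhere.
        result = subprocess.run(['git', 'grep', '-q', '-F', key, '--', '*.py'])
        if result.returncode != 0:
            unused.append(key)
    return unused


if __name__ == '__main__':
    for key in find_unused_keys('config/base.yaml'):
        print(key)
```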
This PR is ready to be merged after the final fixes and some simple tests on the incoming branch.
By the way, the license of the refactor-v2 branch was previously changed to Apache 2.0, which will be the new license of our forked DiffSinger once refactor-v2 is merged into the main branch. With your agreement, your contributions will also be licensed under Apache 2.0 in this repository.
You have my consent, thanks.
Due to some unresolved performance issues during tests, this branch will be merged into a temporary branch. It should be merged into the main branch after these issues are addressed.
Performance is tightly linked to the grid resolution used when shuffling samples and sorting them by similar lengths. When samples are fully sorted, performance does not drop compared to the original codebase.
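For illustration, here is a simplified stand-in for the kind of grid-bucketed shuffling being described (not the actual sampler code in this PR; `grid_shuffled_order` is a hypothetical helper):

```python
import random
from collections import defaultdict


def grid_shuffled_order(lengths: list[int], grid: int, seed: int) -> list[int]:
    """Return sample indices sorted by length bin, shuffled within each bin."""
    rng = random.Random(seed)
    bins: defaultdict[int, list[int]] = defaultdict(list)
    for idx, length in enumerate(lengths):
        bins[length // grid].append(idx)  # round length down to the grid
    order: list[int] = []
    for key in sorted(bins):  # coarse length order is kept across bins
        indices = bins[key]
        rng.shuffle(indices)  # randomness lives only inside each bin
        order.extend(indices)
    return order


# grid=1 degenerates to a full sort (least padding, but no shuffling);
# a coarser grid shuffles more but pads more, hence the performance link.
print(grid_shuffled_order([12, 7, 13, 8, 30], grid=6, seed=0))
```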
The performance issues are addressed, so I changed the base branch back to refactor-v2.
Summary of the changes:

- Adapted `base_task` and `acoustic_task` to PyTorch Lightning 2.0.
- `'bf16'` precision is supported.
- The sampler is reimplemented as a subclass of the `Sampler` class.
- The `rank_zero` utility is used to identify the main process.
- (In `scripts/train.py`, the environment variable `TORCH_CUDNN_V8_API_ENABLED` is set to prevent excessive slowdown when using 16-bit precision. If it causes any problems, try commenting it out.)
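To illustrate the last two points, a minimal sketch assuming the `pytorch_lightning.utilities.rank_zero_only` decorator; `log_to_console` is a hypothetical example function, not code from this PR:

```python
import os

# Set before torch/cuDNN is initialized, as scripts/train.py does;
# comment this out if it causes problems on your setup.
os.environ['TORCH_CUDNN_V8_API_ENABLED'] = '1'

from pytorch_lightning.utilities import rank_zero_only


@rank_zero_only
def log_to_console(message: str) -> None:
    # Under DDP this body runs only on the main (rank-zero) process,
    # so multi-GPU training does not print duplicate lines.
    print(message)


log_to_console('training started')
```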
New parameter explanation:

- `pl_trainer_accelerator`, `pl_trainer_devices`, `pl_trainer_num_nodes`, `pl_trainer_strategy`, and `pl_trainer_precision`: see the `Trainer` section of the PyTorch Lightning 2.0 documentation for their usage. In particular, `pl_trainer_devices` can be:
  - `pl_trainer_devices: 'auto'`: select devices automatically
  - `pl_trainer_devices: 2`: use two accelerators, selected automatically
  - `pl_trainer_devices: [2, 3]`: use accelerators number 2 and 3
- `ddp_backend`: choose from `'gloo'`, `'nccl'`, or `'nccl_no_p2p'`.
- `sampler_frame_count_grid` in `config/base.yaml`: random shuffling of samples with similar sizes is now correctly supported. First, each sample length is rounded to a multiple of `sampler_frame_count_grid` (default 6); then, within each bin, samples are shuffled every epoch.
- `dataloader_prefetch_factor` in `config/base.yaml`: a setting for PyTorch's `DataLoader` (refer to the PyTorch documentation).
- `max_tokens` and `max_sentences` always control the batch size on a single device; the effective batch size is the per-device batch sizes summed across all devices. `accumulate_grad_batches` allows you to backpropagate multiple batches before each gradient descent step, effectively increasing the batch size further. For example, with 4 devices, `max_sentences=8`, and `accumulate_grad_batches=2`, the effective batch size is `4*8*2=64`. A combined example config is sketched below.
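To tie the new keys together, here is a hypothetical excerpt of a training config; every value below is illustrative only, not a recommendation from this PR:

```yaml
# Illustrative values; key names follow the list above.
pl_trainer_accelerator: 'gpu'
pl_trainer_devices: 4          # or 'auto', or a list like [2, 3]
pl_trainer_num_nodes: 1
pl_trainer_strategy: 'auto'
pl_trainer_precision: 'bf16'
ddp_backend: 'nccl'
sampler_frame_count_grid: 6
dataloader_prefetch_factor: 2
max_sentences: 8
accumulate_grad_batches: 2
# Effective batch size with these values: 4 devices * 8 * 2 = 64.
```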