Closed RMSnow closed 8 months ago
The recipe should be updated to provide instructions for online feature extraction.
@lmxue Good advice. I plan to update the recipe in the future. This PR is to prepare a codebase for our recent internal research.
✨ Description
Support on-the-fly feature extraction for large-scale data preprocessing. Its strengths can be summarized as:
How to use?
With on-the-fly feature extraction, the workflow for a future Amphion model is:
- `utt["Path"]` and `utt["Duration"]` are the two key elements.
- Features Preprocess (No features preprocess any more!)
- Set `preprocess.features_extraction_mode` to `online`.
- `[Task]OnlineDataset` and `[Model]Trainer`
Currently, I have added on-the-fly feature extraction support for DiffWaveNetSVC. See the two main classes: `SVCOnlineDataset` and `DiffusionTrainer`.
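The workflow above can be sketched roughly as follows. This is a minimal illustration, not Amphion's actual code: the metadata entries carry only `Path` and `Duration`, the config sets `preprocess.features_extraction_mode` to `online`, and a hypothetical helper `select_dataset_name` derives the dataset class name from that mode. The helper names, the example paths and durations, the `"offline"` fallback value, and the `[Task]OfflineDataset` naming are all assumptions made for illustration.

```python
import json

# Metadata as produced by data preprocessing. In online mode only "Path" and
# "Duration" are needed (the paths and durations below are made up).
metadata = [
    {"Path": "data/utt_0001.wav", "Duration": 3.2},
    {"Path": "data/utt_0002.wav", "Duration": 5.7},
]

# Config fragment in the style of Amphion/config/[Task]/[Model].json.
cfg = json.loads('{"preprocess": {"features_extraction_mode": "online"}}')


def select_dataset_name(cfg, task="SVC"):
    """Hypothetical helper: pick a dataset class name from the extraction mode."""
    # The "offline" default is an assumption, not a documented Amphion value.
    mode = cfg["preprocess"].get("features_extraction_mode", "offline")
    if mode == "online":
        return f"{task}OnlineDataset"
    return f"{task}OfflineDataset"  # assumed name for the offline path


def validate_metadata(utts):
    """Check that every utterance has the two key elements; return the count."""
    for utt in utts:
        missing = {"Path", "Duration"} - set(utt)
        if missing:
            raise KeyError(f"utterance is missing {sorted(missing)}")
    return len(utts)


dataset_name = select_dataset_name(cfg)
num_utts = validate_metadata(metadata)
print(dataset_name, num_utts)
```

With the config above, the sketch selects `SVCOnlineDataset`, which matches the class named in this PR.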
👨💻 Main Changes
- Rename `BaseDataset` and `BaseCollator` to `BaseOfflineDataset` and `BaseOfflineCollator`.
- `BaseOnlineDataset` and `BaseOnlineCollator`: the `__getitem__` function will get only the minimum elements (such as the raw waveform and its duration).
- In `audio_features_extractor.py`, I have integrated the common waveform feature extraction operations (such as Mel Spectrogram, F0, Energy, and Semantic Features). Note that I have not integrated some features required by vocoders. @VocodexElysium
- `text_features_extractor.py` and `descriptive_text_features_extractor.py` for the future refactoring, integration, and supplementation of TTS, TTA, and TTM. @HeCheng0625 @lmxue @HarryHe11 @viewfinder-annn
- `Amphion/config/[Task]/[Model].json`.
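The online dataset and collator idea can be illustrated with a small, dependency-free sketch. It is not the code in this PR: `ToyOnlineDataset`, `ToyOnlineCollator`, `frame_energy`, and `extract_features_on_the_fly` are hypothetical names, a synthetic sine wave stands in for loading the file at `utt["Path"]`, and frame-level RMS energy stands in for the real extractors (Mel Spectrogram, F0, and so on). Which component calls the extraction step in Amphion is not stated here, so the sketch keeps it as a separate function.

```python
import math


class ToyOnlineDataset:
    """Returns only the minimum elements per item: raw waveform and duration."""

    def __init__(self, utterances, sample_rate=16000):
        self.utterances = utterances
        self.sample_rate = sample_rate

    def __len__(self):
        return len(self.utterances)

    def __getitem__(self, index):
        utt = self.utterances[index]
        # A real dataset would load the audio at utt["Path"]; a 220 Hz sine of
        # the stated duration stands in so the sketch needs no file I/O.
        n = round(utt["Duration"] * self.sample_rate)
        wav = [math.sin(2 * math.pi * 220 * t / self.sample_rate) for t in range(n)]
        return {"wav": wav, "duration": utt["Duration"]}


class ToyOnlineCollator:
    """Zero-pads the raw waveforms of a batch to a common length."""

    def __call__(self, batch):
        max_len = max(len(item["wav"]) for item in batch)
        return {
            "wav": [item["wav"] + [0.0] * (max_len - len(item["wav"])) for item in batch],
            "wav_len": [len(item["wav"]) for item in batch],
        }


def frame_energy(wav, frame_size=400, hop_size=200):
    """Toy feature: RMS energy of each full frame."""
    starts = range(0, max(len(wav) - frame_size, 0) + 1, hop_size)
    return [
        math.sqrt(sum(x * x for x in wav[s:s + frame_size]) / frame_size)
        for s in starts
    ]


def extract_features_on_the_fly(batch):
    """Compute features from the padded batch at training time, not on disk."""
    batch["energy"] = [frame_energy(w) for w in batch["wav"]]
    return batch


utts = [
    {"Path": "data/utt_0001.wav", "Duration": 0.1},
    {"Path": "data/utt_0002.wav", "Duration": 0.05},
]
dataset = ToyOnlineDataset(utts)
batch = extract_features_on_the_fly(ToyOnlineCollator()([dataset[0], dataset[1]]))
print(batch["wav_len"], len(batch["energy"][0]))
```

Because nothing is cached on disk, changing a feature setting such as `frame_size` only requires rerunning training, not repeating a preprocessing pass.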
✅ Checklist