Open CryptPixel opened 9 months ago
HNM model frames take up most of the space in a VOCALOID voicepack. It looks like only the frames inside the segmented parts are packaged into the .ddb, while the other frames (the Sil frames) are ignored. I don't think shortening the WAVs will save much space (roughly 10-20 MB), and modifying the WAVs after the Devkit has already generated the HNM frames may cause unknown problems. You could try shortening each phoneme's segmentation manually instead.
This is probably off-topic, but what are the HNM model frames? Are they Harmonic + Noise models that try to recreate the different WAV samples? And how are they used in Vocaloid?
Officially published Vocaloid DBs generally have their frames shortened to only what is actually used within the boundaries of their segmentations, greatly reducing size.
The V3 devkit we currently have has the "shorten articulation" feature disabled, which I assume is related to this. As a result, DBs are very large, even after packing into .ddb, because they carry a significant amount of unused junk.
Would it be possible to implement this kind of trimming into the ddb packing script instead?
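To make the question concrete, here is a rough sketch of the selection step such trimming would need: given the frame timestamps and the segmentation boundaries, keep only the frames that fall inside a segment. Everything here is an assumption for illustration. The function names `merge_intervals` and `trim_frames` are hypothetical, and the real .ddb frame layout and the packing script's data structures are not described in this thread, so frames are modeled as plain timestamps and segments as `(start, end)` pairs in the same unit.

```python
from bisect import bisect_right

def merge_intervals(segments):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def trim_frames(frame_times, segments):
    """Return the indices of frames whose timestamp lies inside any segment.

    frame_times: frame timestamps (hypothetical representation)
    segments:    (start, end) segmentation boundaries, inclusive
    """
    merged = merge_intervals(segments)
    starts = [start for start, _ in merged]
    keep = []
    for index, time in enumerate(frame_times):
        # Find the last merged interval that starts at or before this frame.
        k = bisect_right(starts, time) - 1
        if k >= 0 and time <= merged[k][1]:
            keep.append(index)
    return keep

# Example: timestamps in milliseconds, two overlapping segments plus a short one.
kept = trim_frames([0, 5, 10, 15, 20, 25, 30], [(5, 12), (10, 15), (25, 25)])
print(kept)
```

A packer using this idea would also have to rewrite whatever offsets point at the kept frames, which is the part that depends on the actual file format.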