@ganxiaozhe23
Even a single song (e.g., 4 minutes long) is split into 2-second segments, and inference is performed on those segments in batches of test bsz (per GPU). Since parallelism is already applied at the segment level, multiple files can simply be processed sequentially.
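To make the segment-level batching concrete, here is a rough NumPy sketch. It is not the project's actual code: the names `split_into_segments`, `iter_batches`, `process_files`, `model_fn`, and `test_bsz` are hypothetical, and it assumes non-overlapping segments with the last one zero-padded, which may differ from the real pipeline.

```python
import numpy as np

def split_into_segments(audio, sample_rate, segment_seconds=2.0):
    """Split a 1-D waveform into fixed-length segments (assumed non-overlapping).

    The last segment is zero-padded so every row has the same length.
    """
    seg_len = int(sample_rate * segment_seconds)
    n_segments = -(-len(audio) // seg_len)  # ceiling division
    padded = np.zeros(n_segments * seg_len, dtype=audio.dtype)
    padded[:len(audio)] = audio
    return padded.reshape(n_segments, seg_len)

def iter_batches(segments, test_bsz):
    """Yield consecutive groups of at most test_bsz segments."""
    for start in range(0, len(segments), test_bsz):
        yield segments[start:start + test_bsz]

def process_files(waveforms, sample_rate, test_bsz, model_fn):
    """Process files one after another; the batching happens inside each file.

    model_fn is a hypothetical stand-in for the model's forward pass. It takes
    an array of shape (batch, seg_len) and returns an array of the same shape.
    """
    outputs = []
    for audio in waveforms:
        if len(audio) == 0:
            outputs.append(audio.copy())
            continue
        segments = split_into_segments(audio, sample_rate)
        results = [model_fn(batch) for batch in iter_batches(segments, test_bsz)]
        # Stitch the segments back together and drop the zero padding.
        outputs.append(np.concatenate(results).reshape(-1)[:len(audio)])
    return outputs
```

With these assumptions, a 4-minute song gives 240 / 2 = 120 segments, so a test bsz of 16 means 8 forward passes for that one file (7 full batches plus one of 8). The GPU is already kept busy by a single file, which is why looping over files one at a time costs little.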