Open mmmmayi opened 4 years ago
I dunno. I know changes in the VoxCeleb data have been an issue, but I don't know whether your data is the old or the new release, or whether this recipe is supposed to be up to date.
On Thu, Aug 13, 2020 at 11:33 PM Yi Ma notifications@github.com wrote:
Hello Daniel,
I tried to run the SITW/v2 https://github.com/kaldi-asr/kaldi/tree/master/egs/sitw/v2 recipe with the latest VoxCeleb dataset. However, an error is reported when I run run.sh: Cannot open directory: No such file or directory at local/make_voxceleb1.pl line 56. I think it happens because of line 40 in run.sh: it seems a path 'VoxCeleb1/voxceleb1_wav' should exist, but I only have 'VoxCeleb1/dev' and 'VoxCeleb1/test'. I think this is due to the VoxCeleb update. Could you please help me update the script? Thank you.
I used the latest version. It works well with make_voxceleb1_v2.pl in voxceleb/v2
It would be great if you could figure out how to resolve the issue using that other data-prep script and make a PR so it works for the current voxceleb but can be made to work for the older release via a commented-out command in the run.sh.
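One way to support both releases from run.sh is to check which directory layout is present before choosing a data-prep script. The following is only a sketch: the function name `detect_voxceleb1_layout` is hypothetical and not part of the Kaldi recipe, and the two layouts it checks for are the ones described in this thread (a `voxceleb1_wav` directory for the older release, `dev` and `test` directories for the current one).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Kaldi recipe: report which VoxCeleb1
# directory layout exists under the given root directory.
detect_voxceleb1_layout() {
  local root="$1"
  if [ -d "$root/voxceleb1_wav" ]; then
    echo "old"      # older release: everything under voxceleb1_wav/
  elif [ -d "$root/dev" ] && [ -d "$root/test" ]; then
    echo "new"      # current release: split into dev/ and test/
  else
    echo "unknown"  # neither layout found
    return 1
  fi
}
```

run.sh could then branch on the printed value, for example calling the original local/make_voxceleb1.pl when the result is `old` and an updated script when it is `new`.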
I wrote a make_voxceleb1_v2.pl for sitw according to egs/voxceleb:
if (@ARGV != 2) {
  print STDERR "Usage: $0 <path-to-voxceleb1> <path-to-data-dir>\n";
  print STDERR "e.g. $0 /export/voxceleb1 data/\n";
  exit(1);
}

($data_base, $out_dir) = @ARGV;
my $out_dir = "$out_dir/voxceleb1";

if (system("mkdir -p $out_dir") != 0) {
  die "Error making directory $out_dir";
}

# This file provides the list of speakers that overlap between SITW and VoxCeleb1.
if (! -e "$out_dir/voxceleb1_sitw_overlap.txt") {
  system("wget -O $out_dir/voxceleb1_sitw_overlap.txt http://www.openslr.org/resources/49/voxceleb1_sitw_overlap.txt");
}

if (! -e "$data_base/vox1_meta.csv") {
  system("wget -O $data_base/vox1_meta.csv http://www.openslr.org/resources/49/vox1_meta.csv");
}

# sitw_overlap contains the list of speakers that also exist in our evaluation set, SITW.
my %sitw_overlap = ();
open(OVERLAP, "<", "$out_dir/voxceleb1_sitw_overlap.txt") or die "Could not open the overlap file $out_dir/voxceleb1_sitw_overlap.txt";
while (<OVERLAP>) {
  chomp;
  my $spkr_id = $_;
  $sitw_overlap{$spkr_id} = ();
}
close(OVERLAP) or die;

open(META_IN, "<", "$data_base/vox1_meta.csv") or die "Could not open the meta data file $data_base/vox1_meta.csv";
# Also add the banned speakers to sitw_overlap using their ID format in the
# newest version of VoxCeleb.
while (<META_IN>) {
  chomp;
  my ($vox_id, $spkr_id, $gender, $nation, $set) = split;
  if (exists($sitw_overlap{$spkr_id})) {
    $sitw_overlap{$vox_id} = ();
  }
}
close(META_IN) or die;

opendir my $dh, "$data_base/wav" or die "Cannot open directory test: $!";
my @spkr_dirs = grep {-d "$data_base/wav/$_" && ! /^\.{1,2}$/} readdir($dh);
closedir $dh;

open(SPKR, ">", "$out_dir/utt2spk") or die "Could not open the output file $out_dir/utt2spk";
open(WAV, ">", "$out_dir/wav.scp") or die "Could not open the output file $out_dir/wav.scp";

foreach (@spkr_dirs) {
  my $spkr_id = $_;
  if (not exists $sitw_overlap{$spkr_id}) {
    opendir my $dh, "$data_base/wav/$spkr_id/" or die "Cannot open directory: $!";
    my @rec_dirs = grep {-d "$data_base/wav/$spkr_id/$_" && ! /^\.{1,2}$/} readdir($dh);
    closedir $dh;
    foreach (@rec_dirs) {
      my $rec_id = $_;
      opendir my $dh, "$data_base/wav/$spkr_id/$rec_id/" or die "Cannot open directory: $!";
      my @files = map { s/\.[^.]+$//; $_ } grep { /\.wav$/ } readdir($dh);
      closedir $dh;
      foreach (@files) {
        my $name = $_;
        my $wav = "$data_base/wav/$spkr_id/$rec_id/$name.wav";
        my $utt_id = "$spkr_id-$rec_id-$name";
        print WAV "$utt_id", " $wav", "\n";
        print SPKR "$utt_id", " $spkr_id", "\n";
      }
    }
  }
}
close(SPKR) or die;
close(WAV) or die;

if (system(
  "utils/utt2spk_to_spk2utt.pl $out_dir/utt2spk >$out_dir/spk2utt") != 0) {
  die "Error creating spk2utt file in directory $out_dir";
}
system("env LC_COLLATE=C utils/fix_data_dir.sh $out_dir");
if (system("env LC_COLLATE=C utils/validate_data_dir.sh --no-text --no-feats $out_dir") != 0) {
  die "Error validating directory $out_dir";
}
Before running this script, you need to merge all of the samples in dev/ and test/ into a single folder called wav.
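The manual merge step could be automated along these lines. This is a minimal sketch, not a tested part of the recipe: the function name `merge_voxceleb1_splits` is hypothetical, and it assumes each of dev/ and test/ directly contains `<speaker>/<recording>/<file>.wav`. If your download nests the speakers one level deeper, the paths would need adjusting. It uses symlinks so that no audio is copied.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build <root>/wav containing every speaker found in
# <root>/dev and <root>/test, using symlinks instead of copying audio.
# Assumption: each split directly contains <speaker>/<recording>/<file>.wav.
merge_voxceleb1_splits() {
  local root="$1" split spk_dir rec_dir spk
  # Resolve to an absolute path so the symlink targets stay valid.
  root="$(cd "$root" && pwd)" || return 1
  for split in dev test; do
    [ -d "$root/$split" ] || { echo "Missing $root/$split" >&2; return 1; }
    for spk_dir in "$root/$split"/*/; do
      [ -d "$spk_dir" ] || continue      # glob did not match: empty split
      spk="$(basename "$spk_dir")"
      mkdir -p "$root/wav/$spk"
      for rec_dir in "$spk_dir"*/; do
        [ -d "$rec_dir" ] || continue    # speaker without recordings
        # Link each recording directory under wav/<speaker>/.
        ln -sfn "${rec_dir%/}" "$root/wav/$spk/$(basename "$rec_dir")"
      done
    done
  done
}
```

After `merge_voxceleb1_splits /export/voxceleb1`, the Perl script above should find the speakers under /export/voxceleb1/wav, since its `-d` checks follow symlinks. Linking per recording directory rather than per speaker means a speaker present in both splits would not cause a collision.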
Mm. Is there any way you could make a PR from it? It would be better if it's fully automatic.
This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.