Open rameshkunasi opened 4 years ago
Sorry for the late reply 😀
Actually, the memory the method needs has little to do with the number of training samples. You can compute the within-class and between-class covariance matrices in batches.
Regards, Yours
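The batch idea in the reply above can be sketched as follows: accumulate per-class counts and sums plus the global sum of outer products one batch at a time, so peak memory depends on the feature dimension and the number of classes rather than on the number of utterances. This is a minimal sketch, not code from LDA.py; `ScatterAccumulator` and its method names are hypothetical. It pools the within-class covariance by sample count, whereas `_class_cov` in LDA.py averages the per-class covariances with equal weight, so the two differ when classes are unbalanced.

```python
import numpy as np

class ScatterAccumulator:
    """Hypothetical helper: accumulates LDA statistics batch by batch."""

    def __init__(self, dim):
        self.dim = dim
        self.n = 0
        self.total_sum = np.zeros(dim)
        self.total_outer = np.zeros((dim, dim))
        self.class_n = {}
        self.class_sum = {}

    def add_batch(self, X, y):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y).reshape(-1)
        self.n += X.shape[0]
        self.total_sum += X.sum(axis=0)
        self.total_outer += X.T @ X          # sum of x x^T over the batch
        for label in np.unique(y):
            Xg = X[y == label]
            self.class_n[label] = self.class_n.get(label, 0) + Xg.shape[0]
            self.class_sum[label] = (
                self.class_sum.get(label, np.zeros(self.dim)) + Xg.sum(axis=0))

    def finalize(self):
        mean = self.total_sum / self.n
        # total covariance, normalized by N (same as np.cov(..., bias=1))
        St = self.total_outer / self.n - np.outer(mean, mean)
        # between-class covariance, weighted by class size
        Sb = np.zeros((self.dim, self.dim))
        for label, n_c in self.class_n.items():
            d = self.class_sum[label] / n_c - mean
            Sb += n_c * np.outer(d, d)
        Sb /= self.n
        Sw = St - Sb                         # pooled within-class covariance
        return Sw, Sb
```

Each call to `add_batch` only needs one batch of vectors in memory, so the vectors could be streamed from `kaldi_io.read_vec_flt_scp` instead of being collected into a dictionary first. Subtracting `outer(mean, mean)` is less stable numerically than a two-pass computation, which matters less if the data is roughly centered.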
On November 25, 2019, at 21:06, Kunasi Ramesh notifications@github.com wrote:
Hi,
I am using 200K utterances to train LDA. During training the CPU RAM fills up and the process is killed. My machine has 8 GB of RAM and 2 GB of swap. How can I train LDA on a large amount of data?
Thank you for your reply 👍 Below are my answers to your questions.
I have trained LDA using LDA.py on 200K training samples. I have saved self.scalings_ in Kaldi format as transform.mat. Can you please suggest how I can use transform.mat to train PLDA in Kaldi using ivector-compute-plda? Or how to train PLDA in Python? If you have a script, please share it with me.
Thanks K.Ramesh
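For the step of saving `self.scalings_` as `transform.mat`, one option is Kaldi's text matrix format, which Kaldi tools accept alongside the binary format. The sketch below is only an illustration: `write_kaldi_text_matrix` is a hypothetical helper (not part of kaldi_io or Kaldi), and it assumes the transform is stored with one output dimension per row, that is, the `lda.scalings_.T[:lda_dim, :]` layout returned by `train_lda` in this thread.

```python
import numpy as np

def write_kaldi_text_matrix(path, mat):
    """Hypothetical helper: write a 2-D array as a Kaldi text-format matrix."""
    mat = np.atleast_2d(np.asarray(mat, dtype=np.float64))
    assert mat.size > 0, "refusing to write an empty matrix"
    with open(path, "w") as f:
        f.write(" [\n")                      # opening bracket on its own line
        for i, row in enumerate(mat):
            # the closing bracket follows the last row
            end = " ]\n" if i == len(mat) - 1 else "\n"
            f.write("  " + " ".join("%.9g" % v for v in row) + end)
```

The file can be read back with `kaldi_io.read_mat` from this thread. Whether PLDA should then be trained on vectors projected with this matrix (Kaldi ships a `transform-vec` tool for applying such a transform) depends on the recipe, so the Kaldi documentation is the place to confirm the exact command line.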
While training with LDA.py I got an error. You use only the eigen solver to train the LDA matrix; I have not seen an SVD implementation in LDA.py.
My code is a revision of the above one. The original code has the option of SVD.
Yours sincerely,
He Liang,
Rohm Building 8101,
Department of Electronic Engineering, Tsinghua University,
Beijing, 10084, China
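The thread does not show the error message. If it came from `linalg.eigh(Sb, Sw)`, one common cause is a within-class matrix that is not positive definite (for example with fewer samples than feature dimensions), which SciPy's generalized symmetric solver requires of its second argument. The sketch below is an assumption about the failure, not a confirmed diagnosis: `solve_lda_eigen` and its `ridge` parameter are hypothetical, and adding a small multiple of the identity to `Sw` is a generic regularization, not the SVD option of the original code.

```python
import numpy as np
from scipy import linalg

def solve_lda_eigen(Sb, Sw, ridge=1e-6):
    """Hypothetical variant of the eigen solver with a ridge on Sw."""
    dim = Sw.shape[0]
    # scale the ridge by the average variance so it is unit-independent
    scale = np.trace(Sw) / dim
    Sw_reg = Sw + ridge * scale * np.eye(dim)
    # generalized problem Sb v = lambda Sw_reg v
    evals, evecs = linalg.eigh(Sb, Sw_reg)
    order = np.argsort(evals)[::-1]          # largest eigenvalue first
    return evals[order], evecs[:, order]
```

The returned eigenvectors play the role of `self.scalings_`; a larger `ridge` makes the solve more stable at the cost of biasing the projection.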
From: noreply@github.com noreply@github.com on behalf of Kunasi Ramesh
Sent: Sunday, December 1, 2019 10:56 PM
To: sanphiee/LPLDA LPLDA@noreply.github.com
Cc: He Liang heliang@mail.tsinghua.edu.cn; Comment comment@noreply.github.com
Subject: Re: [sanphiee/LPLDA] Process is getting killed (#1)
from __future__ import print_function
from __future__ import division

import numpy as np
import sys, os, re, gzip, struct
#################################################
if not 'KALDI_ROOT' in os.environ:
    os.environ['KALDI_ROOT']='/mnt/matylda5/iveselyk/Tools/kaldi-trunk'

# Add the kaldi binaries to PATH,
path = os.popen('echo $KALDI_ROOT/src/bin:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/src/fstbin/:$KALDI_ROOT/src/gmmbin/:$KALDI_ROOT/src/featbin/:$KALDI_ROOT/src/lm/:$KALDI_ROOT/src/sgmmbin/:$KALDI_ROOT/src/sgmm2bin/:$KALDI_ROOT/src/fgmmbin/:$KALDI_ROOT/src/latbin/:$KALDI_ROOT/src/nnetbin:$KALDI_ROOT/src/nnet2bin:$KALDI_ROOT/src/nnet3bin:$KALDI_ROOT/src/online2bin/:$KALDI_ROOT/src/ivectorbin/:$KALDI_ROOT/src/lmbin/')
os.environ['PATH'] = path.readline().strip() + ':' + os.environ['PATH']
path.close()
#################################################
class UnsupportedDataType(Exception): pass
class UnknownVectorHeader(Exception): pass
class UnknownMatrixHeader(Exception): pass

class BadSampleSize(Exception): pass
class BadInputFormat(Exception): pass

class SubprocessFailed(Exception): pass
#################################################
def open_or_fd(file, mode='rb'):
    """ fd = open_or_fd(file)
     Open file, gzipped file, pipe, or forward the file-descriptor.
     Seeks to the offset if the 'file' argument contains an ':offset' suffix.
    """
    offset = None
    try:
        # strip the 'ark:' / 'scp:' prefix from the filename (optional),
        if re.search('^(ark|scp)(,scp|,b|,t|,n?f|,n?p|,b?o|,n?s|,n?cs)*:', file):
            (prefix,file) = file.split(':',1)
        # separate offset from filename (optional),
        if re.search(':[0-9]+$', file):
            (file,offset) = file.rsplit(':',1)
        # input pipe?
        if file[-1] == '|':
            fd = popen(file[:-1], 'rb') # custom,
        # output pipe?
        elif file[0] == '|':
            fd = popen(file[1:], 'wb') # custom,
        # is it gzipped?
        elif file.split('.')[-1] == 'gz':
            fd = gzip.open(file, mode)
        # a normal file...
        else:
            fd = open(file, mode)
    except TypeError:
        # 'file' is opened file descriptor,
        fd = file
    # Seek to the offset, if one was given,
    if offset != None: fd.seek(int(offset))
    return fd

def popen(cmd, mode="rb"):
    if not isinstance(cmd, str):
        raise TypeError("invalid cmd type (%s, expected string)" % type(cmd))

    import subprocess, io, threading

    # cleanup function for subprocesses,
    def cleanup(proc, cmd):
        ret = proc.wait()
        if ret > 0:
            raise SubprocessFailed('cmd %s returned %d !' % (cmd,ret))
        return

    # text-mode,
    if mode == "r":
        proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=sys.stderr)
        threading.Thread(target=cleanup,args=(proc,cmd)).start() # clean-up thread,
        return io.TextIOWrapper(proc.stdout)
    elif mode == "w":
        proc = subprocess.Popen(cmd, shell=True, stdin=subprocess.PIPE, stderr=sys.stderr)
        threading.Thread(target=cleanup,args=(proc,cmd)).start() # clean-up thread,
        return io.TextIOWrapper(proc.stdin)
    # binary,
    elif mode == "rb":
        proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=sys.stderr)
        threading.Thread(target=cleanup,args=(proc,cmd)).start() # clean-up thread,
        return proc.stdout
    elif mode == "wb":
        proc = subprocess.Popen(cmd, shell=True, stdin=subprocess.PIPE, stderr=sys.stderr)
        threading.Thread(target=cleanup,args=(proc,cmd)).start() # clean-up thread,
        return proc.stdin
    # sanity,
    else:
        raise ValueError("invalid mode %s" % mode)

def read_key(fd):
    """ [key] = read_key(fd)
     Read the utterance-key from the opened ark/stream descriptor 'fd'.
    """
    assert('b' in fd.mode), "Error: 'fd' was opened in text mode (in python3 use sys.stdin.buffer)"

    key = ''
    while 1:
        char = fd.read(1).decode("latin1")
        if char == '' : break
        if char == ' ' : break
        key += char
    key = key.strip()
    if key == '': return None # end of file,
    assert(re.match(r'^\S+$',key) != None) # check format (no whitespace!)
    return key

#################################################
def read_ali_ark(file_or_fd):
    """ Alias to 'read_vec_int_ark()' """
    return read_vec_int_ark(file_or_fd)

def read_vec_int_ark(file_or_fd):
    """ generator(key,vec) = read_vec_int_ark(file_or_fd)
     Create generator of (key,vector<int>) tuples, read from the ark file/stream.

     Read ark to a 'dictionary':
     d = { u:d for u,d in kaldi_io.read_vec_int_ark(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        key = read_key(fd)
        while key:
            ali = read_vec_int(fd)
            yield key, ali
            key = read_key(fd)
    finally:
        if fd is not file_or_fd: fd.close()

def read_vec_int(file_or_fd):
    """ [int-vec] = read_vec_int(file_or_fd)
     Read kaldi integer vector, ascii or binary input,
    """
    fd = open_or_fd(file_or_fd)
    binary = fd.read(2).decode()
    if binary == '\0B': # binary flag
        assert(fd.read(1).decode() == '\4'); # int-size
        vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # vector dim
        if vec_size == 0:
            return np.array([], dtype='int32')
        # each element is stored as a (size-byte, int32 value) pair, 5 bytes in total,
        vec = np.frombuffer(fd.read(vec_size*5), dtype=[('size','int8'),('value','int32')], count=vec_size)
        assert(vec[0]['size'] == 4) # int32 size,
        ans = vec[:]['value'] # values are in 2nd column,
    else: # ascii,
        arr = (binary + fd.readline().decode()).strip().split()
        try:
            arr.remove('['); arr.remove(']') # optionally
        except ValueError:
            pass
        ans = np.array(arr, dtype=int)
    if fd is not file_or_fd : fd.close() # cleanup
    return ans

def write_vec_int(file_or_fd, v, key=''):
    """ write_vec_int(f, v, key='')
     Write a binary kaldi integer vector to filename or stream.
     Arguments:
     file_or_fd : filename or opened file descriptor for writing,
     v : the vector to be stored,
     key (optional) : used for writing ark-file, the utterance-id gets written before the vector.

     Example of writing single vector:
     kaldi_io.write_vec_int(filename, vec)

     Example of writing arkfile:
     with open(ark_file,'wb') as f:
         for key,vec in dict.items():
             kaldi_io.write_vec_int(f, vec, key=key)
    """
    assert(isinstance(v, np.ndarray))
    assert(v.dtype == np.int32)
    fd = open_or_fd(file_or_fd, mode='wb')
    if sys.version_info[0] == 3: assert(fd.mode == 'wb')
    try:
        if key != '' : fd.write((key+' ').encode("latin1")) # ark-files have keys (utterance-id),
        fd.write('\0B'.encode()) # we write binary!
        # dim,
        fd.write('\4'.encode()) # int32 type,
        fd.write(struct.pack(np.dtype('int32').char, v.shape[0]))
        # data,
        for i in range(len(v)):
            fd.write('\4'.encode()) # int32 type,
            fd.write(struct.pack(np.dtype('int32').char, v[i])) # binary,
    finally:
        if fd is not file_or_fd : fd.close()

#################################################
def read_vec_flt_scp(file_or_fd):
    """ generator(key,vec) = read_vec_flt_scp(file_or_fd)
     Returns generator of (key,vector) tuples, read according to kaldi scp.
     file_or_fd : scp, gzipped scp, pipe or opened file descriptor.

     Iterate the scp:
     for key,vec in kaldi_io.read_vec_flt_scp(file):
         ...

     Read scp to a 'dictionary':
     d = { key:vec for key,vec in kaldi_io.read_vec_flt_scp(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        for line in fd:
            (key,rxfile) = line.decode().split(' ')
            vec = read_vec_flt(rxfile)
            yield key, vec
    finally:
        if fd is not file_or_fd : fd.close()

def read_vec_flt_ark(file_or_fd):
    """ generator(key,vec) = read_vec_flt_ark(file_or_fd)
     Create generator of (key,vector<float>) tuples, read from the ark file/stream.

     Read ark to a 'dictionary':
     d = { u:d for u,d in kaldi_io.read_vec_flt_ark(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        key = read_key(fd)
        while key:
            ali = read_vec_flt(fd)
            yield key, ali
            key = read_key(fd)
    finally:
        if fd is not file_or_fd : fd.close()

def read_vec_flt(file_or_fd):
    """ [flt-vec] = read_vec_flt(file_or_fd)
     Read kaldi float vector, ascii or binary input,
    """
    fd = open_or_fd(file_or_fd)
    binary = fd.read(2).decode()
    if binary == '\0B': # binary flag
        ans = _read_vec_flt_binary(fd)
    else: # ascii,
        arr = (binary + fd.readline().decode()).strip().split()
        try:
            arr.remove('['); arr.remove(']') # optionally
        except ValueError:
            pass
        ans = np.array(arr, dtype=float)
    if fd is not file_or_fd : fd.close() # cleanup
    return ans

def _read_vec_flt_binary(fd):
    header = fd.read(3).decode()
    if header == 'FV ' : sample_size = 4 # floats
    elif header == 'DV ' : sample_size = 8 # doubles
    else : raise UnknownVectorHeader("The header contained '%s'" % header)
    assert (sample_size > 0)
    # Dimension,
    assert (fd.read(1).decode() == '\4'); # int-size
    vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # vector dim
    if vec_size == 0:
        return np.array([], dtype='float32')
    # Read whole vector,
    buf = fd.read(vec_size * sample_size)
    if sample_size == 4 : ans = np.frombuffer(buf, dtype='float32')
    elif sample_size == 8 : ans = np.frombuffer(buf, dtype='float64')
    else : raise BadSampleSize
    return ans

def write_vec_flt(file_or_fd, v, key=''):
    """ write_vec_flt(f, v, key='')
     Write a binary kaldi vector to filename or stream. Supports 32bit and 64bit floats.
     Arguments:
     file_or_fd : filename or opened file descriptor for writing,
     v : the vector to be stored,
     key (optional) : used for writing ark-file, the utterance-id gets written before the vector.

     Example of writing single vector:
     kaldi_io.write_vec_flt(filename, vec)

     Example of writing arkfile:
     with open(ark_file,'wb') as f:
         for key,vec in dict.items():
             kaldi_io.write_vec_flt(f, vec, key=key)
    """
    assert(isinstance(v, np.ndarray))
    fd = open_or_fd(file_or_fd, mode='wb')
    if sys.version_info[0] == 3: assert(fd.mode == 'wb')
    try:
        if key != '' : fd.write((key+' ').encode("latin1")) # ark-files have keys (utterance-id),
        fd.write('\0B'.encode()) # we write binary!
        # Data-type,
        if v.dtype == 'float32': fd.write('FV '.encode())
        elif v.dtype == 'float64': fd.write('DV '.encode())
        else: raise UnsupportedDataType("'%s', please use 'float32' or 'float64'" % v.dtype)
        # Dim,
        fd.write('\04'.encode())
        fd.write(struct.pack(np.dtype('uint32').char, v.shape[0])) # dim
        # Data,
        fd.write(v.tobytes())
    finally:
        if fd is not file_or_fd : fd.close()

#################################################
def read_mat_scp(file_or_fd):
    """ generator(key,mat) = read_mat_scp(file_or_fd)
     Returns generator of (key,matrix) tuples, read according to kaldi scp.
     file_or_fd : scp, gzipped scp, pipe or opened file descriptor.

     Iterate the scp:
     for key,mat in kaldi_io.read_mat_scp(file):
         ...

     Read scp to a 'dictionary':
     d = { key:mat for key,mat in kaldi_io.read_mat_scp(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        for line in fd:
            (key,rxfile) = line.decode().split(' ')
            mat = read_mat(rxfile)
            yield key, mat
    finally:
        if fd is not file_or_fd : fd.close()

def read_mat_ark(file_or_fd):
    """ generator(key,mat) = read_mat_ark(file_or_fd)
     Returns generator of (key,matrix) tuples, read from ark file/stream.
     file_or_fd : ark, gzipped ark, pipe or opened file descriptor.

     Iterate the ark:
     for key,mat in kaldi_io.read_mat_ark(file):
         ...

     Read ark to a 'dictionary':
     d = { key:mat for key,mat in kaldi_io.read_mat_ark(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        key = read_key(fd)
        while key:
            mat = read_mat(fd)
            yield key, mat
            key = read_key(fd)
    finally:
        if fd is not file_or_fd : fd.close()

def read_mat(file_or_fd):
    """ [mat] = read_mat(file_or_fd)
     Reads single kaldi matrix, supports ascii and binary.
     file_or_fd : file, gzipped file, pipe or opened file descriptor.
    """
    fd = open_or_fd(file_or_fd)
    try:
        binary = fd.read(2).decode()
        if binary == '\0B' :
            mat = _read_mat_binary(fd)
        else:
            assert(binary == ' [')
            mat = _read_mat_ascii(fd)
    finally:
        if fd is not file_or_fd: fd.close()
    return mat

def _read_mat_binary(fd):
    # Data type
    header = fd.read(3).decode()
    # 'CM', 'CM2', 'CM3' are possible values,
    if header.startswith('CM'): return _read_compressed_mat(fd, header)
    elif header == 'FM ': sample_size = 4 # floats
    elif header == 'DM ': sample_size = 8 # doubles
    else: raise UnknownMatrixHeader("The header contained '%s'" % header)
    assert(sample_size > 0)
    # Dimensions
    s1, rows, s2, cols = np.frombuffer(fd.read(10), dtype='int8,int32,int8,int32', count=1)[0]
    # Read whole matrix
    buf = fd.read(rows * cols * sample_size)
    if sample_size == 4 : vec = np.frombuffer(buf, dtype='float32')
    elif sample_size == 8 : vec = np.frombuffer(buf, dtype='float64')
    else : raise BadSampleSize
    mat = np.reshape(vec,(rows,cols))
    return mat

def _read_mat_ascii(fd):
    rows = []
    while 1:
        line = fd.readline().decode()
        if (len(line) == 0) : raise BadInputFormat # eof, should not happen!
        if len(line.strip()) == 0 : continue # skip empty line
        arr = line.strip().split()
        if arr[-1] != ']':
            rows.append(np.array(arr,dtype='float32')) # not last line
        else:
            rows.append(np.array(arr[:-1],dtype='float32')) # last line
            mat = np.vstack(rows)
            return mat

def _read_compressed_mat(fd, format):
    """ Read a compressed matrix,
        see: https://github.com/kaldi-asr/kaldi/blob/master/src/matrix/compressed-matrix.h
        methods: CompressedMatrix::Read(...), CompressedMatrix::CopyToMat(...),
    """
    assert(format == 'CM ') # The formats CM2, CM3 are not supported...

    # Format of header 'struct',
    global_header = np.dtype([('minvalue','float32'),('range','float32'),('num_rows','int32'),('num_cols','int32')]) # member '.format' is not written,
    per_col_header = np.dtype([('percentile_0','uint16'),('percentile_25','uint16'),('percentile_75','uint16'),('percentile_100','uint16')])

    # Read global header,
    globmin, globrange, rows, cols = np.frombuffer(fd.read(16), dtype=global_header, count=1)[0]

    # The data is structured as [Colheader, ... , Colheader, Data, Data , .... ]
    #                           {          cols           }{     size        }
    col_headers = np.frombuffer(fd.read(cols*8), dtype=per_col_header, count=cols)
    # map the uint16 percentiles back to floats (1.52590218966964e-05 = 1/65535),
    col_headers = np.array([np.array([x for x in y]) * globrange * 1.52590218966964e-05 + globmin for y in col_headers], dtype=np.float32)
    data = np.reshape(np.frombuffer(fd.read(cols*rows), dtype='uint8', count=cols*rows), (cols,rows)) # stored as col-major,

    mat = np.zeros((cols,rows), dtype='float32')
    p0 = col_headers[:, 0].reshape(-1, 1)
    p25 = col_headers[:, 1].reshape(-1, 1)
    p75 = col_headers[:, 2].reshape(-1, 1)
    p100 = col_headers[:, 3].reshape(-1, 1)
    mask_0_64 = (data <= 64)
    mask_193_255 = (data > 192)
    mask_65_192 = (~(mask_0_64 | mask_193_255))

    # piecewise-linear decoding between the column percentiles,
    mat += (p0 + (p25 - p0) / 64. * data) * mask_0_64.astype(np.float32)
    mat += (p25 + (p75 - p25) / 128. * (data - 64)) * mask_65_192.astype(np.float32)
    mat += (p75 + (p100 - p75) / 63. * (data - 192)) * mask_193_255.astype(np.float32)

    return mat.T # transpose! col-major -> row-major,

def write_mat(file_or_fd, m, key=''):
    """ write_mat(f, m, key='')
     Write a binary kaldi matrix to filename or stream. Supports 32bit and 64bit floats.
     Arguments:
     file_or_fd : filename or opened file descriptor for writing,
     m : the matrix to be stored,
     key (optional) : used for writing ark-file, the utterance-id gets written before the matrix.

     Example of writing single matrix:
     kaldi_io.write_mat(filename, mat)

     Example of writing arkfile:
     with open(ark_file,'wb') as f:
         for key,mat in dict.items():
             kaldi_io.write_mat(f, mat, key=key)
    """
    assert(isinstance(m, np.ndarray))
    assert(len(m.shape) == 2), "'m' has to be 2d matrix!"
    fd = open_or_fd(file_or_fd, mode='wb')
    if sys.version_info[0] == 3: assert(fd.mode == 'wb')
    try:
        if key != '' : fd.write((key+' ').encode("latin1")) # ark-files have keys (utterance-id),
        fd.write('\0B'.encode()) # we write binary!
        # Data-type,
        if m.dtype == 'float32': fd.write('FM '.encode())
        elif m.dtype == 'float64': fd.write('DM '.encode())
        else: raise UnsupportedDataType("'%s', please use 'float32' or 'float64'" % m.dtype)
        # Dims,
        fd.write('\04'.encode())
        fd.write(struct.pack(np.dtype('uint32').char, m.shape[0])) # rows
        fd.write('\04'.encode())
        fd.write(struct.pack(np.dtype('uint32').char, m.shape[1])) # cols
        # Data,
        fd.write(m.tobytes())
    finally:
        if fd is not file_or_fd : fd.close()

#################################################
#
def read_cnet_ark(file_or_fd):
    """ Alias of function 'read_post_ark()', 'cnet' = confusion network """
    return read_post_ark(file_or_fd)

def read_post_rxspec(file_):
    """ adaptor to read both 'ark:...' and 'scp:...' inputs of posteriors,
    """
    if file_.startswith("ark:"):
        return read_post_ark(file_)
    elif file_.startswith("scp:"):
        return read_post_scp(file_)
    else:
        print("unsupported input type: %s" % file_)
        print("it should begin with 'ark:' or 'scp:'")
        sys.exit(1)

def read_post_scp(file_or_fd):
    """ generator(key,post) = read_post_scp(file_or_fd)
     Returns generator of (key,post) tuples, read according to kaldi scp.
     file_or_fd : scp, gzipped scp, pipe or opened file descriptor.

     Iterate the scp:
     for key,post in kaldi_io.read_post_scp(file):
         ...

     Read scp to a 'dictionary':
     d = { key:post for key,post in kaldi_io.read_post_scp(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        for line in fd:
            (key,rxfile) = line.decode().split(' ')
            post = read_post(rxfile)
            yield key, post
    finally:
        if fd is not file_or_fd : fd.close()

def read_post_ark(file_or_fd):
    """ generator(key,vec<vec<int,float>>) = read_post_ark(file)
     Returns generator of (key,posterior) tuples, read from ark file.
     file_or_fd : ark, gzipped ark, pipe or opened file descriptor.

     Iterate the ark:
     for key,post in kaldi_io.read_post_ark(file):
         ...

     Read ark to a 'dictionary':
     d = { key:post for key,post in kaldi_io.read_post_ark(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        key = read_key(fd)
        while key:
            post = read_post(fd)
            yield key, post
            key = read_key(fd)
    finally:
        if fd is not file_or_fd: fd.close()

def read_post(file_or_fd):
    """ [post] = read_post(file_or_fd)
     Reads single kaldi 'Posterior' in binary format.

     The 'Posterior' is C++ type 'vector<vector<tuple<int,float> > >',
     the outer-vector is usually time axis, inner-vector are the records
     at given time, and the tuple is composed of an 'index' (integer)
     and a 'float-value'. The 'float-value' can represent a probability
     or any other numeric value.

     Returns vector of vectors of tuples.
    """
    fd = open_or_fd(file_or_fd)
    ans=[]
    binary = fd.read(2).decode(); assert(binary == '\0B'); # binary flag
    assert(fd.read(1).decode() == '\4'); # int-size
    outer_vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # number of frames (or bins)

    # Loop over 'outer-vector',
    for i in range(outer_vec_size):
        assert(fd.read(1).decode() == '\4'); # int-size
        inner_vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # number of records for frame (or bin)
        data = np.frombuffer(fd.read(inner_vec_size*10), dtype=[('size_idx','int8'),('idx','int32'),('size_post','int8'),('post','float32')], count=inner_vec_size)
        assert(data[0]['size_idx'] == 4)
        assert(data[0]['size_post'] == 4)
        ans.append(data[['idx','post']].tolist())

    if fd is not file_or_fd: fd.close()
    return ans

#################################################
#
def read_cntime_ark(file_or_fd):
    """ generator(key,vec<tuple<float,float>>) = read_cntime_ark(file_or_fd)
     Returns generator of (key,cntime) tuples, read from ark file.
     file_or_fd : file, gzipped file, pipe or opened file descriptor.

     Iterate the ark:
     for key,time in kaldi_io.read_cntime_ark(file):
         ...

     Read ark to a 'dictionary':
     d = { key:time for key,time in kaldi_io.read_cntime_ark(file) }
    """
    fd = open_or_fd(file_or_fd)
    try:
        key = read_key(fd)
        while key:
            cntime = read_cntime(fd)
            yield key, cntime
            key = read_key(fd)
    finally:
        if fd is not file_or_fd : fd.close()

def read_cntime(file_or_fd):
    """ [cntime] = read_cntime(file_or_fd)
     Reads single kaldi 'Confusion Network time info', in binary format:
     C++ type: vector<tuple<float,float> >.
     (begin/end times of bins at the confusion network).

     Binary layout is '<num-bins> <beg1> <end1> <beg2> <end2> ...'

     file_or_fd : file, gzipped file, pipe or opened file descriptor.

     Returns vector of tuples.
    """
    fd = open_or_fd(file_or_fd)
    binary = fd.read(2).decode(); assert(binary == '\0B'); # assuming it's binary

    assert(fd.read(1).decode() == '\4'); # int-size
    vec_size = np.frombuffer(fd.read(4), dtype='int32', count=1)[0] # number of frames (or bins)

    data = np.frombuffer(fd.read(vec_size*10), dtype=[('size_beg','int8'),('t_beg','float32'),('size_end','int8'),('t_end','float32')], count=vec_size)
    assert(data[0]['size_beg'] == 4)
    assert(data[0]['size_end'] == 4)
    ans = data[['t_beg','t_end']].tolist() # Return vector of tuples (t_beg,t_end),

    if fd is not file_or_fd : fd.close()
    return ans

#################################################
#
def read_segments_as_bool_vec(segments_file):
    """ [ bool_vec ] = read_segments_as_bool_vec(segments_file)
     using kaldi 'segments' file for 1 wav, format : '<utt> <rec> <t-beg> <t-end>'
     - t-beg, t-end is in seconds,
     - assumed 100 frames/second,
    """
    segs = np.loadtxt(segments_file, dtype='object,object,f,f', ndmin=1)
    # Sanity checks,
    assert(len(segs) > 0) # empty segmentation is an error,
    assert(len(np.unique([rec[1] for rec in segs ])) == 1) # segments with only 1 wav-file,
    # Convert time to frame-indexes,
    start = np.rint([100 * rec[2] for rec in segs]).astype(int)
    end = np.rint([100 * rec[3] for rec in segs]).astype(int)
    # Build the frame mask by repeating False/True over the gaps and segments,
    frms = np.repeat(np.r_[np.tile([False,True], len(end)), False],
                     np.r_[np.c_[start - np.r_[0, end[:-1]], end-start].flat, 0])
    assert np.sum(end-start) == np.sum(frms)
    return frms

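The binary vector layout that `write_vec_flt` and `_read_vec_flt_binary` implement can be seen in a standalone round trip. This sketch uses only `struct`, `io` and NumPy and mirrors the byte layout in the code above (key and space, `\0B`, a 3-byte type tag, a size byte `\4`, an int32 dimension, then the raw data). `pack_kaldi_vec` and `unpack_kaldi_vec` are hypothetical names, and a little-endian host is assumed.

```python
import io
import struct
import numpy as np

def pack_kaldi_vec(key, vec):
    """Hypothetical helper: serialize one (key, float vector) ark entry."""
    vec = np.ascontiguousarray(vec)
    tag = {"float32": b"FV ", "float64": b"DV "}[vec.dtype.name]
    header = key.encode("latin1") + b" " + b"\0B" + tag
    # size byte (4), int32 dimension, then raw samples
    return header + b"\4" + struct.pack("<i", vec.shape[0]) + vec.tobytes()

def unpack_kaldi_vec(buf):
    """Hypothetical helper: parse one entry produced by pack_kaldi_vec."""
    fd = io.BytesIO(buf)
    key = b""
    while True:                      # the key ends at the first space
        ch = fd.read(1)
        if ch in (b"", b" "):
            break
        key += ch
    assert fd.read(2) == b"\0B"      # binary flag
    dtype = {b"FV ": "<f4", b"DV ": "<f8"}[fd.read(3)]
    assert fd.read(1) == b"\4"       # size of the dimension field
    (dim,) = struct.unpack("<i", fd.read(4))
    data = np.frombuffer(fd.read(dim * np.dtype(dtype).itemsize), dtype=dtype)
    return key.decode("latin1"), data
```

A three-element float32 vector stored under the key `utt1` takes 27 bytes: 5 for the key and space, 2 for the flag, 3 for the tag, 5 for the dimension and 12 for the data.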
from __future__ import print_function
import numpy as np
from scipy import linalg
from sklearn.utils.multiclass import unique_labels
from sklearn.utils import check_array, check_X_y
from sklearn.utils.validation import check_is_fitted

__all__ = ['LinearDiscriminantAnalysis']

def _cov(X):
    """Estimate covariance matrix.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Input data.

    Returns
    -------
    s : array, shape (n_features, n_features)
        Estimated covariance matrix.
    """
    s = np.cov(X, rowvar=0, bias = 1)
    return s

def _class_means(X, y):
    """Compute class means.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Input data.
    y : array-like, shape (n_samples,) or (n_samples, n_targets)
        Target values.

    Returns
    -------
    means : array-like, shape (n_classes, n_features)
        Class means.
    """
    means = []
    classes = np.unique(y)
    for group in classes:
        Xg = X[y == group, :]
        means.append(Xg.mean(0))
    return np.asarray(means)

def _class_cov(X, y):
    """Compute class covariance matrix.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Input data.
    y : array-like, shape (n_samples,) or (n_samples, n_targets)
        Target values.

    Returns
    -------
    cov : array-like, shape (n_features, n_features)
        Class covariance matrix (equal-weight average over classes).
    """
    classes = np.unique(y)
    covs = []
    for group in classes:
        Xg = X[y == group, :]
        covs.append(np.atleast_2d(_cov(Xg)))
    return np.average(covs, axis=0)

class LinearDiscriminantAnalysis:

    def __init__(self, n_components=None, within_between_ratio=10.0,
                 nearest_neighbor_ratio=1.2):
        self.n_components = n_components
        self.within_between_ratio = within_between_ratio
        self.nearest_neighbor_ratio = nearest_neighbor_ratio

    def _solve_eigen(self, X, y):
        """Eigenvalue solver.

        The eigenvalue solver computes the optimal solution of the Rayleigh
        coefficient (basically the ratio of between class scatter to within
        class scatter). This solver supports both classification and
        dimensionality reduction (with optional shrinkage).

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            Training data.
        y : array-like, shape (n_samples,) or (n_samples, n_targets)
            Target values.

        Notes
        -----
        This solver is based on [1]_, section 3.8.3, pp. 121-124.

        References
        ----------
        .. [1] R. O. Duda, P. E. Hart, D. G. Stork. Pattern Classification
           (Second Edition). John Wiley & Sons, Inc., New York, 2001. ISBN
           0-471-05669-3.
        """
        self.means_ = _class_means(X, y)
        self.covariance_ = _class_cov(X, y)

        Sw = self.covariance_  # within scatter
        St = _cov(X)  # total scatter
        Sb = St - Sw  # between scatter

        evals, evecs = linalg.eigh(Sb, Sw)
        evecs = evecs[:, np.argsort(evals)[::-1]]  # sort eigenvectors
        self.scalings_ = np.asarray(evecs)

    def fit(self, X, y):
        """Fit Local Pairwise Trained Linear Discriminant Analysis
           model according to the given training data and parameters.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            Training data.
        y : array, shape (n_samples,)
            Target values.
        """
        X, y = check_X_y(np.asarray(X), np.asarray(y.reshape(-1)), ensure_min_samples=2)
        self.classes_ = unique_labels(y)

        # Get the maximum number of components
        if self.n_components is None:
            self.n_components = len(self.classes_) - 1
        else:
            self.n_components = min(len(self.classes_) - 1, self.n_components)

        self._solve_eigen(np.asarray(X), np.asarray(y))
        return self

    def transform(self, X):
        """Project data to maximize class separation.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            Input data.

        Returns
        -------
        X_new : array, shape (n_samples, n_components)
            Transformed data.
        """
        check_is_fitted(self, ['scalings_'], all_or_any=any)

        X = check_array(X)
        X_new = np.dot(X, self.scalings_)
        return X_new[:, :self.n_components]

if __name__ == '__main__':

    samples = 20
    dim = 6
    lda_dim = 3
    data = np.random.random((samples, dim))
    # labels in {0, 1, 2}; np.random.random_integers is deprecated
    label = np.random.randint(0, 3, size=(samples, 1))

    lda = LinearDiscriminantAnalysis(lda_dim)
    lda.fit(data, label)
    lda_data = lda.transform(data)
    print (lda_data)

from __future__ import print_function
import numpy as np
from scipy import linalg
from sklearn.utils.multiclass import unique_labels
from sklearn.utils import check_array, check_X_y
from sklearn.utils.validation import check_is_fitted
import LDA
import sys
import kaldi_io

__all__ = ['LocalPairwiseLinearDiscriminantAnalysis']

def _cov(X):
    """Estimate covariance matrix.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Input data.

    Returns
    -------
    s : array, shape (n_features, n_features)
        Estimated covariance matrix.
    """
    s = np.cov(X, rowvar=0, bias = 1)
    return s

def _similarity_function(mean_vec, vecs):
    # cosine similarity between the class mean and each row of 'vecs'
    mean_vec_norm = mean_vec / np.sqrt(np.sum(mean_vec ** 2))
    vecs_norm = vecs / np.sqrt(np.sum(vecs ** 2, axis=1))[:, np.newaxis]
    cosine_kernel = np.array([np.dot(mean_vec_norm, vecs_norm[i]) for i in range(len(vecs_norm))])
    return cosine_kernel

def _class_means_and_neighbor_means(X, y, k1, k2):
    """Compute class means and neighbor means.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Input data.
    y : array-like, shape (n_samples,) or (n_samples, n_targets)
        Target values.
    k1: within_between_ratio
    k2: nearest_neighbor_ratio

    Returns
    -------
    means, neighbor_means : array-like, shape (n_classes, n_features)
        Class means and neighbor means
    """
    means = []
    neighbor_means = []

    classes = np.unique(y)
    samples = np.size(y)

    for group in classes:

        Xg = X[y == group, :]
        Xg_count = Xg.shape[0]
        Xg_mean = Xg.mean(0)
        Xn = X[y != group, :]
        Xg_similarity = _similarity_function(Xg_mean, Xg)
        Xg_similarity_min = min(Xg_similarity)
        Xn_similarity = _similarity_function(Xg_mean, Xn)
        # count the other-class samples closer to the mean than the farthest own sample
        Xn_neighbor_count = len(Xn_similarity[Xn_similarity > Xg_similarity_min])
        Xn_neighbor_count = int(max(k1 * Xg_count, k2 * Xn_neighbor_count))
        Xn_neighbor_count = min(Xn_neighbor_count, samples - Xg_count)

        # take the most similar other-class samples as neighbors
        Xn_label = np.argsort(Xn_similarity)
        Xn_label = Xn_label[::-1]
        Xg_neighbor = np.array([Xn[Xn_label[i]] for i in range(Xn_neighbor_count)])
        Xg_neighbor_mean = Xg_neighbor.mean(0)

        means.append(Xg_mean)
        neighbor_means.append(Xg_neighbor_mean)

    return np.array(means), np.array(neighbor_means)

def _class_cov(X, y):
    """Compute class covariance matrix.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Input data.
    y : array-like, shape (n_samples,) or (n_samples, n_targets)
        Target values.

    Returns
    -------
    cov : array-like, shape (n_features, n_features)
        Class covariance matrix (equal-weight average over classes).
    """
    classes = np.unique(y)
    covs = []
    for group in classes:
        Xg = X[y == group, :]
        covs.append(np.atleast_2d(_cov(Xg)))
    return np.average(covs, axis=0)

def _local_pairwise_cov(class_mean, neighbor_mean):
    """Estimate local pairwise between-class covariance matrix.

    Parameters
    ----------
    class_mean : array-like, shape (n_classes, n_features)
        each class mean
    neighbor_mean: array-like, shape (n_classes, n_features)
        each class neighbor mean

    Returns
    -------
    s : array, shape (n_features, n_features)
        Estimated covariance matrix.
    """
    covs = []
    for i in range(0, len(class_mean)):
        local_pair = np.vstack((class_mean[i], neighbor_mean[i]))
        covs.append(np.atleast_2d(_cov(local_pair)))
    return np.average(covs, axis=0)

class LocalPairwiseLinearDiscriminantAnalysis:

    def __init__(self, n_components=None, within_between_ratio=10.0,
                 nearest_neighbor_ratio=1.2):
        self.n_components = n_components
        self.within_between_ratio = within_between_ratio
        self.nearest_neighbor_ratio = nearest_neighbor_ratio

    def _solve_eigen(self, X, y):
        """Eigenvalue solver.

        The eigenvalue solver computes the optimal solution of the Rayleigh
        coefficient (basically the ratio of between class scatter to within
        class scatter). This solver supports both classification and
        dimensionality reduction (with optional shrinkage).

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            Training data.
        y : array-like, shape (n_samples,) or (n_samples, n_targets)
            Target values.

        Notes
        -----
        This solver is based on [1]_, section 3.8.3, pp. 121-124.

        References
        ----------
        .. [1] R. O. Duda, P. E. Hart, D. G. Stork. Pattern Classification
           (Second Edition). John Wiley & Sons, Inc., New York, 2001. ISBN
           0-471-05669-3.
        """
        self.means_, self.neighbor_means_ = _class_means_and_neighbor_means(
            X, y, self.within_between_ratio, self.nearest_neighbor_ratio)

        Sw = _class_cov(X, y)  # within class cov
        Sb = _local_pairwise_cov(self.means_, self.neighbor_means_)

        evals, evecs = linalg.eigh(Sb, Sw)
        evecs = evecs[:, np.argsort(evals)[::-1]]  # sort eigenvectors
        self.scalings_ = np.asarray(evecs)

    def fit(self, X, y):
        """Fit Local Pairwise Trained Linear Discriminant Analysis
           model according to the given training data and parameters.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            Training data.
        y : array, shape (n_samples,)
            Target values.
        """
        X, y = check_X_y(np.asarray(X), np.asarray(y.reshape(-1)), ensure_min_samples=2)
        self.classes_ = unique_labels(y)

        # Get the maximum number of components
        if self.n_components is None:
            self.n_components = len(self.classes_) - 1
        else:
            self.n_components = min(len(self.classes_) - 1, self.n_components)

        self._solve_eigen(X, y)
        return self

    def transform(self, X):
        """Project data to maximize class separation.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            Input data.

        Returns
        -------
        X_new : array, shape (n_samples, n_components)
            Transformed data.
        """
        check_is_fitted(self, ['scalings_'], all_or_any=any)

        X = check_array(X)
        X_new = np.dot(X, self.scalings_)
        return X_new[:, :self.n_components]

def read_kaldi_scp_flt(kaldi_scp):
    fvec = { k:v for k,v in kaldi_io.read_vec_flt_scp(kaldi_scp) } # binary
    return fvec

def load_spk2utt(filename):
    spk2utt = {}
    with open(filename, "r") as fp:
        for line in fp.readlines():
            line_split = line.strip().split(" ")
            spkid = line_split[0]
            if spkid in spk2utt.keys():
                print ("load spk2utt failed, spkid is not uniq, %s\n" % spkid)
                sys.exit(-1)
            spk2utt[spkid] = []
            for i in range(1, len(line_split)):
                uttid = line_split[i]
                spk2utt[spkid].append(uttid)
    return spk2utt

def get_lambda_ids_and_vecs(lambda_xvec, min_utts = 6):
    # keep only speakers with at least 'min_utts' vectors
    ids = []
    vecs = []
    for spkid in lambda_xvec.keys():
        if len(lambda_xvec[spkid]) >= min_utts:
            for vec in lambda_xvec[spkid]:
                ids.append(spkid)
                vecs.append(vec)
    return ids, vecs

def label_str_to_int(label_str):
    # map each distinct speaker id to an integer label, starting from 1
    label_dict = {}
    label_int = []
    for item in label_str:
        if item not in label_dict.keys():
            label_dict[item] = len(label_dict) + 1
        label_int.append(label_dict[item])
    return np.array(label_int)

def train_lda(ids, vecs, lda_dim):
    # compute and subtract the mean
    m = np.mean(vecs, axis=0)
    vecs = vecs - m
    # lda
    lda = LDA.LinearDiscriminantAnalysis(n_components=lda_dim)
    lda.fit(np.asarray(vecs), np.asarray(ids))
    # transform the mean
    dim = len(m)
    m_trans = lda.transform(np.reshape(m, (1, dim)))
    # transform the training vectors
    vecs_trans = lda.transform(vecs)
    # transform matrix, shape (lda_dim, dim)
    lda_trans = lda.scalings_.T[:lda_dim, :]
    return ids, vecs_trans, m_trans, lda_trans

def train_lplda(ids, vecs, lplda_dim):
    # compute and subtract the mean
    m = np.mean(vecs, axis=0)
    vecs = vecs - m
    # lplda
    lda = LocalPairwiseLinearDiscriminantAnalysis(n_components=lplda_dim)
    lda.fit(np.asarray(vecs), np.asarray(ids))
    # transform the mean
    dim = len(m)
    m_trans = lda.transform(np.reshape(m, (1, dim)))
    # transform the training vectors
    vecs_trans = lda.transform(vecs)
    # transform matrix, shape (lplda_dim, dim)
    lda_trans = lda.scalings_.T[:lplda_dim, :]
    return ids, vecs_trans, m_trans, lda_trans
def lda_lplda_kaldi_wrapper(lda_dim, lplda_dim, kaldi_scp, kaldi_utt2spk, lda_transform):
    data = read_kaldi_scp_flt(kaldi_scp)
    spk2utt = load_spk2utt(kaldi_utt2spk)
    # collect the vectors of each speaker, skipping duplicate utterance ids
    train_vecs = {}
    for spkid in spk2utt.keys():
        train_vecs[spkid] = []
        uttid_uniq = sorted(set(spk2utt[spkid]))
        for uttid in uttid_uniq:
            if uttid in data:
                train_vecs[spkid].append(data[uttid])
    # get ids, vecs
    ids, vecs = get_lambda_ids_and_vecs(train_vecs)
    int_ids = label_str_to_int(ids)
    dim = len(vecs[0])
    print("lda lplda, ", len(vecs), len(vecs[0]))
    # train lda, then lplda on the lda-projected vectors
    int_ids, lda_trans_vecs, lda_trans_m, lda_trans_mat = train_lda(int_ids, vecs, lda_dim)
    int_ids, lplda_trans_vecs, lplda_trans_m, lplda_trans_mat = train_lplda(int_ids, lda_trans_vecs, lplda_dim)
    del lplda_trans_vecs, lplda_trans_m
    # copy to kaldi format: [projection | offset], shape (lplda_dim, dim + 1)
    transform = np.zeros([lplda_dim, dim + 1], float)
    lda_lplda_trans = np.dot(lplda_trans_mat, lda_trans_mat)
    lda_lplda_m = np.dot(lplda_trans_mat, np.reshape(lda_trans_m, (lda_dim, 1)))
    # m_trans = np.dot(lda_trans, m)
    for r in range(lplda_dim):
        for c in range(dim):
            transform[r][c] = lda_lplda_trans[r][c]
        # lda_lplda_m has shape (lplda_dim, 1); index the scalar explicitly
        transform[r][dim] = -1.0 * lda_lplda_m[r, 0]
    # save lda transform
    kaldi_io.write_mat(lda_transform, transform)
    return
def lplda_kaldi_wrapper(lda_dim, kaldi_scp, kaldi_utt2spk, lda_transform):
    data = read_kaldi_scp_flt(kaldi_scp)
    spk2utt = load_spk2utt(kaldi_utt2spk)
    # collect the vectors of each speaker, skipping duplicate utterance ids
    train_vecs = {}
    for spkid in spk2utt.keys():
        train_vecs[spkid] = []
        uttid_uniq = sorted(set(spk2utt[spkid]))
        for uttid in uttid_uniq:
            if uttid in data:
                train_vecs[spkid].append(data[uttid])
    # get ids, vecs
    ids, vecs = get_lambda_ids_and_vecs(train_vecs)
    int_ids = label_str_to_int(ids)
    print("lplda, ", len(vecs), len(vecs[0]))
    # compute and subtract the mean
    m = np.mean(vecs, axis=0)
    vecs = vecs - m
    # lplda
    lda = LocalPairwiseLinearDiscriminantAnalysis(n_components=lda_dim)
    lda.fit(np.asarray(vecs), np.asarray(int_ids))
    # transform the mean
    dim = len(m)
    transform_m = lda.transform(np.reshape(m, (1, dim)))
    # copy to kaldi format: [projection | offset], shape (lda_dim, dim + 1)
    transform = np.zeros([lda_dim, dim + 1], float)
    lda_trans = lda.scalings_.T[:lda_dim, :]
    # m_trans = np.dot(lda_trans, m)
    for r in range(lda_dim):
        for c in range(dim):
            transform[r][c] = lda_trans[r][c]
        transform[r][dim] = -1.0 * transform_m[0][r]
    # save lda transform
    kaldi_io.write_mat(lda_transform, transform)
    return
if __name__ == '__main__':
    if len(sys.argv) != 6:
        print("%s lda_dim lplda_dim kaldi_scp kaldi_utt2spk kaldi_lda_transform\n" % sys.argv[0])
        sys.exit(1)
    lda_dim = int(sys.argv[1])
    lplda_dim = int(sys.argv[2])
    kaldi_scp = sys.argv[3]
    kaldi_utt2spk = sys.argv[4]
    lda_transform = sys.argv[5]
    # lda_dim = 150
    # lplda_dim = 100
    # kaldi_scp = "./xvector_sre16_sre18_combined.scp"
    # # kaldi_scp = "./xvectors_sre16_sre18_combined.scp"
    # kaldi_utt2spk = "spk2utt"
    # lda_transform = "python_kaldi_lplda_transform.mat"
    # lplda_kaldi_wrapper(lda_dim, kaldi_scp, kaldi_utt2spk, lda_transform)
    lda_lplda_kaldi_wrapper(lda_dim, lplda_dim, kaldi_scp, kaldi_utt2spk, lda_transform)
    # ivector-compute-lda --total-covariance-factor=0.0 --dim=$lda_dim \
    # "ark:ivector-subtract-global-mean scp:$nnet_dir/xvectors_$name/xvector.scp ark:- |" \
    # ark:$data/$name/utt2spk $nnet_dir/xvectors_$name/transform.mat
    # samples = 20
    # dim = 6
    # lda_dim = 3
    # data = np.random.random((samples, dim))
    # label = np.random.random_integers(0, 2, size=(samples, 1))
    # lda = LocalPairwiseLinearDiscriminantAnalysis(lda_dim)
    # lda.fit(data, label)
    # lda_data = lda.transform(data)
    # print (lda_data)
from __future__ import print_function
import sys

import numpy as np
from scipy import linalg
from sklearn.utils.multiclass import unique_labels
from sklearn.utils import check_array, check_X_y
from sklearn.utils.validation import check_is_fitted
import kaldi_io

__all__ = ['LocalPairwiseLinearDiscriminantAnalysis']
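def _cov(X):
    """Estimate the covariance matrix (biased, normalized by n_samples).

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Input data.

    Returns
    -------
    s : array, shape (n_features, n_features)
        Estimated covariance matrix.
    """
    s = np.cov(X, rowvar=0, bias=1)
    return s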
def _similarity_function(mean_vec, vecs):
    # Cosine similarity between mean_vec and every row of vecs
    mean_vec_norm = mean_vec / np.sqrt(np.sum(mean_vec ** 2))
    vecs_norm = vecs / np.sqrt(np.sum(vecs ** 2, axis=1))[:, np.newaxis]
    cosine_kernel = np.array([np.dot(mean_vec_norm, vecs_norm[i]) for i in range(len(vecs_norm))])
    return cosine_kernel
def _class_means_and_neighbor_means(X, y, k1, k2):
    """Compute class means and neighbor means.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Input data.
    y : array-like, shape (n_samples,) or (n_samples, n_targets)
        Target values.
    k1 : within_between_ratio
    k2 : nearest_neighbor_ratio

    Returns
    -------
    means : array, shape (n_classes, n_features)
        Class means.
    neighbor_means : array, shape (n_classes, n_features)
        Mean of the nearest out-of-class samples for each class.
    """
    means = []
    neighbor_means = []
    classes = np.unique(y)
    samples = np.size(y)
    for group in classes:
        Xg = X[y == group, :]
        Xg_count = Xg.shape[0]
        Xg_mean = Xg.mean(0)
        Xn = X[y != group, :]
        # lowest similarity of an in-class sample to its own class mean
        Xg_similarity = _similarity_function(Xg_mean, Xg)
        Xg_similarity_min = min(Xg_similarity)
        # out-of-class samples that are closer to the class mean than that
        Xn_similarity = _similarity_function(Xg_mean, Xn)
        Xn_neighbor_count = len(Xn_similarity[Xn_similarity > Xg_similarity_min])
        Xn_neighbor_count = int(max(k1 * Xg_count, k2 * Xn_neighbor_count))
        Xn_neighbor_count = min(Xn_neighbor_count, samples - Xg_count)
        # indices of out-of-class samples, most similar first
        Xn_label = np.argsort(Xn_similarity)
        Xn_label = Xn_label[::-1]
        Xg_neighbor = np.array([Xn[Xn_label[i]] for i in range(Xn_neighbor_count)])
        Xg_neighbor_mean = Xg_neighbor.mean(0)
        means.append(Xg_mean)
        neighbor_means.append(Xg_neighbor_mean)
    return np.array(means), np.array(neighbor_means)
def _class_cov(X, y):
    """Compute the class covariance matrix (unweighted average of the
    per-class covariances).

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Input data.
    y : array-like, shape (n_samples,) or (n_samples, n_targets)
        Target values.

    Returns
    -------
    cov : array-like, shape (n_features, n_features)
        Class covariance matrix.
    """
    classes = np.unique(y)
    covs = []
    for group in classes:
        Xg = X[y == group, :]
        covs.append(np.atleast_2d(_cov(Xg)))
    return np.average(covs, axis=0)
def _local_pairwise_cov(class_mean, neighbor_mean):
    """Estimate the local pairwise (between-class) covariance matrix.

    Parameters
    ----------
    class_mean : array-like, shape (n_classes, n_features)
        Mean of each class.
    neighbor_mean : array-like, shape (n_classes, n_features)
        Neighbor mean of each class.

    Returns
    -------
    s : array, shape (n_features, n_features)
        Estimated covariance matrix.
    """
    covs = []
    for i in range(0, len(class_mean)):
        # covariance of the two-point set {class mean, neighbor mean}
        local_pair = np.vstack((class_mean[i], neighbor_mean[i]))
        covs.append(np.atleast_2d(_cov(local_pair)))
    return np.average(covs, axis=0)
class LocalPairwiseLinearDiscriminantAnalysis:
    def __init__(self, n_components=None, within_between_ratio=10.0,
                 nearest_neighbor_ratio=1.2):
        self.n_components = n_components
        self.within_between_ratio = within_between_ratio
        self.nearest_neighbor_ratio = nearest_neighbor_ratio

    def _solve_eigen(self, X, y):
        """Eigenvalue solver.

        The eigenvalue solver computes the optimal solution of the Rayleigh
        coefficient (basically the ratio of between class scatter to within
        class scatter). This solver supports both classification and
        dimensionality reduction (with optional shrinkage).

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            Training data.
        y : array-like, shape (n_samples,) or (n_samples, n_targets)
            Target values.

        Notes
        -----
        This solver is based on [1]_, section 3.8.3, pp. 121-124.

        References
        ----------
        .. [1] R. O. Duda, P. E. Hart, D. G. Stork. Pattern Classification
           (Second Edition). John Wiley & Sons, Inc., New York, 2001. ISBN
           0-471-05669-3.
        """
        self.means_, self.neighbor_means_ = _class_means_and_neighbor_means(
            X, y, self.within_between_ratio, self.nearest_neighbor_ratio)
        Sw = _class_cov(X, y)  # within-class covariance
        Sb = _local_pairwise_cov(self.means_, self.neighbor_means_)
        evals, evecs = linalg.eigh(Sb, Sw)
        evecs = evecs[:, np.argsort(evals)[::-1]]  # sort eigenvectors by descending eigenvalue
        self.scalings_ = np.asarray(evecs)

    def fit(self, X, y):
        """Fit Local Pairwise Trained Linear Discriminant Analysis
        model according to the given training data and parameters.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            Training data.
        y : array, shape (n_samples,)
            Target values.
        """
        # Convert first, then flatten, so that plain lists are accepted too
        X, y = check_X_y(np.asarray(X), np.asarray(y).reshape(-1), ensure_min_samples=2)
        self.classes_ = unique_labels(y)
        # Get the maximum number of components
        if self.n_components is None:
            self.n_components = len(self.classes_) - 1
        else:
            self.n_components = min(len(self.classes_) - 1, self.n_components)
        self._solve_eigen(X, y)
        return self

    def transform(self, X):
        """Project data to maximize class separation.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            Input data.

        Returns
        -------
        X_new : array, shape (n_samples, n_components)
            Transformed data.
        """
        check_is_fitted(self, ['scalings_'], all_or_any=any)
        X = check_array(X)
        X_new = np.dot(X, self.scalings_)
        return X_new[:, :self.n_components]
def read_kaldi_scp_flt(kaldi_scp):
    # Read the (binary) float vectors listed in a Kaldi scp file into a dict
    fvec = {k: v for k, v in kaldi_io.read_vec_flt_scp(kaldi_scp)}
    return fvec

def load_spk2utt(filename):
    # Each line: "<spkid> <uttid1> <uttid2> ..."
    spk2utt = {}
    with open(filename, "r") as fp:
        for line in fp:
            line_split = line.strip().split(" ")
            spkid = line_split[0]
            if spkid in spk2utt:
                print("load spk2utt failed, spkid is not unique: %s" % spkid)
                sys.exit(-1)
            spk2utt[spkid] = line_split[1:]
    return spk2utt

def get_lambda_ids_and_vecs(lambda_xvec, min_utts=6):
    # Keep only speakers with at least min_utts vectors
    ids = []
    vecs = []
    for spkid in lambda_xvec.keys():
        if len(lambda_xvec[spkid]) >= min_utts:
            for vec in lambda_xvec[spkid]:
                ids.append(spkid)
                vecs.append(vec)
    return ids, vecs

def label_str_to_int(label_str):
    # Map string labels to integer labels starting at 1
    label_dict = {}
    label_int = []
    for item in label_str:
        if item not in label_dict:
            label_dict[item] = len(label_dict) + 1
        label_int.append(label_dict[item])
    return np.array(label_int)
def lplda_kaldi_wrapper(lda_dim, kaldi_scp, kaldi_utt2spk, lda_transform):
    data = read_kaldi_scp_flt(kaldi_scp)
    spk2utt = load_spk2utt(kaldi_utt2spk)
    # train_vecs = {}
    # for spkid in spk2utt.keys():
    #     train_vecs[spkid] = []
    #     for uttid in spk2utt[spkid]:
    #         map_uttid = spkid[6:] + "_" + uttid + "_A"
    #         if map_uttid in data.keys():
    #             train_vecs[spkid].append(data[map_uttid])
    # collect the vectors of each speaker, skipping duplicate utterance ids
    train_vecs = {}
    for spkid in spk2utt.keys():
        train_vecs[spkid] = []
        uttid_uniq = sorted(set(spk2utt[spkid]))
        for uttid in uttid_uniq:
            if uttid in data:
                train_vecs[spkid].append(data[uttid])
    # get ids, vecs
    ids, vecs = get_lambda_ids_and_vecs(train_vecs)
    int_ids = label_str_to_int(ids)
    print("lplda, ", len(vecs), len(vecs[0]))
    # compute and subtract the mean
    m = np.mean(vecs, axis=0)
    vecs = vecs - m
    # lplda
    lda = LocalPairwiseLinearDiscriminantAnalysis(n_components=lda_dim)
    lda.fit(np.asarray(vecs), np.asarray(int_ids))
    # transform the mean
    dim = len(m)
    transform_m = lda.transform(np.reshape(m, (1, dim)))
    # copy to kaldi format: [projection | offset], shape (lda_dim, dim + 1)
    transform = np.zeros([lda_dim, dim + 1], float)
    lda_trans = lda.scalings_.T[:lda_dim, :]
    # m_trans = np.dot(lda_trans, m)
    for r in range(lda_dim):
        for c in range(dim):
            transform[r][c] = lda_trans[r][c]
        transform[r][dim] = -1.0 * transform_m[0][r]
    # save lda transform
    kaldi_io.write_mat(lda_transform, transform)
    return
if __name__ == '__main__':
    if len(sys.argv) != 5:
        print("%s lda_dim kaldi_scp kaldi_utt2spk kaldi_lda_transform\n" % sys.argv[0])
        sys.exit(1)
    lda_dim = int(sys.argv[1])  # must be an int: it is used as an array shape
    kaldi_scp = sys.argv[2]
    kaldi_utt2spk = sys.argv[3]
    lda_transform = sys.argv[4]
    # lda_dim = 100
    # kaldi_scp = "./xvector_sre16_sre18_combined.scp"
    # kaldi_utt2spk = "spk2utt"
    # lda_transform = "python_kaldi_lplda_transform.mat"
    lplda_kaldi_wrapper(lda_dim, kaldi_scp, kaldi_utt2spk, lda_transform)
    # ivector-compute-lda --total-covariance-factor=0.0 --dim=$lda_dim \
    # "ark:ivector-subtract-global-mean scp:$nnet_dir/xvectors_$name/xvector.scp ark:- |" \
    # ark:$data/$name/utt2spk $nnet_dir/xvectors_$name/transform.mat
    # samples = 20
    # dim = 6
    # lda_dim = 3
    # data = np.random.random((samples, dim))
    # label = np.random.random_integers(0, 2, size=(samples, 1))
    # lda = LocalPairwiseLinearDiscriminantAnalysis(lda_dim)
    # lda.fit(data, label)
    # lda_data = lda.transform(data)
    # print (lda_data)
@sanphiee, I have trained LPLDA+PLDA and Kaldi LDA+PLDA with 150K utterances. I see no improvement in EER with LPLDA+PLDA compared with Kaldi LDA+PLDA.
Is there any way to improve speaker verification performance?
Some suggestions:
1) You can print the eigenvalues of LDA and LPLDA to select a proper dimension.
The proper dimensions for LDA and LPLDA may differ.
2) It also depends on the data; see the attached figure (from the ID R&D NIST SRE19 system description).
You can see that the distributions of x-vectors on NIST SRE04-08, NIST SRE18, and SRE19 are different.
LPLDA requires selecting neighbor points, which is easily influenced by the overall distribution.
3) The whole system configuration is another consideration.
For example, if you use AS-norm as your score post-processing method, the performance of LPLDA may degrade,
because AS-norm also selects neighbor scores (the top N scores).
Yours sincerely,
He Liang,
Rohm Building 8101,
Department of Electronic Engineering, Tsinghua University,
Beijing, 10084, China
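Suggestion 1 can be sketched roughly as follows. This assumes `Sb` and `Sw` are the between-class and within-class matrices built inside `_solve_eigen`, and that `Sw` is positive definite. The helper names and the 95% cutoff are illustrative assumptions, not part of the posted scripts.

```python
import numpy as np
from scipy import linalg

def sorted_generalized_eigenvalues(Sb, Sw):
    """Eigenvalues of the generalized problem Sb v = lambda Sw v, largest first."""
    evals = linalg.eigh(Sb, Sw, eigvals_only=True)
    return np.sort(evals)[::-1]

def components_for_ratio(evals, ratio=0.95):
    """Smallest number of leading eigenvalues whose sum reaches `ratio`
    of the total (the 0.95 default is only a heuristic assumption)."""
    evals = np.clip(evals, 0.0, None)  # ignore tiny negative values from round-off
    cumulative = np.cumsum(evals) / np.sum(evals)
    return int(np.searchsorted(cumulative, ratio) + 1)

if __name__ == "__main__":
    # Toy matrices only; in practice use the Sb and Sw from training.
    Sb = np.diag([3.0, 1.0, 0.0])
    Sw = np.eye(3)
    evals = sorted_generalized_eigenvalues(Sb, Sw)
    print(evals, components_for_ratio(evals))
```

Running the same check on the LDA and the LPLDA matrices separately shows whether the two eigenvalue spectra decay at different rates, which is one way to pick a different dimension for each stage.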
@sanphiee
I have tried 4 different combinations of LDA and LPLDA dimensions. In all 4 cases there is no improvement w.r.t. Kaldi LDA+PLDA. Training LPLDA with 150K utterances takes a long time, around 4 to 5 hours. Because of this limitation, I didn't try more combinations.
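Part of the training time may come from the Python-level loop in `_similarity_function`, which is called twice per speaker. A vectorized variant is sketched below. The name `cosine_to_mean` is hypothetical, and whether it removes the bottleneck on real data is untested.

```python
import numpy as np

def cosine_to_mean(mean_vec, vecs):
    """Cosine similarity between mean_vec and every row of vecs,
    computed with one matrix-vector product instead of a Python loop."""
    mean_unit = mean_vec / np.linalg.norm(mean_vec)
    vecs_unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs_unit @ mean_unit

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(5, 4))
    print(cosine_to_mean(vecs.mean(axis=0), vecs))
```

It returns the same values as the loop version, so it could replace the body of `_similarity_function` without changing the neighbor selection.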
If I try to train LPLDA with more than 150K utterances, the process gets killed. My CPU RAM is 16GB. Is there any way to train LPLDA with more utterances?
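The earlier advice in this thread was to compute the within and between covariance matrices by batch operation. A minimal sketch of that idea for the within-class matrix is below. It assumes the pooled (sample-weighted) within-class covariance is acceptable; the posted `_class_cov` averages per-class covariances with equal weight, so the numbers differ when class sizes differ. The function name and the batch iterator are hypothetical.

```python
import numpy as np

def pooled_within_cov(batches, n_features):
    """Accumulate the pooled within-class covariance from (X, y) batches.

    Only one (n_features, n_features) matrix plus one sum vector per class
    is kept in memory, independent of the number of samples.
    """
    total_outer = np.zeros((n_features, n_features))
    class_sum = {}
    class_count = {}
    n_total = 0
    for X, y in batches:
        X = np.asarray(X, dtype=float)
        y = np.asarray(y).reshape(-1)
        total_outer += X.T @ X          # sum of x x^T over the batch
        n_total += X.shape[0]
        for label in np.unique(y):
            Xg = X[y == label]
            class_sum[label] = class_sum.get(label, 0.0) + Xg.sum(axis=0)
            class_count[label] = class_count.get(label, 0) + Xg.shape[0]
    # sum_c n_c * mu_c mu_c^T, written with the per-class sums s_c = n_c * mu_c
    mean_part = np.zeros_like(total_outer)
    for label, s in class_sum.items():
        mean_part += np.outer(s, s) / class_count[label]
    return (total_outer - mean_part) / n_total
```

The batches could come from reading the scp file in chunks instead of loading every vector into a dict first. The neighbor search in `_class_means_and_neighbor_means` still needs access to all samples, so this sketch only covers the covariance accumulation.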
If I want to use the x-vector model and the PLDA model in real-time scenarios, how should I select the threshold? When I reuse the threshold tuned on the SITW evaluation in real-time scenarios, the performance is not up to the mark.
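One common way to approach this is to recalibrate the threshold on scores from the deployment domain. The sketch below assumes a small set of labeled target and non-target trial scores from that domain is available; the function names and the target false-accept rate are illustrative assumptions.

```python
import numpy as np

def threshold_at_far(nontarget_scores, target_far=0.01):
    """Pick the accept threshold so that at most `target_far` of the
    non-target (impostor) scores lie strictly above it."""
    scores = np.sort(np.asarray(nontarget_scores, dtype=float))[::-1]  # descending
    k = min(int(np.floor(target_far * scores.size)), scores.size - 1)
    return float(scores[k])

def error_rates(target_scores, nontarget_scores, threshold):
    """Return (false-accept rate, false-reject rate) when a trial is
    accepted if its score is strictly greater than the threshold."""
    far = float(np.mean(np.asarray(nontarget_scores) > threshold))
    frr = float(np.mean(np.asarray(target_scores) <= threshold))
    return far, frr

if __name__ == "__main__":
    # Synthetic scores only, to show the call pattern.
    nontarget = np.arange(200, dtype=float)
    target = np.arange(100, 300, dtype=float)
    thr = threshold_at_far(nontarget, target_far=0.25)
    print(thr, error_rates(target, nontarget, thr))
```

A threshold chosen this way is tied to the score distribution it was estimated on, which is consistent with a SITW-tuned threshold not transferring to a different recording condition.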
Hi,
I am using 200K utterances to train LDA. During training, the CPU RAM fills up and the process is killed. My machine has 8GB of RAM and 2GB of swap. How can I train LDA with a large amount of data?