google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
https://arxiv.org/abs/1810.04719
Apache License 2.0

ValueError: not enough values to unpack (expected 2, got 1) #6

Closed MuruganR96 closed 5 years ago

MuruganR96 commented 5 years ago

@wq2012 I recorded audio of myself and three different speakers: 4 seconds each, mono channel, 16 kHz WAV files. Note: are there any restrictions on audio duration, format, or size? I passed my audio file arrays as train_data as well as test_data:

label_to_center = {
    'A': np.array(a[1], dtype=float),
    'B': np.array(b[1], dtype=float),
    'C': np.array(c[1], dtype=float),
    'D': np.array(d[1], dtype=float),
}
python3 integration_test.py 
(63488,)
[  0.   0.  -2. ... 700. 687. 679.]
(64884,)
[ 8. 16.  2. ... 43. 44. 50.]
(63488,)
[  0.   0.  -2. ... 350. 392. 424.]
(63488,)
[  0.   2.  -8. ... 364. 343. 421.]
E
======================================================================
ERROR: test_four_clusters (__main__.TestIntegration)
Four clusters on vertices of a square.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "integration_test.py", line 86, in test_four_clusters
    train_cluster_id, label_to_center, sigma=0.01)
  File "integration_test.py", line 42, in _generate_random_sequence
    result = np.vstack((result, label_to_center[id]))
  File "/home/dell/Pictures/dp-0.1.1/12-09-2018/voice_reg/mycroft-precise/.venv/lib/python3.6/site-packages/numpy/core/shape_base.py", line 234, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly

----------------------------------------------------------------------
Ran 1 test in 0.038s

FAILED (errors=1)

To work around the np.vstack dimension error above, I changed the line to result = np.concatenate((result, label_to_center[id])), but then it again shows this error:

ValueError: not enough values to unpack (expected 2, got 1)

ERROR: test_four_clusters (__main__.TestIntegration)
Four clusters on vertices of a square.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "integration_test.py", line 112, in test_four_clusters
    model.fit(train_sequence, np.array(train_cluster_id), training_args)
  File "/home/dell/Pictures/dp-0.1.1/12-09-2018/voice_reg/mycroft-precise/uis-rnn/uis-rnn/model/uisrnn.py", line 180, in fit
    train_total_length, observation_dim = train_sequence.shape
ValueError: not enough values to unpack (expected 2, got 1)

My audio sequence:

np.shape(np.array(a[1], dtype=float)) --> (63488,)
np.array(a[1], dtype=float) --> [ 0. 0. -2. ... 700. 687. 679.]

For comparison, the shape of a sequence from uis-rnn's ./data/training_data.npz:

np.shape(sequence[sampled_idx_sets[j], :]) --> (39, 256)

If the numpy shape were the issue, I expected utils.py to resolve it automatically, but execution never reaches the utils.resize_sequence function.

The issue with my audio sequence:

train_sequence = _generate_random_sequence(train_cluster_id, label_to_center, sigma=0.01)
print("train_seq...............", train_sequence)
train_seq............... [ 4.17022005e-03  7.20324493e-03 -1.99999886e+00 ...  7.00003598e+02
  6.87002227e+02  6.79005481e+02]

train_sequence.shape --> (63906800,)

train_total_length, observation_dim = train_sequence.shape
ValueError: not enough values to unpack (expected 2, got 1)
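A quick check of the shapes confirms why the unpack fails: a 1-d array's .shape is a one-element tuple, while fit expects a 2-d array (the sizes below are just the ones from my runs):

import numpy as np

audio = np.zeros(63488)           # 1-d array, like my raw audio samples
print(audio.shape)                # (63488,) -- a one-element tuple
# n, d = audio.shape              # ValueError: not enough values to unpack

embeddings = np.zeros((39, 256))  # 2-d array, like the bundled training data
n, d = embeddings.shape           # unpacks fine: n = 39, d = 256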

How can I resolve this issue, @wq2012? Thanks in advance.

wq2012 commented 5 years ago

You are using the APIs in the wrong way.

I have updated the README.md with more detailed instructions.

For the integration test, label_to_center should be a dict from strings to 1-d vectors, not to numbers.
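For example, a minimal sketch of what that dict could look like (the center values here are made up; the test places the four clusters on the vertices of a square):

import numpy as np

label_to_center = {
    'A': np.array([0.0, 0.0]),
    'B': np.array([0.0, 1.0]),
    'C': np.array([1.0, 0.0]),
    'D': np.array([1.0, 1.0]),
}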

Also, you are not supposed to apply UIS-RNN directly to audio. Instead, you should apply it to speaker-discriminative embeddings such as d-vectors.

MuruganR96 commented 5 years ago

@wq2012 Thank you. How do I pass d-vector embeddings? I referred to this paper and GitHub repo:

https://arxiv.org/pdf/1710.10467.pdf
https://github.com/HarryVolek/PyTorch_Speaker_Verification

But I am confused by this part of the README:

Here train_sequence should be a 2-dim numpy array of type float, for the concatenated observation sequences.
For speaker diarization, this could be the d-vector embeddings.

For example, suppose you have M training utterances, and each utterance is a sequence of L embeddings, where each embedding is a vector of D numbers.
Then the shape of train_sequence is N * D, where N = M * L.
train_sequence: 2-dim numpy array of real numbers, size: N * D
        - the training observation sequence.
        N - summation of lengths of all utterances
        D - observation dimension

We concatenate all training utterances into a single sequence.

Note that the order of entries within an utterance is preserved,
        and all utterances are simply concatenated together.

Where in these steps do I connect the d-vector train_sequence to UIS-RNN training? Using PyTorch_Speaker_Verification, I created d-vector embeddings for the TIMIT dataset, but I don't know how to feed those d-vector embeddings into UIS-RNN.

I am also confused by the line "We concatenate all training utterances into a single sequence." What do you mean by that, and how can I concatenate all training utterances into a single sequence? I am not sure my understanding is fully correct; I am a beginner with this concept. Can you help me?

Thank you in advance for your response.

wq2012 commented 5 years ago

Concatenation means:

If:
  train_sequence_1 = [E1, E2]
  train_sequence_2 = [E3, E4, E5]
  train_cluster_id_1 = ['1', '2']
  train_cluster_id_2 = ['3', '4', '5']
Then:
  train_sequence = [E1, E2, E3, E4, E5]  # concatenated
  train_cluster_id = ['1', '2', '3', '4', '5']  # concatenated
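In runnable numpy terms, a sketch of the same idea (shapes and values here are made up; each E above stands for one D-dim embedding row):

import numpy as np

D = 256                                  # observation dimension (assumed)
utt_1 = np.random.rand(2, D)             # [E1, E2]: utterance of 2 embeddings
utt_2 = np.random.rand(3, D)             # [E3, E4, E5]: utterance of 3 embeddings

train_sequence = np.concatenate([utt_1, utt_2], axis=0)  # shape: (5, D)
train_cluster_id = np.array(['1', '2', '3', '4', '5'])   # one label per embedding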

The reason we concatenate is that we resample and block-wise shuffle the training data as a data augmentation process.

But yes, I admit this API is a little weird. We will change it in the future as a long-term plan.

About d-vector embeddings: we are not responsible for any third-party implementations.

MuruganR96 commented 5 years ago

Thank you so much. Regarding "About d-vector embeddings, we are not responsible for any third-party implementations":

Then how can I generate d-vector embeddings? Can you give me a hint on how to construct them? Is the paper and repo above useful for this?

I think I am now clear about the UIS-RNN API and its architecture, but I can't move on to the next step because of constructing and initializing the d-vector embeddings.

If you are willing to help, please share your suggestions. Thank you very much for your response.

wq2012 commented 5 years ago

Glad that the UIS-RNN API is clear to you.

You can use any third-party implementation of d-vector embeddings, or similar techniques such as x-vectors from JHU. But we are not responsible for their quality; you need to ask the authors of those libraries directly about how to use them.

Some libraries can only produce per-utterance d-vector embeddings, while UIS-RNN requires continuous d-vector embeddings (as sequences). We make no guarantee about which third-party libraries support this; you need to do your own research here.
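To illustrate the shape difference (a sketch with a hypothetical embed function; this is not from this repo or any particular library):

import numpy as np

def embed(window):
    # hypothetical d-vector extractor: maps an audio window to a 256-dim embedding
    return np.random.rand(256)

audio = np.random.rand(16000 * 4)    # 4 seconds of audio at 16 kHz

# Per-utterance embedding: one vector for the whole utterance.
# This alone is NOT enough for UIS-RNN.
utterance_dvector = embed(audio)     # shape: (256,)

# Continuous embeddings: one vector per sliding window, forming a sequence.
# This is what UIS-RNN consumes.
win, hop = 4000, 1600                # window/hop sizes in samples (assumed)
windows = [audio[s:s + win] for s in range(0, len(audio) - win + 1, hop)]
continuous_dvectors = np.stack([embed(w) for w in windows])  # shape: (L, 256)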

This GitHub repo is for the UIS-RNN library only.