shtoshni / g2p

Code for SLT 2016 paper on Grapheme-to-Phoneme conversion using attention based encoder-decoder models

seq2seq_model.py's get_batch(...) raises "ValueError: setting an array element with a sequence." #6

Closed: makwadajp closed this issue 3 years ago

makwadajp commented 3 years ago

Although the CMUDict setup doesn't raise an exception, I tried the code with another dataset and I believe there is a bug in seq2seq_model.py's get_batch(self, data, bucket_id=None) method. Specifically, I believe there is a case where decoder_pad_size becomes < 0 when self.isTraining is false, at the following line:

decoder_pad_size = max_len_target - (len(decoder_input) + 1)

When decoder_pad_size is negative, the following error is raised:

decoder_inputs = np.asarray(decoder_inputs, dtype=np.int32).T
File "/home/XXXX/.local/share/virtualenvs/G2P-MASTER-mSKoJG47/lib/python2.7/site-packages/numpy/core/numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
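For reference, the error itself is easy to reproduce outside this repo: numpy cannot build a rectangular int32 array from rows of different lengths. A minimal, repo-independent example:

```python
import numpy as np

# Rows of different lengths, as produced when one decoder row gets a negative
# pad size and therefore ends up longer than the others.
rows = [
    [1, 2, 3, 0],     # padded to length 4
    [1, 2, 3, 4, 5],  # longer row that was never padded to the same length
]

try:
    np.asarray(rows, dtype=np.int32).T
except ValueError as err:
    print(err)  # "setting an array element with a sequence." (wording varies by numpy version)
```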

I believe the following line is the culprit; it should include a + 1:

seq_len_target[i] = decoder_size        # Original Code
seq_len_target[i] = decoder_size + 1    # Fixed Code
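To spell out the arithmetic (a simplified sketch that only mirrors the repo's variable names, not its actual get_batch code): each framed row is [GO_ID] + decoder_input + [EOS_ID] + padding, so the per-sequence length has to count the appended EOS for every row to come out the same length:

```python
# Simplified sketch of the framing/padding arithmetic; GO_ID, EOS_ID, PAD_ID and
# the variable names only mirror the repo, this is not its actual code.
GO_ID, EOS_ID, PAD_ID = 1, 2, 0

def frame_targets(targets):
    # Count the appended EOS in the per-sequence length, hence the "+ 1".
    seq_len_target = [len(t) + 1 for t in targets]
    max_len_target = max(seq_len_target)
    framed = []
    for t in targets:
        decoder_pad_size = max_len_target - (len(t) + 1)  # >= 0 with the fix above
        framed.append([GO_ID] + t + [EOS_ID] + [PAD_ID] * decoder_pad_size)
    return framed

rows = frame_targets([[7, 8, 9], [7, 8, 9, 10, 11]])
assert len(set(map(len, rows))) == 1  # rectangular, so np.asarray(...) succeeds
```

Without the + 1, the longest sequence gets a negative pad size and its framed row ends up one entry longer than all the others, which is exactly the ragged input that np.asarray rejects.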

For your information, here are the conditions under which the ValueError occurs:

Under those conditions, the original code creates decoder_inputs with shape 256 [FLAGS.batch_size] x 36, when it should be 256 [FLAGS.batch_size] x 37 ([data_utils.GO_ID] + decoder_input + [data_utils.EOS_ID] + [data_utils.PAD_ID] * 0).

As extra information, my G2P dataset is not English and has a maximum input sequence length of 65 and a maximum output sequence length of 97. While the above fix of + 1 seems to do the trick (no more ValueError), should I be concerned about other parameters (e.g. _buckets = [(35, 35)] in data_utils.py)? I read your comment regarding the bucket, but the link you mention is broken: http://goo.gl/d8ybpl

> Buckets are useful to limit the amount of padding required since we use minibatch processing. For more detail refer to this: http://goo.gl/d8ybpl. Bucket sizes are specified as a list with each entry of the form (max input sequence length, max output sequence length). Since this project is about Grapheme-to-Phoneme conversion, where the input sequence is the characters in a word and the output sequence is the phonemes in the word's pronunciation, we use a single bucket to merely denote the max word length and max pronunciation length.
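If I understand that comment correctly, it amounts to something like this (a rough sketch, not the project's data_utils; PAD_ID and the helper are illustrative):

```python
# Rough sketch of single-bucket padding; sequences longer than the bucket
# simply don't fit and would need a larger bucket.
PAD_ID = 0
_buckets = [(35, 35)]  # (max input sequence length, max output sequence length)

def pad_to_bucket(graphemes, phonemes, bucket=_buckets[0]):
    max_in, max_out = bucket
    if len(graphemes) > max_in or len(phonemes) > max_out:
        raise ValueError("sequence does not fit in the bucket; enlarge _buckets")
    padded_in = graphemes + [PAD_ID] * (max_in - len(graphemes))
    padded_out = phonemes + [PAD_ID] * (max_out - len(phonemes))
    return padded_in, padded_out
```

If that reading is right, a (35, 35) bucket would be too small for my inputs of length 65 and outputs of length 97, so I assume the bucket sizes would need enlarging as well.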

shtoshni commented 3 years ago

Hey, thanks for raising this issue. This project is close to 5 years old, and I have moved on in my research direction. If I were to redo this project, I would not use buckets at all, since G2P sequences are fairly short. I would suggest adapting this code and removing the dependence on buckets altogether.
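Roughly, the bucket-free version would just pad each minibatch to its own longest sequence, something along these lines (illustrative only, not code from this repo):

```python
import numpy as np

PAD_ID = 0  # illustrative padding id, not necessarily the repo's value

def pad_batch(sequences, dtype=np.int32):
    # Pad every sequence in the minibatch to the length of its longest member,
    # instead of to a fixed bucket size.
    max_len = max(len(s) for s in sequences)
    batch = np.full((len(sequences), max_len), PAD_ID, dtype=dtype)
    for i, s in enumerate(sequences):
        batch[i, :len(s)] = s
    return batch
```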

makwadajp commented 3 years ago

Thank you for replying. It seems you last updated the code (adding attention) roughly 2 years ago, but I see your point and I understand. Thank you again.