shensq04 / EKLAVYA


ValueError: invalid literal for int() with base 10: '' #3

Closed qingbol closed 4 years ago

qingbol commented 4 years ago

I get an error when I run save_embeddings.py. This is the relevant snippet:

        for word in input_data['word2id']:
            word_id = input_data['word2id'][word]
            ids.append(word_id)

            if len(ids) == 1000:
                part_vector = tf.nn.embedding_lookup(w_out, ids).eval()

                for i in range(len(ids)):
                    word_id = ids[i]
                    word = input_data['id2word'][word_id]
                    if word != 'UNK':
                        # print (word)
                        word = int(word)

The following error occurs:

    Traceback (most recent call last):
      File "test.py", line 141, in <module>
        main()
      File "test.py", line 110, in main
        word = int(word)
    ValueError: invalid literal for int() with base 10: ''

In word = int(word), word is input_data['id2word'][word_id], an element of input_data['id2word'], which is a <type 'numpy.ndarray'>.

I printed some related values. The result of print type(input_data['id2word']) is

<type 'numpy.ndarray'>

The result of print (input_data['id2word'][6768:6778]) is

['u\x8b' '\x00\x02' '\xaaR' 'M\xc3' '_\xbf' '\xdb\xb2' '\xfa\xaf'

The result of print (input_data['id2word']) is

['UNK' 'U' '\x94' ... '.IJ\xfc\x10\xa2\xda' '\xec\x87\x86\x8c%\xef&C\xe51xS\xf9\xe7]\xfb\xa7\xd8?\x11e\x0e}\x1a\xd0\xae9\xe1\xc5A\xe9)' ... '\x9c]3\x80\x10\xd6\x18@\xd8\xfa\xdb\x00\xa0\xaa\xb2\xc50\xbcl\xdd\x10^\xadM\xc7\xf0\xb2\x15Ax\xb9E\xad\xc2\xc0Y6\x84P\x8c\x98p\xf6\xba\x01\x84\xb0+\x06\x10\xb6\x1a\x03\x80*?\x90\x13\x03Uuf0P\xd5\x98\xea\xb5\x880w\xf8Da\x880\xb7\x81\xa80D\x98[lV\xbc\xb5_8{\xc7\x00\xc2\xd95\x03\x08a\xbd\x01\xc0V\x99\x19']

So when the statement word = int(word) executes, the call is effectively int('\x00\x02') or int('\xec\x87\x86\x8c%\xef&C\xe51xS\xf9\xe7]\xfb\xa7\xd8?\x11e\x0e}\x1a\xd0\xae9\xe1\xc5A\xe9)'),

which raises the error: ValueError: invalid literal for int() with base 10: ''
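
The failure described above can be reproduced in isolation: int() only parses decimal text, so a string of raw bytes raises ValueError. Below is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). The helper name parse_token is hypothetical and not part of EKLAVYA; it only shows how to detect entries that are not decimal strings, not how the repository intends them to be decoded.

```python
def parse_token(word):
    """Return int(word) if word is decimal text, otherwise None.

    Hypothetical helper for illustration only; not part of EKLAVYA.
    """
    if isinstance(word, bytes):
        # latin-1 maps every byte to one character, so decoding never fails.
        word = word.decode("latin-1")
    try:
        return int(word)
    except ValueError:
        # Raised for anything that is not a base-10 literal.
        return None


print(parse_token("512"))       # decimal text parses
print(parse_token("\x00\x02"))  # raw bytes do not parse
print(parse_token("UNK"))       # neither does the UNK marker
```

A check like this makes it possible to count how many vocabulary entries are raw bytes before deciding how to convert them.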

How can I fix this? What is the purpose of the statement word = int(word)? Looking forward to your answer @melynx, thank you.

qingbol commented 4 years ago

I changed the snippet in save_embeddings.py

                for i in range(len(ids)):
                    word_id = ids[i]
                    word = input_data['id2word'][word_id]
                    if word != 'UNK':
                        #word = int(word)

to

                for i in range(len(ids)):
                    word_id = ids[i]
                    word = input_data['id2word'][word_id]
                    print "The value of i :{}".format(i) 
                    print "The value of word_id=ids[i] :{}".format(word_id) 
                    print "The value of word = input_data['id2word'][word_id]:{}".format(word) 
                    if word != 'UNK':
                        word_bytearray=bytearray(word)
                        word_lst=list(word_bytearray)
                        word = insn_int.insn2int_inverse(word_lst)

Now each input_data['id2word'][word_id] element can be converted to an int, but another error occurs:

    Embed input data loaded
    Loading int to instruction map...
    ids[] type: <type 'list'>
    ids[]: [6769, 36, 50589, 64451, 22725, 60313, 264, 27191, 8528948, 16757054, 37647, 15141316, 60872, 70776, 169799, 59119, 19368, 64195, 48039, 42442]
    The value of i :0
    The value of word_id=ids[i] :6769
    The int value of word = input_data['id2word'][word_id]: 512
    Int to instruction map loaded
    Traceback (most recent call last):
      File "test.py", line 151, in <module>
        main()
      File "test.py", line 123, in main
        insn = int2insn_map[word]
    KeyError: 512

That means the way I convert word to an int is not right: the result, 512, is not a key in int2insn_map. I don't know how to fix this, please help.
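
One thing worth checking for a KeyError like this is byte order. The printed slice shows the entry at index 6769 as '\x00\x02', and those two bytes read least-significant-byte first give 512, the key that failed. The sketch below (Python 3 syntax) shows both readings and a lookup that reports a miss instead of raising. The thread does not show how insn_int.insn2int_inverse works or what keys int2insn_map holds, so the small dictionary here is a hypothetical stand-in and the other byte order is only a candidate to test, not a confirmed fix.

```python
def bytes_to_int(raw, byteorder):
    """Interpret a raw byte string as an unsigned integer."""
    return int.from_bytes(raw, byteorder)


raw = b"\x00\x02"  # the entry printed at index 6769 in the thread

little = bytes_to_int(raw, "little")  # 0x00 + 0x02 * 256
big = bytes_to_int(raw, "big")        # 0x00 * 256 + 0x02

# Hypothetical stand-in; the real int2insn_map is loaded from disk.
int2insn_map = {2: "example instruction"}

for value in (little, big):
    # dict.get returns None on a miss instead of raising KeyError.
    insn = int2insn_map.get(value)
    print(value, insn)
```

Printing a few real keys of int2insn_map next to both candidate values should show quickly whether either encoding matches the map.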

qingbol commented 4 years ago

solved