Issue with training model on C# dataset

Hi, first of all, great paper. I am using code2vec on a C# dataset and was able to preprocess the data using preprocess_chsharp.sh. When I try to run train.sh, I get an out-of-bounds indexing error that seems to say an index exceeds 200, although the reported indices do not look that large. I am not sure why.

Appreciate your time.

Here is a complete run log:
$ ./train.sh
2020-02-16 02:33:26.525726: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-16 02:33:26.548006: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1992000000 Hz
2020-02-16 02:33:26.549896: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55d8993d7790 executing computations on platform Host. Devices:
2020-02-16 02:33:26.549988: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2020-02-16 02:33:26 INFO ---------------------------------------------------------------------
2020-02-16 02:33:26 INFO ---------------------------------------------------------------------
2020-02-16 02:33:26 INFO ---------------------- Creating word2vec model ----------------------
2020-02-16 02:33:26 INFO ---------------------------------------------------------------------
2020-02-16 02:33:26 INFO ---------------------------------------------------------------------
2020-02-16 02:33:26 INFO Checking number of examples ...
2020-02-16 02:33:26 INFO Number of train examples: 3878
2020-02-16 02:33:26 INFO Number of test examples: 1793
2020-02-16 02:33:26 INFO ---------------------------------------------------------------------
2020-02-16 02:33:26 INFO ----------------- Configuration - Hyper Parameters ------------------
2020-02-16 02:33:26 INFO CODE_VECTOR_SIZE 384
2020-02-16 02:33:26 INFO CSV_BUFFER_SIZE 104857600
2020-02-16 02:33:26 INFO DEFAULT_EMBEDDINGS_SIZE 128
2020-02-16 02:33:26 INFO DL_FRAMEWORK tensorflow
2020-02-16 02:33:26 INFO DROPOUT_KEEP_RATE 0.75
2020-02-16 02:33:26 INFO EXPORT_CODE_VECTORS False
2020-02-16 02:33:26 INFO LOGS_PATH None
2020-02-16 02:33:26 INFO MAX_CONTEXTS 200
2020-02-16 02:33:26 INFO MAX_PATH_VOCAB_SIZE 911417
2020-02-16 02:33:26 INFO MAX_TARGET_VOCAB_SIZE 261245
2020-02-16 02:33:26 INFO MAX_TOKEN_VOCAB_SIZE 1301136
2020-02-16 02:33:26 INFO MAX_TO_KEEP 10
2020-02-16 02:33:26 INFO MODEL_LOAD_PATH None
2020-02-16 02:33:26 INFO MODEL_SAVE_PATH models/sharp/saved_model
2020-02-16 02:33:26 INFO NUM_BATCHES_TO_LOG_PROGRESS 100
2020-02-16 02:33:26 INFO NUM_TEST_EXAMPLES 1793
2020-02-16 02:33:26 INFO NUM_TRAIN_BATCHES_TO_EVALUATE 1800
2020-02-16 02:33:26 INFO NUM_TRAIN_EPOCHS 20
2020-02-16 02:33:26 INFO NUM_TRAIN_EXAMPLES 3878
2020-02-16 02:33:26 INFO PATH_EMBEDDINGS_SIZE 128
2020-02-16 02:33:26 INFO PREDICT False
2020-02-16 02:33:26 INFO READER_NUM_PARALLEL_BATCHES 6
2020-02-16 02:33:26 INFO RELEASE False
2020-02-16 02:33:26 INFO SAVE_EVERY_EPOCHS 1
2020-02-16 02:33:26 INFO SAVE_T2V None
2020-02-16 02:33:26 INFO SAVE_W2V None
2020-02-16 02:33:26 INFO SEPARATE_OOV_AND_PAD False
2020-02-16 02:33:26 INFO SHUFFLE_BUFFER_SIZE 10000
2020-02-16 02:33:26 INFO TARGET_EMBEDDINGS_SIZE 384
2020-02-16 02:33:26 INFO TEST_BATCH_SIZE 1024
2020-02-16 02:33:26 INFO TEST_DATA_PATH data/csharp/csharp.val.c2v
2020-02-16 02:33:26 INFO TOKEN_EMBEDDINGS_SIZE 128
2020-02-16 02:33:26 INFO TOP_K_WORDS_CONSIDERED_DURING_PREDICTION 5
2020-02-16 02:33:26 INFO TRAIN_BATCH_SIZE 1024
2020-02-16 02:33:26 INFO TRAIN_DATA_PATH_PREFIX data/csharp/csharp
2020-02-16 02:33:26 INFO USE_TENSORBOARD False
2020-02-16 02:33:26 INFO VERBOSE_MODE 1
2020-02-16 02:33:26 INFO _Configlogger <Logger code2vec (INFO)>
2020-02-16 02:33:26 INFO context_vector_size 384
2020-02-16 02:33:26 INFO entire_model_load_path None
2020-02-16 02:33:26 INFO entire_model_save_path models/sharp/saved_model__entire-model
2020-02-16 02:33:26 INFO is_loading False
2020-02-16 02:33:26 INFO is_saving True
2020-02-16 02:33:26 INFO is_testing True
2020-02-16 02:33:26 INFO is_training True
2020-02-16 02:33:26 INFO model_weights_load_path None
2020-02-16 02:33:26 INFO model_weights_save_path models/sharp/saved_modelonly-weights
2020-02-16 02:33:26 INFO test_steps 2
2020-02-16 02:33:26 INFO train_data_path data/csharp/csharp.train.c2v
2020-02-16 02:33:26 INFO train_steps_per_epoch 4
2020-02-16 02:33:26 INFO word_freq_dict_path data/csharp/csharp.dict.c2v
2020-02-16 02:33:26 INFO ---------------------------------------------------------------------
2020-02-16 02:33:26 INFO Loading word frequencies dictionaries from: data/csharp/csharp.dict.c2v ...
2020-02-16 02:33:26 INFO Done loading word frequencies dictionaries.
2020-02-16 02:33:26 INFO Word frequencies dictionaries loaded. Now creating vocabularies.
2020-02-16 02:33:26 INFO Created token vocab. size: 313
2020-02-16 02:33:26 INFO Created path vocab. size: 4549
2020-02-16 02:33:26 INFO Created target vocab. size: 4
2020-02-16 02:33:26 INFO Done creating code2vec model
2020-02-16 02:33:26 INFO Starting training
(<tf.Tensor 'IteratorGetNext:0' shape=(None,) dtype=int32>, <tf.Tensor 'IteratorGetNext:1' shape=(None, 200) dtype=int32>, <tf.Tensor 'IteratorGetNext:2' shape=(None, 200) dtype=int32>, <tf.Tensor 'IteratorGetNext:3' shape=(None, 200) dtype=int32>, <tf.Tensor 'IteratorGetNext:4' shape=(None, 200) dtype=float32>)
WARNING: Logging before flag parsing goes to stderr.
W0216 02:33:27.531919 140189473613632 deprecation.py:506] From /home/anki/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.init (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass _constraint arguments to layers.
2020-02-16 02:33:27,780 INFO Number of trainable params: 771712
2020-02-16 02:33:27,781 INFO variable name: model/WORDS_VOCAB:0 -- shape: (313, 128) -- #params: 40064
2020-02-16 02:33:27,781 INFO variable name: model/TARGET_WORDS_VOCAB:0 -- shape: (4, 384) -- #params: 1536
2020-02-16 02:33:27,781 INFO variable name: model/ATTENTION:0 -- shape: (384, 1) -- #params: 384
2020-02-16 02:33:27,781 INFO variable name: model/PATHS_VOCAB:0 -- shape: (4549, 128) -- #params: 582272
2020-02-16 02:33:27,781 INFO variable name: model/TRANSFORM:0 -- shape: (384, 384) -- #params: 147456
2020-02-16 02:33:27,886 INFO Initalized variables
2020-02-16 02:33:29,033 INFO Started reader...
2020-02-16 02:33:30.101047: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[78] = [25,3] is out of bounds: need 0 <= index < [200,3]
Traceback (most recent call last):
  File "/home/anki/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(args)
  File "/home/anki/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/anki/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __inference_Dataset_map_PathContextReader._map_raw_dataset_row_to_expected_model_input_form_481}} indices[78] = [25,3] is out of bounds: need 0 <= index < [200,3]
  [[{{node SparseToDense}}]]
  [[IteratorGetNext]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "code2vec.py", line 23, in
    model.train()
  File "/home/anki/anki/github/codevec/code2vec/tensorflowmodel.py", line 80, in train
    , batch_loss = self.sess.run([optimizer, train_loss])
  File "/home/anki/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/anki/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/anki/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/anki/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[78] = [25,3] is out of bounds: need 0 <= index < [200,3]
  [[{{node SparseToDense}}]]
  [[IteratorGetNext]]
2020-02-16 02:33:30.143602: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[585] = [194,3] is out of bounds: need 0 <= index < [200,3]
2020-02-16 02:33:30.158836: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[189] = [62,3] is out of bounds: need 0 <= index < [200,3]
2020-02-16 02:33:30.164240: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[399] = [132,3] is out of bounds: need 0 <= index < [200,3]
2020-02-16 02:33:30.167973: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[30] = [9,3] is out of bounds: need 0 <= index < [200,3]
2020-02-16 02:33:30.208490: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[69] = [22,3] is out of bounds: need 0 <= index < [200,3]
2020-02-16 02:33:30.212536: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[171] = [56,3] is out of bounds: need 0 <= index < [200,3]
2020-02-16 02:33:30.228089: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[171] = [56,3] is out of bounds: need 0 <= index < [200,3]
2020-02-16 02:33:30.267801: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[30] = [9,3] is out of bounds: need 0 <= index < [200,3]
2020-02-16 02:33:30.301149: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[414] = [137,3] is out of bounds: need 0 <= index < [200,3]
2020-02-16 02:33:30.377550: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[138] = [45,3] is out of bounds: need 0 <= index < [200,3]
2020-02-16 02:33:30.408714: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[219] = [72,3] is out of bounds: need 0 <= index < [200,3]
2020-02-16 02:33:30.453522: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[66] = [21,3] is out of bounds: need 0 <= index < [200,3]
2020-02-16 02:33:30.460196: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[414] = [137,3] is out of bounds: need 0 <= index < [200,3]

Thanks Uri. Here is the output from the command:

cat csharp.train.raw.txt | cut -d' ' -f2- | tr ' ' '\n' | awk -F',' 'NF > 3'
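For anyone who wants to run the same check without the shell pipeline, here is a rough Python sketch of it. It assumes each line of the raw file is a target label followed by space-separated contexts, and that a well-formed context has exactly three comma-separated fields. The second index in the error (`[25,3]`, with a bound of `[200,3]`) is consistent with a context that has a fourth field, for example because a token contains a comma. The function name `find_malformed_contexts` and the sample lines are made up for illustration and are not part of code2vec.

```python
from typing import Iterable, Iterator, Tuple


def find_malformed_contexts(
    lines: Iterable[str], expected_fields: int = 3
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, context) for every context that has more than
    `expected_fields` comma-separated fields."""
    for line_number, line in enumerate(lines, start=1):
        # The first space-separated field is the target label; skip it,
        # like `cut -d' ' -f2-` in the shell command.
        contexts = line.rstrip("\n").split(" ")[1:]
        for context in contexts:
            # Empty strings come from repeated spaces and are not contexts.
            if context and len(context.split(",")) > expected_fields:
                yield line_number, context


# Hypothetical sample data: the second line holds a context with four fields.
sample = [
    "get|name tok1,path1,tok2 a,b,c",
    "set|value x,p1,y bad,tok,p2,z",
]
for line_number, context in find_malformed_contexts(sample):
    print(line_number, context)
```

To scan the real file, pass an open file handle instead of the sample list, e.g. `find_malformed_contexts(open("csharp.train.raw.txt"))`. Any context it reports would need to be fixed or filtered out at extraction time, before preprocessing.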

tech-srl / code2vec

Issue with training model on C# dataset #65