Tamal-Mondal closed 2 years ago
Closing the issue, as similar issues seem to have been addressed previously. I will take a look at those and try to resolve it myself.
Hi Team,
I found discussions of this error in some of the previous issues. In some cases you said the problem was MAX_PATH_LENGTH (#4, #28), and in one case you said there was an extra comma in the extractor output (https://githubmemory.com/repo/tech-srl/code2vec/issues/94).
Could you please tell me how to check which of these applies, or what my issue is?
Thanks & Regards, Tamal Mondal
UPDATE
I checked whether the path length or extra commas or spaces were the issue. It turned out that both problems were probably present. After I removed the extra commas and spaces (and verified this in the final extracted data), the maximum path length between any two terminals is 8 across the whole dataset, and the data is in the format "target_sequence subtoken1|subtoken2|subtoken3,intermediate_nodes(| separated),subtoken4|subtoken5|subtoken6......"
I am still getting similar errors, but this time the error appeared quite some time after training started, which probably means the issue is in some other data point. I also ran the training script twice on the same dataset, with MAX_PATH_LENGTH set to 9 and to 51. In the first case, the error occurred during the first epoch. In the second case, EPOCH 0 completed but a similar error occurred in the next epoch (I am not sure how, since the whole training dataset should already have been used during the first epoch). Also, since one epoch finished with MAX_PATH_LENGTH = 51, I am not sure why it fails with 9: I verified every path length with a script, and the maximum should be 8.
I have attached the training logs for both cases separately; please have a look.
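A check of this kind can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `check_line` is hypothetical, path length is taken to mean the number of `|`-separated nodes in the middle field, and empty tokens between spaces are assumed to be padding rather than errors. It is not part of the code2seq repository.

```python
def check_line(line, max_path_length=8):
    """Return a list of problems found in one extracted example line.

    Assumed format: "target ctx ctx ...", where each ctx is
    "source_subtokens,path_nodes,target_subtokens".
    """
    problems = []
    tokens = line.rstrip("\n").split(" ")
    # tokens[0] is the target sequence; the remaining tokens are contexts.
    for i, context in enumerate(tokens[1:]):
        if context == "":
            continue  # assumption: empty tokens are padding, not errors
        fields = context.split(",")
        if len(fields) != 3:
            problems.append(
                f"context {i}: {len(fields)} comma-separated fields, expected 3"
            )
            continue
        source, path, target = fields
        if not source or not path or not target:
            problems.append(f"context {i}: empty field")
        path_length = len(path.split("|"))
        if path_length > max_path_length:
            problems.append(
                f"context {i}: path length {path_length} > {max_path_length}"
            )
    return problems


print(check_line("get|name a|b,N1|N2|N3,c|d"))  # no problems
print(check_line("get|name a|b,N1|N2,,c"))      # one extra comma
```

Running it over every line of the extracted file and printing only the lines with a non-empty result narrows the search to the offending data points.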
code2seq training logs - 9 max length.txt code2seq training logs - 51 max length.txt
Thanks & Regards, Tamal Mondal
UPDATE
One more thing I noticed: in every run, the location of the invalid argument error changes even though the dataset is the same. Here are some examples:
Run 1:
2022-06-12 07:18:28.701173: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[480] = [159,3] is out of bounds: need 0 <= index < [200,3]
Run 2:
2022-06-13 08:32:51.253112: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[477] = [158,3] is out of bounds: need 0 <= index < [200,3]
Run 3:
2022-06-13 08:45:52.382922: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[564] = [187,3] is out of bounds: need 0 <= index < [200,3]
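To compare the three messages, the numbers can be pulled out with a regular expression. The sketch below is a hypothetical helper (`parse_out_of_bounds` is not a TensorFlow or code2seq function), and reading the two numbers as row and column is an assumption about the tensor layout. In all three runs the first coordinate (159, 158, 187) is below 200, while the second coordinate is 3, which is not below 3. The flat position also equals row * 3 + 3 in each case (for example 159 * 3 + 3 = 480), which would be consistent with every earlier row holding exactly three entries.

```python
import re

# Matches the "out of bounds" part of the warning lines quoted above.
PATTERN = re.compile(
    r"indices\[(\d+)\] = \[(\d+),(\d+)\] is out of bounds: "
    r"need 0 <= index < \[(\d+),(\d+)\]"
)


def parse_out_of_bounds(log_line):
    """Extract the failing index and the allowed shape from one log line."""
    match = PATTERN.search(log_line)
    if match is None:
        return None
    flat, row, col, row_limit, col_limit = map(int, match.groups())
    return {
        "flat_index": flat,
        "row": row,
        "col": col,
        "row_ok": row < row_limit,   # is the first coordinate in range?
        "col_ok": col < col_limit,   # is the second coordinate in range?
    }


line = (
    "2022-06-12 07:18:28.701173: W tensorflow/core/framework/op_kernel.cc:1273] "
    "OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: "
    "indices[480] = [159,3] is out of bounds: need 0 <= index < [200,3]"
)
print(parse_out_of_bounds(line))
```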
Thanks & Regards, Tamal Mondal
Hi @Tamal-Mondal , Thank you for your interest in our work!
Since the error says index < [200,3], I suspect that you still have extra commas in either your subtokens or paths.
Can you verify that? Uri
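The link between an extra comma and this message can be illustrated with a small plain-Python imitation. It assumes that each example holds up to 200 contexts and that every context is split on commas into 3 fields, so each field gets a [row, column] index. The helper name `first_out_of_bounds` is hypothetical, and this is not the actual TensorFlow reader code.

```python
def first_out_of_bounds(contexts, shape=(200, 3)):
    """Split each context on ',' and return the first (flat_index, row, col)
    that falls outside `shape`, or None if every field fits."""
    flat_index = 0
    for row, context in enumerate(contexts):
        for col, _field in enumerate(context.split(",")):
            if row >= shape[0] or col >= shape[1]:
                return flat_index, row, col
            flat_index += 1
    return None


good = "a|b,N1|N2|N3,c|d"   # exactly 3 fields
bad = "a|b,N1|N2|N3,,c|d"   # one extra comma gives 4 fields

# 159 well-formed contexts, then one malformed context, then 40 more.
contexts = [good] * 159 + [bad] + [good] * 40
print(first_out_of_bounds(contexts))  # (480, 159, 3)
```

Under these assumptions, a single extra comma in the 160th context reproduces the pattern of Run 1, indices[480] = [159,3]. If examples are read in a different order on each run, a different malformed context would be hit first, which would explain why the reported row changes.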
Thanks a lot, @urialon, for the quick reply; I really appreciate it. Yes, it was a silly issue: some extra spaces were left in the final processed data. After I fixed that, the model is now training.
Will get back to you if any other issues occur.
Regards, Tamal Mondal
Hello @urialon and @stasbel,
I am trying to use code2seq for a code summarization task on our own Python dataset. For this, I followed the steps described in https://github.com/tech-srl/code2seq/tree/master/Python150kExtractor . I made the necessary changes to the Python extractor to parse our data, and the final processed data looks correct on visual inspection. However, I am getting an internal error when I try to train the code2seq model by running the train_python150k.sh script.
I have attached the training logs below. It would be a great help if you could tell me what the problem is or point me in the right direction.
code2seq training logs.txt
Thanks And Regards, Tamal Mondal