jinfenglin / TraceBERT


Problem with training #7

Open hanajusufovic7043 opened 1 year ago

hanajusufovic7043 commented 1 year ago

Hello, I have tried to train and evaluate this model, but I ran into a problem in the training step. The error is below.

Hopefully, you will be able to help me. Thank you in advance!

Epoch:   0% 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/NLP/code_search/siamese2/siamese2_train.py", line 23, in <module>
    main()
  File "/content/NLP/code_search/siamese2/siamese2_train.py", line 18, in main
    train(args, train_examples, valid_examples, model, train_iter_method=train_with_neg_sampling)
  File "/content/NLP/code_search/siamese2/../../code_search/twin/twin_train.py", line 282, in train
    train_iter_method(*params)
  File "/content/NLP/code_search/siamese2/../../code_search/twin/twin_train.py", line 73, in train_with_neg_sampling
    train_dataloader = train_examples.random_neg_sampling_dataloader(batch_size=batch_size)
  File "/content/NLP/code_search/siamese2/../../common/data_structures.py", line 294, in random_neg_sampling_dataloader
    sampler = RandomSampler(pos + neg)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py", line 107, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
Steps: 0it [00:00, ?it/s]

jinfenglin commented 1 year ago

The error log suggests that the training data is empty. Would you please add a print statement to show the sizes of the train/valid/test sets? If you cannot locate the issue after checking the data loading step, could you share the script and dataset you used for training the model?
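A minimal sketch of such a size check, assuming each split supports `len()`. The helper name `report_split_sizes` and the split labels are hypothetical and not part of the TraceBERT code base; the idea is only to fail early with a readable message instead of reaching `RandomSampler` with zero samples.

```python
from typing import Dict, Sized


def report_split_sizes(splits: Dict[str, Sized]) -> Dict[str, int]:
    """Print the number of examples in each split and fail fast on empty ones.

    `splits` maps a label (e.g. "train") to any object that supports len().
    This is an illustrative helper, not a function from the repository.
    """
    sizes = {name: len(data) for name, data in splits.items()}
    for name, size in sizes.items():
        print(f"{name}: {size} examples")

    # Collect every empty split so the message names all of them at once.
    empty = [name for name, size in sizes.items() if size == 0]
    if empty:
        raise ValueError(
            f"Empty split(s): {', '.join(empty)}; check the dataset path"
        )
    return sizes
```

Calling it right after the data loading step, for example `report_split_sizes({"train": train_examples, "valid": valid_examples})`, would show whether the loader actually found any examples, provided those objects define `__len__` (an assumption here).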

hanajusufovic7043 commented 1 year ago

Hi, thank you for your response. Yes, the problem was caused by an empty dataset. After I corrected the dataset path, the training process began. Thank you for your support!
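For anyone hitting the same error, a wrong dataset path can be caught before training starts. The sketch below is a generic check under stated assumptions: the function name `list_data_files` and the glob pattern are illustrative, and the actual directory layout expected by the training scripts is not shown in this thread.

```python
from pathlib import Path
from typing import List


def list_data_files(data_dir: str, pattern: str = "*") -> List[Path]:
    """Return the files under `data_dir` matching `pattern`, or raise if none exist.

    Illustrative helper only; the pattern and layout are assumptions.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Data directory not found: {root}")

    # Search recursively and keep regular files only.
    files = sorted(p for p in root.rglob(pattern) if p.is_file())
    if not files:
        raise FileNotFoundError(f"No files matching {pattern!r} under {root}")
    return files
```

Running this on the directory passed to the training script makes a misplaced or empty data folder visible immediately, rather than as `num_samples=0` inside the sampler.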

raftaria commented 3 months ago

Hi, jinfenglin. I am trying to train TraceBERT as described in your paper. When I followed the README.md and ran the training, I got the same error as mentioned above. I would be grateful if you could check the attached images to see whether the data is misplaced, as you suggested in your earlier reply.

Also, I can't access the Google Drive where the trained model is stored because I don't have authorization. I tried several commands to download the CodeSearchNet data, but they failed with a 403 error, so I would like to try the data on Kaggle instead. Is that data the same as the CodeSearchNet data?

[Attached images: error; screen capture 2024-06-10 224034; screen capture 2024-06-10 223950]