FileNotFoundError: [Errno 2] No such file or directory: '../emse-data/vocab.sbt'

aaajeong commented 2 years ago

Hi :) I'm a student studying NLP, especially neural machine translation. I read your DeepCom paper and found the task very interesting. I tried to run your EMSE-DeepCom code, but I ran into some difficulties.

When I execute train.py, I get a FileNotFoundError for files under emse-data such as vocab, test, and train. I generated the test and train .sbt files using ast_traversal.py, but I can't work out how to generate the vocab.sbt file.

How do I make it? Should I make a vocab.json file?

This is the error output. Thank you 😀

(deepcom) gpuadmin@gpuadmin:~/ahjeong/EMSE-DeepCom/source code$ python3 main.py config.yaml --train -v
02/17 13:52:57 label: default
02/17 13:52:57 description: default configuration next line of description last line
02/17 13:52:57 main.py config.yaml --train -v
02/17 13:52:57 commit hash 2f4e87361fa6e0cdc119112690b0a17db0a95a0d
02/17 13:52:57 tensorflow version: 1.10.0
02/17 13:52:57 program arguments
02/17 13:52:57 aggregation_method 'sum'
02/17 13:52:57 align_encoder_id 0
02/17 13:52:57 allow_growth True
02/17 13:52:57 attention_type 'global'
02/17 13:52:57 attn_filter_length 0
02/17 13:52:57 attn_filters 0
02/17 13:52:57 attn_prev_word False
02/17 13:52:57 attn_size 128
02/17 13:52:57 attn_temperature 1.0
02/17 13:52:57 attn_window_size 0
02/17 13:52:57 average False
02/17 13:52:57 batch_mode 'standard'
02/17 13:52:57 batch_size 64
02/17 13:52:57 beam_size 5
02/17 13:52:57 bidir False
02/17 13:52:57 bidir_projection False
02/17 13:52:57 binary False
02/17 13:52:57 cell_size 256
02/17 13:52:57 cell_type 'GRU'
02/17 13:52:57 character_level False
02/17 13:52:57 checkpoints []
02/17 13:52:57 conditional_rnn False
02/17 13:52:57 config 'config.yaml'
02/17 13:52:57 convolutions None
02/17 13:52:57 data_dir '../emse-data'
02/17 13:52:57 debug False
02/17 13:52:57 decay_after_n_epoch 1
02/17 13:52:57 decay_every_n_epoch 1
02/17 13:52:57 decay_if_no_progress None
02/17 13:52:57 decoders [{'max_len': 30, 'name': 'nl'}]
02/17 13:52:57 description 'default configuration\nnext line of description\nlast line\n'
02/17 13:52:57 dev_prefix 'test'
02/17 13:52:57 early_stopping True
02/17 13:52:57 embedding_size 256
02/17 13:52:57 embeddings_on_cpu True
02/17 13:52:57 encoders [{'attention_type': 'global', 'max_len': 200, 'name': 'code'}, {'attention_type': 'global', 'max_len': 500, 'name': 'sbt'}]
02/17 13:52:57 ensemble False
02/17 13:52:57 eval_burn_in 0
02/17 13:52:57 feed_previous 0.0
02/17 13:52:57 final_state 'last'
02/17 13:52:57 freeze_variables []
02/17 13:52:57 generate_first True
02/17 13:52:57 gpu_id 6
02/17 13:52:57 highway_layers 0
02/17 13:52:57 initial_state_dropout 0.0
02/17 13:52:57 initializer None
02/17 13:52:57 input_layer_dropout 0.0
02/17 13:52:57 input_layers None
02/17 13:52:57 keep_best 5
02/17 13:52:57 keep_every_n_hours 0
02/17 13:52:57 label 'default'
02/17 13:52:57 layer_norm False
02/17 13:52:57 layers 1
02/17 13:52:57 learning_rate 0.5
02/17 13:52:57 learning_rate_decay_factor 0.95
02/17 13:52:57 len_normalization 1.0
02/17 13:52:57 log_file 'log.txt'
02/17 13:52:57 loss_function 'xent'
02/17 13:52:57 max_dev_size 0
02/17 13:52:57 max_epochs 100
02/17 13:52:57 max_gradient_norm 5.0
02/17 13:52:57 max_len 50
02/17 13:52:57 max_steps 600000
02/17 13:52:57 max_test_size 0
02/17 13:52:57 max_to_keep 1
02/17 13:52:57 max_train_size 0
02/17 13:52:57 maxout_stride None
02/17 13:52:57 mem_fraction 1.0
02/17 13:52:57 min_learning_rate 1e-06
02/17 13:52:57 model_dir '../emse-data/model/hybrid'
02/17 13:52:57 moving_average None
02/17 13:52:57 no_gpu False
02/17 13:52:57 optimizer 'sgd'
02/17 13:52:57 orthogonal_init False
02/17 13:52:57 output None
02/17 13:52:57 output_dropout 0.0
02/17 13:52:57 parallel_iterations 16
02/17 13:52:57 pervasive_dropout False
02/17 13:52:57 pooling_avg True
02/17 13:52:57 post_process_script None
02/17 13:52:57 pred_deep_layer False
02/17 13:52:57 pred_edits False
02/17 13:52:57 pred_embed_proj True
02/17 13:52:57 pred_maxout_layer True
02/17 13:52:57 purge False
02/17 13:52:57 raw_output False
02/17 13:52:57 read_ahead 1
02/17 13:52:57 remove_unk False
02/17 13:52:57 reverse_input True
02/17 13:52:57 rnn_feed_attn True
02/17 13:52:57 rnn_input_dropout 0.0
02/17 13:52:57 rnn_output_dropout 0.0
02/17 13:52:57 rnn_state_dropout 0.0
02/17 13:52:57 save False
02/17 13:52:57 score_function 'nltk_sentence_bleu'
02/17 13:52:57 script_dir 'scripts'
02/17 13:52:57 sgd_after_n_epoch None
02/17 13:52:57 sgd_learning_rate 1.0
02/17 13:52:57 shuffle True
02/17 13:52:57 softmax_temperature 1.0
02/17 13:52:57 steps_per_checkpoint 2000
02/17 13:52:57 steps_per_eval 2000
02/17 13:52:57 swap_memory True
02/17 13:52:57 tie_embeddings False
02/17 13:52:57 time_pooling None
02/17 13:52:57 train True
02/17 13:52:57 train_initial_states True
02/17 13:52:57 train_prefix 'train'
02/17 13:52:57 truncate_lines True
02/17 13:52:57 update_first False
02/17 13:52:57 use_dropout False
02/17 13:52:57 use_lstm_full_state False
02/17 13:52:57 use_previous_word True
02/17 13:52:57 verbose True
02/17 13:52:57 vocab_prefix 'vocab'
02/17 13:52:57 weight_scale None
02/17 13:52:57 word_dropout 0.0
02/17 13:52:57 python random seed: 8107473215777315132
02/17 13:52:57 tf random seed: 2144161570693003299
02/17 13:52:57 creating model
02/17 13:52:57 using device: /gpu:6
02/17 13:52:57 copying vocab to ../emse-data/model/hybrid/data/vocab.sbt
Traceback (most recent call last):
  File "main.py", line 329, in <module>
    main()
  File "main.py", line 279, in main
    model = TranslationModel(config)
  File "/home/gpuadmin/ahjeong/EMSE-DeepCom/source code/translation_model.py", line 63, in __init__
    ref_ext=ref_ext, binary=self.binary, **kwargs)
  File "/home/gpuadmin/ahjeong/EMSE-DeepCom/source code/utils.py", line 225, in get_filenames
    shutil.copy(src, dest)
  File "/home/gpuadmin/anaconda3/envs/deepcom/lib/python3.5/shutil.py", line 241, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/home/gpuadmin/anaconda3/envs/deepcom/lib/python3.5/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '../emse-data/vocab.sbt'

dingas0987 commented 2 years ago

Hi, I'm not the author but I was able to get the model to work by doing a few things:

In the config.yaml file, change "name: sbt" to "name: ast"
Changing line 183 to this
Changing lines 370 and 373 to this
I suggest changing line 563 to this, so that the score written to the file matches what you see on the command line; otherwise it will only write 0.17 instead of 0.1729

I think that was all I did to get this to work, I hope this helped. :)
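
The first change seems to matter because the loader builds file names as `<prefix>.<name>` inside `data_dir`: the traceback in the original report fails on `../emse-data/vocab.sbt`, which is `vocab_prefix` plus the encoder name `sbt`. Below is a minimal sketch for listing which expected files are missing before starting training. It assumes the same naming applies to `train_prefix` and `dev_prefix`, which is inferred from the log rather than confirmed; the helper `missing_data_files` is hypothetical and not part of the repository.

```python
import os


def missing_data_files(data_dir, names, prefixes=("vocab", "train", "test")):
    """Return the expected '<prefix>.<name>' paths that do not exist in data_dir.

    Assumption: the naming scheme is inferred from the error message
    ('../emse-data/vocab.sbt'), not from the project's documentation.
    """
    missing = []
    for prefix in prefixes:
        for name in names:
            path = os.path.join(data_dir, "{}.{}".format(prefix, name))
            if not os.path.isfile(path):
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Encoder and decoder names as they would be after renaming 'sbt' to 'ast'.
    for path in missing_data_files("../emse-data", ["code", "ast", "nl"]):
        print("missing:", path)
```

If this prints nothing, every file the sketch knows about is present; any path it prints is a candidate for the next FileNotFoundError.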

aaajeong commented 2 years ago

@dingas0987
Hello :) I'm trying to run the code now. Thanks to your help, I can see the log message "starting training...", so I'm waiting for the training part to finish. I hope it goes well :)

Thank you very much!

dingas0987 commented 2 years ago

I forgot to say that if training takes a really long time (in my experience 2+ hours), try changing "gpu_id" to 0 in config.yaml (assuming you have an NVIDIA GPU). It takes me less than 1 hour to get a score.
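
The log in the original report shows `gpu_id 6` and `using device: /gpu:6`, an index that may not exist on every machine. One way to see which indices are available is to parse the device list printed by `nvidia-smi -L`. This is only a sketch, assuming an NVIDIA driver is installed; the helper names are hypothetical.

```python
import subprocess


def parse_gpu_indices(listing):
    """Extract GPU indices from `nvidia-smi -L` output (lines like 'GPU 0: ...')."""
    indices = []
    for line in listing.splitlines():
        line = line.strip()
        if line.startswith("GPU ") and ":" in line:
            number = line[len("GPU "):line.index(":")]
            if number.isdigit():
                indices.append(int(number))
    return indices


def available_gpu_indices():
    """Return the GPU indices reported by nvidia-smi, or [] if it cannot be run."""
    try:
        output = subprocess.check_output(["nvidia-smi", "-L"])
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_indices(output.decode("utf-8", errors="replace"))


if __name__ == "__main__":
    gpu_id = 6  # value shown in the log of the original report
    indices = available_gpu_indices()
    if gpu_id not in indices:
        print("gpu_id {} not found; available indices: {}".format(gpu_id, indices))
```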

aaajeong commented 2 years ago

@dingas0987 Thank you!

jiayu1011 commented 2 years ago

Hello~ I'm also a student, currently learning code summarization. I followed the instructions in this issue and got as far as "starting training...", but after several hours there has been no further output. Did you get a good result? By the way, I downloaded the data from the author's Google Drive link and placed it where the source code expects it.

dingas0987 commented 2 years ago

@jiayu1011 Hi, I see from your screenshot that you aren't using an environment for this project, so I suspect you might not be using the correct settings. Try running the project in an environment with the packages and the Python version that I used:

tensorflow-gpu==1.13.1
python=3.7.9
pyyaml
nltk

jiayu1011 commented 2 years ago

Anyway, thanks for your help. My senior (in my laboratory) taught me how to find the right tensorflow-gpu version: the paper was published in May 2018, so I looked for versions released before May 2018. Here is the mapping of tensorflow versions to the timeline. Here is my environment. (My lab's server provides a CUDA 11 environment, which is not suitable for such old versions, so you need to install cudatoolkit==9.0, which corresponds to tensorflow-gpu==1.13.1.)

python==3.5.4
numpy
tensorflow-gpu==1.13.1
cudatoolkit==9.0
pyyaml
nltk
javalang

Without cudatoolkit, you won't be able to use the GPU for training. I have now successfully run the training code :D
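
A quick way to confirm that an environment like the one above is the one actually in use is to check, from inside it, which of the listed packages can be imported. The sketch below uses only the standard library. Note that `cudatoolkit` is a conda package rather than a Python module, so it is not covered here, and the module list is an assumption based on the package names in this thread.

```python
import importlib.util
import sys

# Top-level import names for the packages listed above.
# Note: the pyyaml package is imported as 'yaml'.
REQUIRED_MODULES = ["numpy", "tensorflow", "yaml", "nltk", "javalang"]


def missing_modules(modules):
    """Return the modules from the list that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("python", "{}.{}.{}".format(*sys.version_info[:3]))
    for name in missing_modules(REQUIRED_MODULES):
        print("not installed:", name)
```

Running it with the same interpreter that launches `main.py` shows whether the packages were installed into that environment or into a different one.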

VishalPatel43 commented 1 year ago

We are trying to run this source code. After training, it reports that the model is finished, but no output file is generated. We need it for evaluation, which takes two files: the references and the predictions.

dingas0987 commented 1 year ago

@VishalPatel43 The comments that the model generates are located at 'data/model/hybrid/eval'. The reference is located at 'data/data_RQ1/test/test.token.nl'.

Hopefully this is what you are looking for. I haven't done this in over a year and don't really have the files anymore.

VishalPatel43 commented 1 year ago

Our model is not generating emse-model/hybrid/eval, and we don't know what to do. It only generates the emse-data/model/hybrid/data folder. We need the output file so that we can evaluate the predictions (the output file) against test.token.nl (which we already have).

Should I change it?
max_steps: 600000 # maximum number of updates before stopping
max_epochs: 100 # maximum number of epochs before stopping

If possible, could you share your code with instructions so that we can train the model? It would really help me in my research work.

my email: 202211012@daiict.ac.in Vishal Patel

VishalPatel43 commented 1 year ago

Can you help us? We tried, but the output file is not generated after the model is trained. What should I do?

it's a log file

xing-hu / EMSE-DeepCom

FileNotFoundError: [Errno 2] No such file or directory: '../emse-data/vocab.sbt' #29