tech-srl / code2seq

Code for the model presented in the paper: "code2seq: Generating Sequences from Structured Representations of Code"
http://code2seq.org
MIT License

Unable to get embeddings from the trained model for Java #109

Closed: Avv22 closed this issue 2 years ago

Avv22 commented 2 years ago

Hello,

I am trying to use your trained model to get predictions for a dataset of 20k Java files. Following the documentation, I first ran the model as follows:

(scikit-dev) osboxes@osboxes:~/project/code2seq-master$ python code2seq.py --load models/java-large-model/model_iter52.release --predict

I got the following output, but no predictions:

/home/osboxes/miniconda3/envs/scikit-dev/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/osboxes/miniconda3/envs/scikit-dev/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/osboxes/miniconda3/envs/scikit-dev/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/osboxes/miniconda3/envs/scikit-dev/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/osboxes/miniconda3/envs/scikit-dev/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/osboxes/miniconda3/envs/scikit-dev/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2021-11-23 21:41:27.064062: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
2021-11-23 21:41:27.069036: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Loading dictionaries from: models/java-large-model/model_iter52.release
Done loading dictionaries
Created model
Evaluation reader cannot find file:
WARNING:tensorflow:From /home/osboxes/miniconda3/envs/scikit-dev/lib/python3.6/site-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead.
Done loading model
Serving
Modify the file: "Input.java" and press any key when ready, or "q" / "exit" to exit

Ideally, I would like to get one vector representing each of the 20k Java files, computed with your trained model. Could you please help me with that? I am unable to train your models on my machine because I ran into memory errors, as described in my other issue. Thank you.

My dataset is a set of 20k Java files, and I am not sure how to produce one embedding per file with your code2seq model, whether this is possible at all, or whether I should first run your extractor and then run prediction. I would like predictions in the following form:

file1.java -> [prediction vector 1 from your trained model]
file2.java -> [prediction vector 2 from your trained model]
...
file20000.java -> [prediction vector 20000 from your trained model]

All predicted vectors should have the same size. Is this possible? Your help is appreciated.
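If you go the extractor route, each extracted method comes out as one line of text. A minimal parsing sketch, assuming a code2seq-style layout in which a line holds the method name followed by space-separated path-contexts, each a comma-separated `start_token,path,end_token` triple (verify this against the output of the extractor you actually run). `parse_extractor_line` is a hypothetical helper, not part of code2seq:

```python
from typing import List, Tuple

# (start token, AST path, end token), all kept as opaque strings.
PathContext = Tuple[str, str, str]


def parse_extractor_line(line: str) -> Tuple[str, List[PathContext]]:
    """Split one extracted example into its target name and path-contexts.

    Assumed layout: "<name> <start>,<path>,<end> <start>,<path>,<end> ..."
    Repeated or trailing whitespace (e.g. padding) is ignored.
    """
    parts = line.split()
    if not parts:
        raise ValueError("empty line")
    name, raw_contexts = parts[0], parts[1:]
    contexts: List[PathContext] = []
    for raw in raw_contexts:
        fields = raw.split(",")
        if len(fields) != 3:
            # Skip anything that is not a well-formed triple.
            continue
        contexts.append((fields[0], fields[1], fields[2]))
    return name, contexts


if __name__ == "__main__":
    # Made-up example line; real token and path strings will differ.
    name, ctxs = parse_extractor_line("set|value value,Nm0|Mth|Prm,int this,Fa|Ex,value   ")
    print(name, len(ctxs))  # prints "set|value 2"
```

Running the extractor once per file (rather than once over the whole directory) is one simple way to keep track of which lines, and therefore which method vectors, belong to which file.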

Problem: how can I pass a dataset to a released model and get a prediction for each file in it? I see that the prediction is simply a method name, but can we get a context vector representing the whole file, one for each of the 20k files in our dataset?
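Because code2seq predicts one name per method, a file-level vector has to be assembled from per-method vectors. A minimal sketch of one simple choice, mean pooling, assuming you already have a fixed-length vector for every method of every file (for example taken from the model's encoder; how to export those from the released model is not shown here). `file_vector`, `file_vectors`, and the dict layout are hypothetical:

```python
from typing import Dict, List

Vector = List[float]


def file_vector(method_vectors: List[Vector]) -> Vector:
    """Average equal-length method vectors into one vector for the file."""
    if not method_vectors:
        raise ValueError("file has no method vectors")
    dim = len(method_vectors[0])
    if any(len(v) != dim for v in method_vectors):
        raise ValueError("all method vectors must have the same length")
    n = len(method_vectors)
    # Element-wise mean over all methods in the file.
    return [sum(v[i] for v in method_vectors) / n for i in range(dim)]


def file_vectors(per_file: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Map each file name to the mean of its method vectors."""
    return {name: file_vector(vecs) for name, vecs in per_file.items()}


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real method embeddings.
    per_file = {
        "file1.java": [[1.0, 2.0], [3.0, 4.0]],
        "file2.java": [[0.5, 0.5]],
    }
    for name, vec in file_vectors(per_file).items():
        print(name, "->", vec)
    # file1.java -> [2.0, 3.0]
    # file2.java -> [0.5, 0.5]
```

Mean pooling gives every file a vector of the same length as the method vectors, no matter how many methods the file contains; max pooling or a weighted average are alternatives with the same property.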