Closed celsofranssa closed 4 years ago
Hi @Ceceu, thank you for your interest in code2vec!
Yes, `test|field|injection|with|map` is the method name, and then each triple is a single path. So for example, `-400155226` is the hash of a path of nodes that connect the values `bf` and `testbean`.
See also the description here: https://github.com/tech-srl/code2vec#extending-to-other-languages
Best, Uri
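To make the format described above concrete, here is a minimal Python sketch that splits one dataset line into the method-name subtokens and its triples. The names `PathContext` and `parse_example` are hypothetical and not part of code2vec; the sketch assumes whitespace-separated contexts with exactly three comma-separated fields each, and the sample line is the truncated instance from this thread with the trailing `...` dropped.

```python
from typing import List, NamedTuple, Tuple


class PathContext(NamedTuple):
    """One triple: source token, hashed path, target token (hypothetical helper type)."""
    source: str
    path: str  # the hashed path, kept as a string because it is only a symbol
    target: str


def parse_example(line: str) -> Tuple[List[str], List[PathContext]]:
    """Split a dataset line into method-name subtokens and path-context triples."""
    fields = line.split()  # whitespace-separated; extra spaces are ignored
    name, raw_contexts = fields[0], fields[1:]
    subtokens = name.split("|")  # the method name is split on '|'
    contexts = []
    for raw in raw_contexts:
        # Assumption: each context has exactly three comma-separated fields.
        source, path, target = raw.split(",")
        contexts.append(PathContext(source, path, target))
    return subtokens, contexts


line = "test|field|injection|with|map bf,-400155226,testbean size,-1639730666,assertequals"
name_subtokens, contexts = parse_example(line)
print(name_subtokens)
for context in contexts:
    print(context)
```

Keeping the hashed path as a string rather than converting it to an integer reflects that it is only ever used as an opaque identifier.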
Hello @urialon
I am very interested in `code2vec`; it's a really great step forward in code understanding.
I ended up interpreting the dataset format the way you described, even though I had overlooked the part of the README that you pointed to (my mistake).
So, during `code2vec` training, is the path-context represented by this hash value instead of the path itself?
Yes. In code2vec, the hash is just to save space, because the entire path is treated as a single symbol. So it doesn't matter whether we use the hash or the path itself.
In code2seq, we do not hash, because the model reads the path as a sequence of AST nodes.
I hope it helps, Uri
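As an illustration of why the hash and the raw path are interchangeable when the whole path is one symbol, here is a small Python sketch. It is not the code2vec implementation: the thread does not say which hash function the extractor uses, so a Java-style `String.hashCode` is assumed here, and the path strings and the `build_vocab` helper are hypothetical.

```python
from typing import Dict, Iterable, List


def java_string_hash(text: str) -> int:
    """Java-style String.hashCode(): polynomial hash wrapped to a signed 32-bit int.

    Assumption: characters are in the Basic Multilingual Plane, so ord() matches
    Java's UTF-16 code units.
    """
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep only the low 32 bits
    return h - 0x100000000 if h >= 0x80000000 else h  # reinterpret as signed


def build_vocab(symbols: Iterable[str]) -> Dict[str, int]:
    """Assign a dense integer id to each distinct symbol, in first-seen order."""
    vocab: Dict[str, int] = {}
    for symbol in symbols:
        vocab.setdefault(symbol, len(vocab))
    return vocab


# Hypothetical path strings; real paths are sequences of AST node types.
paths: List[str] = ["Name^Call^Name", "Name^Call_Name", "Name^Call^Name"]
hashed: List[str] = [str(java_string_hash(p)) for p in paths]

raw_vocab = build_vocab(paths)
hashed_vocab = build_vocab(hashed)

# Either representation maps each path to the same vocabulary id.
raw_ids = [raw_vocab[p] for p in paths]
hashed_ids = [hashed_vocab[h] for h in hashed]
print(raw_ids, hashed_ids)
```

Because the model only looks up an embedding by vocabulary id, the two representations behave identically as long as distinct paths do not collide under the hash. A model that reads the path node by node, as described for code2seq, needs the unhashed sequence.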
Yes it helped a lot, thank you very much.
I am starting to work with `code2vec` and wondering about the dataset format. In the following instance:
test|field|injection|with|map bf,-400155226,testbean size,-1639730666,assertequals ...
I imagine that `test|field|injection|with|map` is the split name of the method. Then, what is the sequence of triples that comes right after the method name?