Avv22 closed this issue 3 years ago
Hi Avra, Thank you for your interest in this work! Sorry again for the delayed response.
Yes, you will need to train the model from scratch for Python. See: https://github.com/tech-srl/code2vec#extending-to-other-languages
As for Java, you have two options. You can extract paths from your ASTs in the same format as our data. Alternatively, you can de-serialize your ASTs (convert them back to code) and run our JavaExtractor on the produced code.
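For readers taking the first route, here is a minimal sketch of extracting leaf-to-leaf paths from a flat AST of the kind posted in this thread (a list of nodes with `id`, `type`, and optional `children` or `value`). The function names, the `^` (up) and `_` (down) separators, and the `max_length` cutoff are assumptions made for illustration; the exact path encoding and node vocabulary that code2vec's preprocessing expects should be checked against the repository before training on the output.

```python
import json
from itertools import combinations

# Tiny hypothetical AST for `x = 0`, in the same flat format as the question.
SAMPLE = json.dumps([
    {"id": 0, "type": "Assign", "children": [1, 2]},
    {"id": 1, "type": "NameStore", "value": "x"},
    {"id": 2, "type": "Num", "value": "0"},
])

def load_nodes(ast_json):
    """Index nodes by id and record each node's parent id."""
    nodes = json.loads(ast_json)
    by_id = {n["id"]: n for n in nodes}
    parent = {c: n["id"] for n in nodes for c in n.get("children", [])}
    return by_id, parent

def ancestors(node_id, parent):
    """Return the chain of ids from node_id up to the root, inclusive."""
    chain = [node_id]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def leaf_pair_paths(ast_json, max_length=8):
    """Return 'left_value,path,right_value' strings for every pair of leaves.

    Leaves are taken to be the nodes that carry a "value" field (assumption).
    Values containing commas or spaces would need escaping; not handled here.
    """
    by_id, parent = load_nodes(ast_json)
    leaves = [i for i, n in by_id.items() if "value" in n]
    contexts = []
    for a, b in combinations(leaves, 2):
        up, down = ancestors(a, parent), ancestors(b, parent)
        down_index = {nid: k for k, nid in enumerate(down)}
        # The first ancestor of `a` that is also an ancestor of `b`
        # is the lowest common ancestor of the two leaves.
        meet = next(((i, down_index[nid]) for i, nid in enumerate(up)
                     if nid in down_index), None)
        if meet is None:
            continue  # leaves belong to disconnected trees
        i, j = meet
        if i + j > max_length:
            continue  # skip paths with too many edges
        up_types = [by_id[x]["type"] for x in up[:i + 1]]
        down_types = [by_id[x]["type"] for x in reversed(down[:j])]
        path = "^".join(up_types)
        if down_types:
            path += "_" + "_".join(down_types)
        contexts.append(f'{by_id[a]["value"]},{path},{by_id[b]["value"]}')
    return contexts

result = leaf_pair_paths(SAMPLE)
print(result)
```

Each string pairs two leaf values with the node types on the path between them. A training example would also need a target label (for instance, a method name), which this flat AST format does not obviously provide.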
Best, Uri
@urialon Okay! So the AST sample I posted, for a single Python program, will not work as is? Do I have to run your Java extractor and astminer on my code samples in order to train code2vec on them?
Correct.
Hello,
I have a bunch of Python ASTs and Java ASTs in the following format:
[{"id": 0, "type": "Module", "children": [1, 7, 19, 22, 38]}, {"id": 1, "type": "Assign", "children": [2, 3]}, {"id": 2, "type": "NameStore", "value": "S"}, {"id": 3, "type": "Call", "children": [4, 5]}, {"id": 4, "type": "NameLoad", "value": "list"}, {"id": 5, "type": "Call", "children": [6]}, {"id": 6, "type": "NameLoad", "value": "input"}, {"id": 7, "type": "Assign", "children": [8, 9]}, {"id": 8, "type": "NameStore", "value": "a"}, {"id": 9, "type": "Call", "children": [10, 11]}, {"id": 10, "type": "NameLoad", "value": "list"}, {"id": 11, "type": "Call", "children": [12, 13, 14]}, {"id": 12, "type": "NameLoad", "value": "map"}, {"id": 13, "type": "NameLoad", "value": "int"}, {"id": 14, "type": "Call", "children": [15]}, {"id": 15, "type": "AttributeLoad", "children": [16, 18]}, {"id": 16, "type": "Call", "children": [17]}, {"id": 17, "type": "NameLoad", "value": "input"}, {"id": 18, "type": "attr", "value": "split"}, {"id": 19, "type": "Assign", "children": [20, 21]}, {"id": 20, "type": "NameStore", "value": "factor"}, {"id": 21, "type": "Num", "value": "0"}, {"id": 22, "type": "For", "children": [23, 24, 25]}, {"id": 23, "type": "NameStore", "value": "tmp"}, {"id": 24, "type": "NameLoad", "value": "a"}, {"id": 25, "type": "body", "children": [26, 35]}, {"id": 26, "type": "Expr", "children": [27]}, {"id": 27, "type": "Call", "children": [28, 31, 34]}, {"id": 28, "type": "AttributeLoad", "children": [29, 30]}, {"id": 29, "type": "NameLoad", "value": "S"}, {"id": 30, "type": "attr", "value": "insert"}, {"id": 31, "type": "BinOpAdd", "children": [32, 33]}, {"id": 32, "type": "NameLoad", "value": "tmp"}, {"id": 33, "type": "NameLoad", "value": "factor"}, {"id": 34, "type": "Str", "value": "\\""}, {"id": 35, "type": "AugAssignAdd", "children": [36, 37]}, {"id": 36, "type": "NameStore", "value": "factor"}, {"id": 37, "type": "Num", "value": "1"}, {"id": 38, "type": "Expr", "children": [39]}, {"id": 39, "type": "Call", "children": [40, 41]}, {"id": 40, "type": 
"NameLoad", "value": "print"}, {"id": 41, "type": "Call", "children": [42, 45]}, {"id": 42, "type": "AttributeLoad", "children": [43, 44]}, {"id": 43, "type": "Str", "value": ""}, {"id": 44, "type": "attr", "value": "join"}, {"id": 45, "type": "NameLoad", "value": "S"}]
How can I get their embeddings with your model? Is there an already trained model that I can use directly to output embeddings for Python, similar to your trained model for Java, or should I train the model from scratch for Python? If so, could you please show me how to get started?
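For anyone working with the same AST dump, it can help to rebuild the nested tree from the flat node list before converting it to another tool's input. Below is a minimal sketch, assuming every node has an `id` and a `type`, inner nodes list child ids under `children`, and leaves carry a `value`. The helper names `to_nested` and `render` are hypothetical, and the sample is a trimmed excerpt of the AST above (only the first statement, with the `Module` children list shortened to match).

```python
import json

# Trimmed excerpt of the posted AST: the statement `S = list(input())`.
FLAT = json.loads("""
[{"id": 0, "type": "Module", "children": [1]},
 {"id": 1, "type": "Assign", "children": [2, 3]},
 {"id": 2, "type": "NameStore", "value": "S"},
 {"id": 3, "type": "Call", "children": [4, 5]},
 {"id": 4, "type": "NameLoad", "value": "list"},
 {"id": 5, "type": "Call", "children": [6]},
 {"id": 6, "type": "NameLoad", "value": "input"}]
""")

def to_nested(flat_nodes):
    """Turn the flat id-referenced list into a list of nested root dicts."""
    by_id = {n["id"]: n for n in flat_nodes}
    child_ids = {c for n in flat_nodes for c in n.get("children", [])}
    # A root is any node that no other node lists as a child.
    roots = [n["id"] for n in flat_nodes if n["id"] not in child_ids]

    def build(node_id):
        node = by_id[node_id]
        nested = {"type": node["type"]}
        if "value" in node:
            nested["value"] = node["value"]
        kids = [build(c) for c in node.get("children", [])]
        if kids:
            nested["children"] = kids
        return nested

    return [build(r) for r in roots]

def render(tree, depth=0):
    """Return the tree as indented outline lines, one node per line."""
    label = tree["type"]
    if "value" in tree:
        label += f" = {tree['value']!r}"
    lines = ["  " * depth + label]
    for child in tree.get("children", []):
        lines.extend(render(child, depth + 1))
    return lines

trees = to_nested(FLAT)
outline = render(trees[0])
print("\n".join(outline))
```

The recursive `build` is fine for small programs. Very deep ASTs could hit Python's recursion limit, in which case an explicit stack would be the safer choice.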