Link to the paper: https://www.overleaf.com/13635304krfvjbpvgwfd; git link: https://git.overleaf.com/13635304krfvjbpvgwfd (I do not know how it works yet; will check).
What do we want to show? We want to show that these embeddings capture some structural/linguistic/etc. information from code. In any case, we should compare our embeddings with others (a minimal comparison sketch follows at the end of this comment). They can be.
Some of these experiments can be excluded from the paper, but they will help us understand what this approach can and cannot capture.
We can use perplexity, but first I need to understand how to compute it.
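For reference, perplexity over held-out tokens is just the exponential of the average negative log-likelihood. A minimal sketch, assuming we already have per-token probabilities from whatever model we end up evaluating (the model itself is not specified here):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood) over held-out tokens.

    `token_probs` are the probabilities the model assigned to the tokens
    that actually occurred; the concrete model is an assumption here.
    """
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# toy usage: a model assigning these probabilities to 4 held-out tokens
print(perplexity([0.25, 0.1, 0.5, 0.05]))  # ~6.3
```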
For each experiment, it is good to understand why it does or does not work and explain that in the paper.
I will keep this list updated as new ideas come up.
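As promised above, one simple way to compare our embeddings against alternative ones is nearest-neighbour overlap: for a sample of identifiers, check how similar the top-k neighbour lists are under each embedding. A minimal sketch, assuming both embeddings are plain numpy matrices over a shared, aligned vocabulary (these names and shapes are illustrative, not the actual id2vec API):

```python
import numpy as np

def top_k_neighbors(emb, idx, k=10):
    """Indices of the k nearest rows to emb[idx] by cosine similarity."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = normed @ normed[idx]
    order = np.argsort(-sims)
    return [i for i in order if i != idx][:k]

def neighbor_overlap(emb_a, emb_b, sample, k=10):
    """Average Jaccard overlap of the top-k neighbour sets over sampled identifiers."""
    scores = []
    for idx in sample:
        a = set(top_k_neighbors(emb_a, idx, k))
        b = set(top_k_neighbors(emb_b, idx, k))
        scores.append(len(a & b) / len(a | b))
    return float(np.mean(scores))
```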
https://arxiv.org/pdf/1602.02215.pdf
http://homepages.inf.ed.ac.uk/csutton/publications/accurate-method-and-class.pdf
It suggests common name sub-tokens ('get', 'is', 'set') and some name keywords (for getPersistentManifoldPool, the guess is that Manifold is likely to be included). https://dl.acm.org/citation.cfm?id=3097421 https://sci-hub.tw/https://dl.acm.org/citation.cfm?id=3097421
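To make the getPersistentManifoldPool example concrete, the sub-tokens come from splitting the identifier on word boundaries. A tiny splitting sketch (a simple camelCase/snake_case regex heuristic, not the splitter used in any of the cited papers):

```python
import re

def split_identifier(name):
    """Split a camelCase/snake_case identifier into lower-cased sub-tokens."""
    parts = re.split(r"_|(?<=[a-z0-9])(?=[A-Z])", name)
    return [p.lower() for p in parts if p]

print(split_identifier("getPersistentManifoldPool"))
# ['get', 'persistent', 'manifold', 'pool']
```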
https://arxiv.org/pdf/1409.3358.pdf
https://arxiv.org/pdf/1409.5718.pdf
http://near.ai/articles/2017-06-01-Code-Completion-Demo/
http://proceedings.mlr.press/v37/piech15.pdf
https://arxiv.org/pdf/1704.00135.pdf is our work; we can mine references from it.
http://homepages.inf.ed.ac.uk/csutton/publications/naturalize.pdf
https://pdfs.semanticscholar.org/41f6/daaa88c8c80228ac347a46bff8a6635d72d3.pdf
They use embeddings as part of the framework. Very short and unreliable. https://arxiv.org/pdf/1710.03129.pdf#page=31
https://arxiv.org/pdf/1705.09231.pdf: we can skip it, but it is interesting.
http://allegro.mit.edu/pubs/posted/journal/2003-barron-chen-wornell-it.pdf
http://sci-hub.tw/https://dl.acm.org/citation.cfm?id=1029007
@vmarkovtsev, in the end, is this a project being worked on in Q1 or pending for Q2?
The deadline is April 15th, so the answer is both.
@zurk, is there a preliminary title or abstract? Based on the papers you mention, all I can figure out is that this is related to embeddings of identifiers. Is this a paper on id2vec?
@eiso, the abstract is not ready yet. You can find a link to the article plan in the first comment: https://github.com/src-d/backlog/issues/1166#issuecomment-363457661. And yes, it is about id2vec.
Just wanted to add a note here that the main co-chair of this conference is Prodo.ai; they consider us their main competitors and might be biased in the review. Just FYI.
It looks like, in the literature, the paper/results from the Sutton group (http://groups.inf.ed.ac.uk/cup/naturalize) highlighted above, "Suggesting Accurate Method and Class Names", might be the most comparable to the id2vec work. It showcases something that later became the VarNaming task (variable name prediction) and is generally referred to as the work on "learning distributed representations of variables using all their usages to predict their names".
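For intuition, a VarNaming-style setup can be sketched as: embed the tokens surrounding every usage of a variable, average them into a single usage representation, and predict the candidate name whose embedding is closest to it. A toy sketch under those assumptions (this is not the actual model from the cited papers):

```python
import numpy as np

def predict_name(usage_contexts, token_emb, name_emb, names):
    """Predict a variable name from the tokens around all of its usages.

    usage_contexts: list of token lists (one per usage of the variable).
    token_emb: dict mapping context token -> vector.
    name_emb: (len(names), dim) matrix of candidate-name vectors.
    """
    vecs = [token_emb[t] for ctx in usage_contexts for t in ctx if t in token_emb]
    usage = np.mean(vecs, axis=0)
    usage /= np.linalg.norm(usage)
    normed = name_emb / np.linalg.norm(name_emb, axis=1, keepdims=True)
    # pick the candidate name closest (by cosine) to the averaged usage representation
    return names[int(np.argmax(normed @ usage))]
```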
"Deep Learning Similarities from Different Representations of Source Code"
focused on clone detection task but might be relevant as well https://2018.msrconf.org/event/msr-2018-papers-deep-learning-similarities-from-different-representations-of-source-code as it compares different representations of the source code, including section 3.2.1 Identifiers
and 3.2.2 AST
Thank you, @bzz! Good links and material for the introduction part of the paper.
I will attend the meetup and talk about Swivel and id2vec: https://github.com/src-d/backlog/issues/1272. The paper itself is pending for now.
So this never happened. We submitted the paper about identifier splitting to ML4P, which was accepted, and we presented it in person with @warenlg. The id2vec paper is still fun to write, though we need to finish some old legacy work first.
Story: "As a source{d} engineer I want to spread the reach of our work and its influence over key influencers in the academic area of source code analysis/machine learning on source code so that we are taken more seriously by the community and become the standard for datasets/tools on this field."
Topics and format at: http://ml4p.org/
Submission site: https://easychair.org/conferences/?conf=mlp2018
Deadlines:
Tasks: Given the dataset (which is enough for testing but not enough for reliable research),