src-d / blog

source{d} blog
https://blog.sourced.tech/
GNU General Public License v3.0

[PROPOSAL] MSR Paper Review: CCFinder & Cross-language clone detection by learning over abstract syntax trees #306

Open m09 opened 5 years ago

m09 commented 5 years ago
bzz commented 5 years ago

I'll be happy to help and take care of this one. The original suggestion comes from this meeting.

Here is the preliminary blog post plan. It's not the usual paper review any more, but rather an overview that touches upon a number of relevant works (presented at MSR).

This field was pioneered by
Katsuro Inoue (MSR co-founder)
  "CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code" 2002
  https://www.semanticscholar.org/paper/CCFinder%3A-A-Multilinguistic-Token-Based-Code-Clone-Kamiya-Kusumoto/98e810ed098a651e0ba8cbb63d2d926d4eebdf9b
  http://www.ccfinder.net/ccfinderxos.html

Particularly important for ML on code now because of
Miltos Allamanis
  "The Adverse Effects of Code Duplication in Machine Learning Models of Code"
  https://arxiv.org/abs/1812.06469

Modern methods are cross-language and incorporate structural information from the AST
Daniel Perez
  "Cross-language clone detection by learning over abstract syntax trees"
  https://static.perez.sh/research/2019/cross-language-clone-detection/clone-detection-msr19.pdf

To scale it to large codebases, source{d} built Gemini
  (paper pending)

@m09 @vmarkovtsev @warenlg please let me know what you guys think about the structure.

m09 commented 5 years ago

The plan looks great to me. Let me know if you want a review at any point when you start working on this!

vcoisne commented 5 years ago

@bzz Trying to get visibility into the overall content calendar. When do you think you'll be able to write this one?

bzz commented 5 years ago

I'm sorry for the delay caused by vacations, @vcoisne.

Since one of the goals for this one is to have a brief blog post (not a long one), I'll try to post a draft by the end of next week.

vcoisne commented 4 years ago

@bzz ping :)

bzz commented 4 years ago

And of course I did not manage to find the time through the retreat week :/ Sorry about the misleading communication.

I will be on vacation and then AFK for a while and shall be able to get back to this first thing on Oct 14th.