mkshiblu / jsdiffer

JavaScript refactoring detection tool between commits
MIT License

Investigate why the new processStatements() is taking too much time on the vue_runtime project #67

Closed mkshiblu closed 3 years ago

mkshiblu commented 3 years ago

Start of vue_runtime:

/*!
 * Vue.js v2.5.14
 * (c) 2014-2018 Evan You
 * Released under the MIT License.
 */
(function (global, factory) {
    typeof exports === 'object' && typeof module !== 'undefined' ? module.exports = factory() :
    typeof define === 'function' && define.amd ? define(factory) :
    (global.Vue = factory());
}(this, (function () { 'use strict';

/*  */

var emptyObject = Object.freeze({});

My assumption is that almost the whole code is written as a single statement here. Therefore, all the replacement/matching operations run on the huge string of this one StatementObject. This could also be related to #49, where I am still not sure why the child count score computation takes so much time.

This should be the top priority now, since I expect most JavaScript programs could look similar to this, especially Node.js programs where the code is exported as a module.

mkshiblu commented 3 years ago

After double-checking, it seems that the apply() method of Apache's Levenshtein Distance, which String.editDistance() uses, is what makes JSRMIner so slow on statements containing huge anonymous function declarations, which are very common practice in JavaScript. For example, in Node.js many functions are exported from the file (i.e. function declarations are assigned to an export variable).

As seen in the picture below, the apply method took 411s for just one invocation inside ReplacementInfo's initialization. The argumentized strings are basically the whole file (~8 KLOC), each more than 200K characters long:

argumentizedString1.length(): 215801
argumentizedString2.length(): 218234

image
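To give a sense of scale, here is a minimal JavaScript sketch of the textbook dynamic-programming Levenshtein distance. It is not Apache's implementation; it only assumes that the library, like the textbook algorithm, does work proportional to the product of the two string lengths. The names `levenshtein` and `cellCount` are made up for illustration.

```javascript
// Textbook two-row Levenshtein distance.
// Time is O(a.length * b.length): the inner loop body runs once per
// (character of a, character of b) pair.
function levenshtein(a, b) {
  // prev[j] = distance between the empty prefix of a and b.slice(0, j)
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // distance between a.slice(0, i) and the empty string
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Number of inner-loop iterations for inputs of the given lengths.
function cellCount(len1, len2) {
  return len1 * len2;
}

console.log(levenshtein("kitten", "sitting")); // 3
console.log(cellCount(215801, 218234));        // 47095115434
```

With the two lengths reported here, a single call has to fill roughly 4.7 × 10^10 table cells, which is consistent with one invocation taking minutes rather than milliseconds.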

Clearly, we cannot pass the whole program to the editDistance method. Since Java programs are unlikely to have thousands of lines of code in a single statement, this problem is pretty much JS-specific.
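One possible direction, sketched here in JavaScript purely as an illustration, is a length guard in front of the edit distance call. Everything in it is hypothetical: `guardedEditDistance`, `MAX_EDIT_DISTANCE_LENGTH`, and the threshold value of 1000 are not taken from jsdiffer or RefactoringMiner, and the fallback (the length difference, which is a valid lower bound on the Levenshtein distance) is just one of several options.

```javascript
// Hypothetical threshold; a real value would need to be tuned by profiling.
const MAX_EDIT_DISTANCE_LENGTH = 1000;

// Textbook two-row Levenshtein distance, O(a.length * b.length) time.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical guard: only run the quadratic algorithm on short inputs.
// For oversized inputs, return the length difference, which never exceeds
// the true distance, and flag the result as inexact so callers can decide
// whether to skip the comparison or fall back to something cheaper.
function guardedEditDistance(a, b, maxLength = MAX_EDIT_DISTANCE_LENGTH) {
  if (a.length > maxLength || b.length > maxLength) {
    return { distance: Math.abs(a.length - b.length), exact: false };
  }
  return { distance: levenshtein(a, b), exact: true };
}

console.log(guardedEditDistance("kitten", "sitting"));
console.log(guardedEditDistance("a".repeat(2000), "b".repeat(2500)));
```

The `exact` flag keeps the caller honest: a lower bound must not be treated as a real similarity score, so statements that trip the guard would still need a different matching strategy, for example comparing the nested function bodies separately.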

Testing on RefactoringMiner also shows that editDistance takes most of the time for a statement containing 20-30 lines of anonymous function declaration code.

@tsantalis Any suggestion?

mkshiblu commented 3 years ago

For the solution, this issue has been moved to #75.