Closed mkshiblu closed 3 years ago
After double-checking, it seems that Apache's Levenshtein distance apply() method, used by String.editDistance(), is what makes JSRMiner so slow on statements containing huge anonymous function declarations, which are very common practice in JavaScript. For example, in Node.js many functions are exported from the file (i.e. function declarations are assigned to an export variable).
As seen in the picture below, the apply method took 411s for just one invocation inside ReplacementInfo's initialization. The argumentized strings are basically the whole file (~8 KLOC), each more than 200K characters long:
argumentizedString1.length(): 215801 argumentizedString2.length(): 218234
Clearly, we cannot pass the whole program to the editDistance method. Since Java programs are unlikely to have several KLOC in a single statement, this problem is pretty much JS-specific.
Testing on RefactoringMiner also shows that editDistance takes most of the time for a statement containing 20-30 lines of anonymous function declaration code.
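For context on the cost: the classic Levenshtein computation fills an n x m table, so the two lengths above mean roughly 215801 x 218234 ≈ 4.7 x 10^10 cell updates for a single call. One possible way to bound this, sketched below, is a banded computation that only visits cells within `threshold` of the diagonal and gives up once the distance must exceed that threshold. This is only a sketch under an assumption that is not confirmed anywhere in this issue, namely that the matching logic only needs to know whether the distance is below some cutoff. The class name, method name, and `threshold` parameter are hypothetical and not part of JSRMiner or RefactoringMiner.

```java
public class BoundedEditDistance {

    /**
     * Returns the Levenshtein distance between a and b if it is at most
     * threshold, otherwise -1. Only cells within `threshold` of the diagonal
     * are visited, so the work is about O(threshold * min(n, m)) instead of O(n * m).
     * (Hypothetical helper, not taken from the project.)
     */
    public static int distance(String a, String b, int threshold) {
        if (threshold < 0) {
            throw new IllegalArgumentException("threshold must be >= 0");
        }
        int n = a.length();
        int m = b.length();
        // The distance is at least the length difference.
        if (Math.abs(n - m) > threshold) return -1;
        if (n == 0) return m;
        if (m == 0) return n;

        final int INF = Integer.MAX_VALUE / 2; // large sentinel, safe to add 1 to
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];

        // Row 0: distance from the empty prefix of a, limited to the band.
        int firstHi = Math.min(m, threshold);
        for (int j = 0; j <= firstHi; j++) prev[j] = j;
        if (firstHi < m) prev[firstHi + 1] = INF;

        for (int i = 1; i <= n; i++) {
            int lo = Math.max(1, i - threshold);
            int hi = Math.min(m, i + threshold);
            // Cell just left of the band: real value on column 0, sentinel otherwise.
            curr[lo - 1] = (lo == 1) ? i : INF;
            int rowMin = curr[lo - 1];
            for (int j = lo; j <= hi; j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int best = Math.min(prev[j - 1] + cost,
                           Math.min(prev[j] + 1, curr[j - 1] + 1));
                curr[j] = best;
                rowMin = Math.min(rowMin, best);
            }
            // Cell just right of the band must read as "unreachable" in the next row.
            if (hi < m) curr[hi + 1] = INF;
            // Row minima never decrease, so the final distance cannot recover.
            if (rowMin > threshold) return -1;
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[m] <= threshold ? prev[m] : -1;
    }
}
```

If I remember correctly, Apache Commons Text's `LevenshteinDistance` also has a constructor that takes a threshold and makes `apply()` return -1 above it, which might be the less invasive option if a cutoff turns out to be acceptable.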
@tsantalis Any suggestion?
The discussion of a solution has been moved to issue #75.
Start of vue_runtime..
processStatements() is called after createBodyMappers(), the extract/inline method steps, etc., to match the statements that are directly inside a file.
Without this call, processing takes about 1.2 minutes.
With a call to processStatements(), it runs for over 30 minutes and still does not finish.
The file is approximately 8000 lines of code.
processStatements() seems to work correctly for toy projects.
My assumption is that almost the whole code is written as a single statement here. Therefore all the replacement/matching operations run on this huge StatementObject string. This could also be related to #49, where I am still not sure why the child count score computation takes so much time.
This should be the top priority now, since I expect most JavaScript programs to look similar to this, especially Node.js programs where code is exported as a module.
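If the two huge statement strings are two versions of the same file, most of their text is probably identical at the start and at the end. Stripping a common prefix and suffix does not change the Levenshtein distance, so one cheap pre-processing step could be to trim both strings before any distance call. Below is a minimal sketch of that idea; `CommonAffixTrimmer` and `trim` are hypothetical names, and whether this helps on the real input is an assumption that would need measuring.

```java
public class CommonAffixTrimmer {

    /**
     * Removes the longest common prefix and the longest common suffix
     * (not overlapping the prefix) from both strings.
     * Returns a two-element array: {trimmedA, trimmedB}.
     * (Hypothetical helper, not taken from the project.)
     */
    public static String[] trim(String a, String b) {
        int max = Math.min(a.length(), b.length());

        // Length of the shared prefix.
        int prefix = 0;
        while (prefix < max && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }

        // Length of the shared suffix, limited to what remains after the prefix.
        int suffix = 0;
        while (suffix < max - prefix
                && a.charAt(a.length() - 1 - suffix) == b.charAt(b.length() - 1 - suffix)) {
            suffix++;
        }

        return new String[] {
            a.substring(prefix, a.length() - suffix),
            b.substring(prefix, b.length() - suffix)
        };
    }
}
```

For example, `trim("function foo() { return 1; }", "function foo() { return 2; }")` leaves only `"1"` and `"2"` to compare. This does not help when the edits are spread across the whole statement, so it would complement a size limit rather than replace one.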