sailuh / kaiaulu

An R package for mining software repositories
http://itm0.shidler.hawaii.edu/kaiaulu
Mozilla Public License 2.0

Adds GoF Module and Quality of Life Features for Combining with other Data #223

Closed carlosparadis closed 1 year ago

carlosparadis commented 1 year ago

Pattern4 can take several hours (about 8 on the then-current version of iotdb) to generate its XML file. The parser function, which previously both generated the XML and parsed it into R, was therefore decoupled into a separate writer and parser.

Because pattern4 operates on bytecode, the generated XML file identifies patterns per class, not per file. In addition, certain patterns are defined in terms of variables and methods, which pattern4 also identifies. For analyses done on class or file metrics, the interest is in obtaining a table containing only the classes. Exchanges with the author of pattern4 clarified how the role_name field can be used to tell classes, methods, and variables apart. A new function encoding this process was added, subset_gof_class; the function docs detail the process. These functions are now located in R/gof.R.

Even after subsetting, however, the table is still at class level. Much of Kaiaulu's analysis is at file level (although class-level analysis is also available). Moreover, the classes are identified by the package declaration of the file they belong to, i.e. their namespace. Mapping namespaces to filepaths therefore requires a function that analyzes the project's source code and returns a namespace-to-filepath mapping. This function is part of the text.R module and uses srcML to query the annotated source code for that information. The added function is query_src_text_namespace.

With both tables available, the output of pattern4 can then be mapped to filepaths and subsequently integrated with other metrics.
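As a rough illustration of that last step, here is a minimal base R sketch of joining a class-level pattern table to a namespace-to-filepath table. The two data frames, their column names, and the example paths are made up for illustration and are not the actual output schema of subset_gof_class or query_src_text_namespace. The sketch also assumes Java's convention that a top-level class lives in a file of the same name, and that nested classes appear with a `$` suffix as they do in bytecode.

```r
# Hypothetical class-level pattern table (illustrative columns only,
# not Kaiaulu's real schema).
gof_classes <- data.frame(
  pattern_name = c("Singleton", "Observer", "Observer"),
  class_name = c("org.example.db.Pool",
                 "org.example.ui.Listener",
                 "org.example.ui.Button$Handler"),
  stringsAsFactors = FALSE
)

# Hypothetical namespace-to-filepath table (illustrative columns only).
namespace_map <- data.frame(
  namespace = c("org.example.db", "org.example.ui", "org.example.ui"),
  filepath = c("src/org/example/db/Pool.java",
               "src/org/example/ui/Listener.java",
               "src/org/example/ui/Button.java"),
  stringsAsFactors = FALSE
)

# Assumption: nested classes (Outer$Inner) belong to the file of the
# outer class, so drop everything from the first "$" onward.
gof_classes$top_level_class <- sub("\\$.*$", "", gof_classes$class_name)

# Assumption: the file name without extension is the top-level class name,
# so namespace + "." + file stem gives the fully qualified class.
file_stem <- tools::file_path_sans_ext(basename(namespace_map$filepath))
namespace_map$top_level_class <- paste(namespace_map$namespace, file_stem, sep = ".")

# Left join: keep every detected pattern, attach a filepath where one matches.
gof_files <- merge(
  gof_classes,
  namespace_map[, c("top_level_class", "filepath")],
  by = "top_level_class",
  all.x = TRUE
)
```

Joining on the fully qualified class rather than on the namespace alone avoids a many-to-many match when several files share a package. Rows whose filepath comes back as NA would point to classes that could not be located in the source tree.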