When Kaiaulu was first written, the `parser_*` functions tried to preserve the original tool column names, so that anyone wanting to know more about a column could look up its definition in the tool's documentation. Now that Kaiaulu interfaces with multiple tools, some of which may even collect the same data under different definitions, this has become more complicated. In addition, some column names are simply unclear or do not follow SE literature conventions (e.g. the SCC tool calls the metric `LOC` as `lines`).
Because of that, we should come up with a consistent nomenclature for the data we care about. Eventually, I hope this can be documented in a database schema such as a .mwb with all the relationships, but the Kaiaulu GitHub wiki should be helpful for iterating on suggestions before moving changes to the API and Notebooks.
Also, the following changes were requested by @rnkazman in the e-mail titled "[SEWORLD] CFP: Information and Software Technology Special Issue on Application of causal modeling and inference methods in software engineering: Approaches, Challenges, State-of-the-Art and Prospects", as an initial step towards the goal of this issue:
SCC
`code` -> `loc`
I will create a wiki page listing the other column names obtained, so we can make a final decision on what the remaining SCC columns should be called from now on.
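To make the idea of a shared nomenclature concrete, here is a minimal sketch of a rename lookup applied to parsed rows. It is written in Python for illustration only and is not Kaiaulu's R code; the helper `rename_columns`, the table `SCC_RENAMES`, and every column other than `code` are hypothetical.

```python
# Hypothetical lookup table: original tool column name -> agreed Kaiaulu name.
# Only the "code" -> "loc" entry comes from this issue.
SCC_RENAMES = {"code": "loc"}


def rename_columns(rows, renames):
    """Return copies of the rows with keys renamed; unknown keys are kept as-is."""
    return [
        {renames.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


# Illustrative rows; the columns other than "code" are made up for the example.
scc_rows = [
    {"file": "R/metric.R", "code": 120, "comment": 30},
    {"file": "R/parser.R", "code": 450, "comment": 80},
]
renamed = rename_columns(scc_rows, SCC_RENAMES)
```

Keeping the mapping in one table per tool would let the wiki page and the parser agree on a single source of names, and columns without an agreed name pass through unchanged.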
Motif Metrics
`anti_motif_square` -> `anti_square_motif`
Outcome Metrics
`file_bug_frequency` -> `file_bug_changes` (I am considering calling it `file_bug_commit_count` instead)
`file_non_bug_frequency` -> `file_nonbug_changes` (I am considering calling it `file_non_bug_commit_count` instead)
No changes were requested for the following two, but for consistency with the renaming above I was considering:
`file_bug_churn` -> `file_bug_commit_churn`
`file_non_bug_churn` -> `file_non_bug_commit_churn`
All 4 metrics above involving bug and non-bug are computed from a table of commits. What differs among them is a) whether or not the commit links to a bug, and b) how we aggregate the commits: either by counting the number of rows or by summing the churn column. Hence, I feel that names explicitly including the word `commit` and the aggregation statistic (i.e. `count` or `churn`) make this more explicit.
The docs should also make clearer that the code only considers the counts of issues that contain an issue-id label. I.e. the function docs should state in words what this line of code does:
https://github.com/sailuh/kaiaulu/blob/7566f4ef50a0cd55eff47eeade3d12f186d143f0/R/metric.R#L87
After all, we can only determine whether a commit is or is not a bug if we know which issue the commit refers to. If we don't, we filter the commit out.
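The behavior described above can be sketched as follows. This is an illustration in Python under stated assumptions, not the code in `R/metric.R`: the function name `file_outcome_metrics` and the row keys `file`, `issue_id`, `is_bug`, and `churn` are hypothetical, and the output names are the ones proposed in this issue.

```python
from collections import defaultdict


def file_outcome_metrics(commits):
    """Aggregate per-file bug / non-bug commit counts and churn.

    `commits` is a list of dicts, one row per (commit, file) pair, with the
    hypothetical keys: file, issue_id (None when the commit has no issue id),
    is_bug (whether the linked issue is a bug), and churn.
    """
    metrics = defaultdict(lambda: {
        "file_bug_commit_count": 0,
        "file_non_bug_commit_count": 0,
        "file_bug_commit_churn": 0,
        "file_non_bug_commit_churn": 0,
    })
    for row in commits:
        # Without an issue id we cannot tell bug from non-bug, so skip the row.
        if row["issue_id"] is None:
            continue
        prefix = "file_bug" if row["is_bug"] else "file_non_bug"
        # Same table of commits, two aggregations: row count and churn sum.
        metrics[row["file"]][prefix + "_commit_count"] += 1
        metrics[row["file"]][prefix + "_commit_churn"] += row["churn"]
    return dict(metrics)


# Made-up rows for illustration only.
commits = [
    {"file": "a.R", "issue_id": "PROJ-1", "is_bug": True, "churn": 10},
    {"file": "a.R", "issue_id": "PROJ-2", "is_bug": False, "churn": 4},
    {"file": "a.R", "issue_id": None, "is_bug": False, "churn": 99},
    {"file": "b.R", "issue_id": "PROJ-1", "is_bug": True, "churn": 3},
    {"file": "a.R", "issue_id": "PROJ-3", "is_bug": True, "churn": 5},
]
result = file_outcome_metrics(commits)
```

Note that the third row, which has no issue id, contributes to none of the four metrics. That is the filtering step the function docs should spell out.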