Open carlosparadis opened 1 year ago
`/exec` scripts
In the longer term, we would like to have `/exec` scripts that offer a CLI to the most-used features of Kaiaulu's API so they can be used server-side. The most obvious use case is running Kaiaulu's downloaders from a cron job. In the current architecture, most scripts will pull from download.R, and possibly from parser.R. In the new architecture, every script will only need its respective module. For example, a script to download JIRA issues relies only on jira.R.
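As a rough sketch of what such a script could look like, the argument handling can be done with base R alone. The script name (`exec/jira.R`), the flags, and the `download_jira_issues()` call in the comment are hypothetical placeholders, not part of Kaiaulu's current API:

```r
# Hypothetical exec script sketch (e.g. exec/jira.R); names and flags are illustrative only.

# Parse arguments of the form --key=value into a named list.
parse_cli_args <- function(args) {
  parsed <- list()
  for (arg in args) {
    if (!grepl("^--[^=]+=", arg)) {
      stop("Expected arguments of the form --key=value, got: ", arg)
    }
    key <- sub("^--([^=]+)=.*$", "\\1", arg)
    value <- sub("^--[^=]+=", "", arg)
    parsed[[key]] <- value
  }
  parsed
}

main <- function(args = commandArgs(trailingOnly = TRUE)) {
  opts <- parse_cli_args(args)
  # Under the proposed architecture the script would only source its own module:
  # source("R/jira.R")
  # download_jira_issues(opts$project, opts$save_path)  # hypothetical function
  message("Would download JIRA issues for project ", opts$project,
          " into ", opts$save_path)
  invisible(opts)
}

# Run only when invoked non-interactively with arguments (e.g. via Rscript).
if (!interactive() && length(commandArgs(trailingOnly = TRUE)) > 0) {
  main()
}
```

A cron entry could then call something like `Rscript exec/jira.R --project=MYPROJECT --save_path=/data/jira` on a schedule, where the project key and path are placeholders.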
In the current architecture, a `db.R` file would need to be defined. The module would be similar to the parser.R functions, in that it would ingest raw data, but rather than output a single table, it would generate several tables in a normalized manner. In the new architecture, the `db_` functions would exist within every module. E.g., there would be a `db_` function in jira.R, another in mail.R, etc. The logic of how to transform raw data for insertion into an optional database would be self-contained in the module of the tool it is in charge of.
Considering that the parser.R module will continue to grow indefinitely in the current architecture, the db.R module would suffer from the same issue. In the new architecture, however, this logic would be distributed across the modules whose data is expected to be available in the database.
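To make the idea of a per-module `db_` function concrete, here is a minimal sketch of one that could live in jira.R. The function name `db_jira_issues()` and the shape of the raw input are assumptions made for illustration; the actual raw JIRA data is structured differently:

```r
# Sketch of a per-module db_ function (would live in jira.R under the proposal).
# The function name and the shape of `raw_issues` are assumptions for illustration.
db_jira_issues <- function(raw_issues) {
  # One row per issue.
  issue <- do.call(rbind, lapply(raw_issues, function(x) {
    data.frame(issue_key = x$key, title = x$title, stringsAsFactors = FALSE)
  }))
  # One row per comment, linked back to its issue by issue_key.
  comment <- do.call(rbind, lapply(raw_issues, function(x) {
    if (length(x$comments) == 0) return(NULL)
    data.frame(issue_key = x$key,
               author = vapply(x$comments, function(cm) cm$author, character(1)),
               body = vapply(x$comments, function(cm) cm$body, character(1)),
               stringsAsFactors = FALSE)
  }))
  list(issue = issue, comment = comment)
}

# Made-up raw records standing in for downloaded JIRA data.
raw_issues <- list(
  list(key = "PROJ-1", title = "Crash on start",
       comments = list(list(author = "alice", body = "Confirmed."),
                       list(author = "bob", body = "Fix in progress."))),
  list(key = "PROJ-2", title = "Typo in docs", comments = list())
)
tables <- db_jira_issues(raw_issues)
```

Here a single raw input yields two normalized tables (`issue` and `comment`) joined by `issue_key`, rather than the single flat table a `parse_` function would return.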
The `testthat` package uses, by convention, a `test-` prefix followed by the file name. Much as `parser.R` is getting bloated with parser functions, so is `test-parser.R`. Comparing both organizations, future unit tests would be cleaner and easier to navigate if they lived in `test-git.R`, `test-jira.R`, `test-mail.R`, `test-bugzilla.R`, etc., rather than all combined in `test-parser.R` as they currently are.
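The one-to-one mapping between modules and test files can be sketched as follows. The module names come from the list in this issue, and the `tests/testthat` directory is the standard location `testthat` looks in:

```r
# Module files under the proposed organization (taken from the list in this issue).
modules <- c("git.R", "jira.R", "mail.R", "bugzilla.R")
# testthat discovers files in tests/testthat whose names start with "test".
test_files <- file.path("tests", "testthat", paste0("test-", modules))
print(test_files)
```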
Currently, `download.R`, `parser.R`, and `network.R` have been growing indefinitely as more downloaders, parsers, and network transformations are added. Moreover, some semantic conflicts can already be identified. For instance, dv8.R contains parser functions, since they pertain to dv8 interfacing, but they could also reasonably be expected to be found in parser.R. We also have a git.R module, but its gitlog parser exists in `parser.R`.
I am considering refactoring the 3 files based on the overall type of data they download, parse, and represent as networks, and keeping the distinction of function purpose in the function name prefix only, as is already done (i.e. `download_*`, `parse_*`, and `transform_*`). The following describes the tentative new organization:
git.R
Functions that rely on the existence of a `.git` folder (not github).
mail.R
All mailing list archives end up as .mbox files and do not offer much of an interface other than downloading their data.
Issue Trackers
Originally, I considered that an `issue.R` would make more sense. However, the `github.R` file is fairly extensive, and has seen at least three use cases in Kaiaulu: a) analysis of events, b) analysis of bug count, c) analysis of social smells. The JIRA API also offers the same level of access for analysis, and bugzilla offers at least the bugs and communication analysis potential. Therefore, I believe `issue.R` is better reserved for functions that abstract these types of operations in a common interface, pulling from these more specialized modules, to be future proof.
bugzilla.R
Bugzilla offers a quite extensive API. Although we do not currently use it extensively, I believe it makes sense to separate it.
jira.R
Kaiaulu downloader for JIRA is currently an R package, so the Notebook provides guidance on how to utilize it for downloading data, but it is not defined as part of an API.
github.R
This already exists.
src.R
This should contain functions associated with the examination of source files, which do not depend on the existence of a `.git` folder.
dv8.R
This already exists, but some parser.R functions should be moved to dv8.R.
vulnerabilities.R
Functionality associated with the analysis of CVE, CWE, and CAPEC.
identity.R, graph.R, motif.R, gof.R, text.R
No changes to these modules. Their interfaces are fairly self contained.
metric.R
This already exists, but I believe smells.R would be better served being moved here.
interval.R
This file would likely be better renamed as `series.R` and could support time series analysis in Kaiaulu. Currently, it is not very clear how this could be done.
example.R
This should be fine as its own file for now, although the name can be misleading, as example files are normally example code for users to try the tool. It may need a better name in the future.
Homeless Functions
Currently, the `parse_reply` and `transform_reply_to_bipartite_network` functions would be homeless, as they serve as an abstraction over mail.R and jira.R. A reply.R file may make sense for the communication-related API in Kaiaulu, or maybe this level of abstraction should not exist in the API and users should refer to the Notebook for it. I will give it more thought. Another homeless function is parse_java_code_refactoring_json.