sailuh / kaiaulu

An R package for mining software repositories
http://itm0.shidler.hawaii.edu/kaiaulu
Mozilla Public License 2.0

Extend Data Schema #319

Open crepesAlot opened 5 days ago

crepesAlot commented 5 days ago

Purpose


Relationships between different notebooks can be difficult to see or understand. By creating tables, we can visualize these relationships more easily and understand them better. This issue is similar to issue #226.

Process


Using the /exec scripts for each of the notebooks made from issue #322, create a model in MySQL Workbench that displays the tables and their relationships.

Task List


carlosparadis commented 5 days ago

@crepesAlot I think the tasks in M2 got mixed up here. Let me refine your understanding. First, let's consider what you have accomplished in M1:

The conf.R module (make a new issue for this)

conf.R is now our "dataset headquarters". It bridges the project configuration files in conf/ to the notebooks in vignettes/ through get functions. This moves the responsibility of "how to access the config file information" out of each notebook and centralizes it in conf.R.
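A minimal sketch of the "get function" idea, written in Python only for illustration. The config keys and getter names here are hypothetical and are not taken from conf.R or from any Kaiaulu config file. The point is that callers ask for a value by name and never touch the config structure directly.

```python
# Hypothetical parsed project config; the keys are made up for this sketch.
EXAMPLE_CONFIG = {
    "git": {"repo_path": "rawdata/git_repo/.git", "branches": ["master"]},
    "issue_tracker": {"issues_path": "rawdata/issues"},
}


def get_git_repo_path(config):
    """Return the local path of the git repository (hypothetical key)."""
    return config["git"]["repo_path"]


def get_git_branches(config):
    """Return the list of branches to analyze (hypothetical key)."""
    return config["git"]["branches"]


def get_issues_path(config):
    """Return the folder where downloaded issues are stored (hypothetical key)."""
    return config["issue_tracker"]["issues_path"]
```

If the config layout changes, only the getters need to change. Every notebook that calls them keeps working.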

There is, however, another responsibility that conf.R should take away from notebooks (or the user): creating the folder structure Kaiaulu recommends. That's the first step.

The folder organization you created is here: https://github.com/sailuh/kaiaulu/issues/230#issuecomment-2401210035. In essence, we will make that representation auto-generated for a project. The user provides the path to an empty project folder as input, and all the folders are created within it (a series of mkdir function calls). You will want to make a small function for mkdir; check io.R first, as one may already exist.
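The folder creation step could be sketched roughly as follows, in Python for illustration. The subfolder names are placeholders; the real layout is the one in the linked comment. The function name is also hypothetical.

```python
from pathlib import Path

# Placeholder subfolders; replace with the layout from the linked comment.
RECOMMENDED_FOLDERS = ["rawdata/git", "rawdata/issues", "parsed", "analysis"]


def create_project_folders(project_root, folders=RECOMMENDED_FOLDERS):
    """Create every recommended subfolder under project_root.

    Safe to call more than once: folders that already exist are left alone.
    Returns the list of folder paths that now exist.
    """
    root = Path(project_root)
    created = []
    for relative in folders:
        target = root / relative
        # parents=True creates intermediate folders; exist_ok=True avoids errors on reruns.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

Making the call idempotent means a user can rerun it on a partially set up project without losing anything.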

Going beyond folders and filepaths (please make a new issue for this and reference this comment)

The second code task here is that we want to add more /exec scripts, much like the one @daomcgill created for mail.R to refresh emails (Dao, if you have not done so yet, make a note on your issue to do so). The /exec folder is how users who do not want to code in R, and just want a specific kind of data out, can call Kaiaulu from the command line. This raises the question: "OK, but what table do users want?" You should ask that question per notebook and discuss it with me on the new issue. Try to make an informed guess for each notebook. As for how to construct the /exec scripts, @daomcgill can cover that in one of your internal meetings.

Extending the data schema per notebook (we will use THIS issue for this purpose)

Kaiaulu notebooks generate tables, and so will the exec/ scripts. Kaiaulu notebooks do not exist in a vacuum: there are relationships between the tables Kaiaulu generates. However, if you are just getting started in SE research, it will be very hard to see these relationships. Making them more explicit is the goal of this issue. In other words, what is the data model underlying Kaiaulu?

One example here is your commit messages: if you notice, you add the issue id to the commit messages in Kaiaulu. The master branch always contains them. This means that a table parsed from git log and a table parsed from GitHub issues could be inner joined on the issue id. This is, in fact, the case. Making this more explicit is the goal.
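The join described above can be sketched as follows, in Python for illustration. It assumes the issue id appears in the commit message as `#` followed by digits. The column names, commit hashes, and messages are made up for the example and are not Kaiaulu's actual table schema.

```python
import re

# Assumption: an issue id is written in the commit message as "#<number>".
ISSUE_ID_PATTERN = re.compile(r"#(\d+)")


def extract_issue_ids(message):
    """Return every issue id mentioned in a commit message, as integers."""
    return [int(match) for match in ISSUE_ID_PATTERN.findall(message)]


def inner_join_commits_issues(commits, issues):
    """Inner join commit rows and issue rows on the issue id.

    Commits that mention no known issue, and issues no commit mentions, are dropped.
    """
    issues_by_id = {issue["issue_id"]: issue for issue in issues}
    joined = []
    for commit in commits:
        for issue_id in extract_issue_ids(commit["message"]):
            issue = issues_by_id.get(issue_id)
            if issue is not None:
                joined.append({
                    "commit_hash": commit["commit_hash"],
                    "issue_id": issue_id,
                    "issue_title": issue["title"],
                })
    return joined


# Made-up rows standing in for a parsed git log table and a parsed issues table.
commits = [
    {"commit_hash": "a1b2c3", "message": "i #319 add schema notes"},
    {"commit_hash": "d4e5f6", "message": "fix typo in README"},
    {"commit_hash": "0a9b8c", "message": "i #999 unrelated work"},
]
issues = [
    {"issue_id": 319, "title": "Extend Data Schema"},
    {"issue_id": 226, "title": "placeholder title"},
]

linked = inner_join_commits_issues(commits, issues)
```

In a data schema, this link would appear as a relationship between the commit table and the issue table keyed on the issue id.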

This is what ties this issue to #226. Some of the data model is already there, but it is not finalized. As you go through the notebooks to consider what /exec scripts we should create, and what tables they will output, we will also ask ourselves where each table fits in the data schema. We will not try to model every table in the notebook, just the ones that /exec outputs, and how they link to one another. You should start a new model from scratch in MySQL Workbench, since the existing data schema does not follow this rule and tries to model too much. You can use it as a reference, but we will need to be selective.

TL;DR

You have 1 out of 3 issues created. Please create the other two and update their specifications to match the sections above. Once you do, assign the issues to yourself, and I will group all your issues into milestones later on.