sailuh / kaiaulu

An R package for mining software repositories

http://itm0.shidler.hawaii.edu/kaiaulu

Mozilla Public License 2.0


DV8 Integration Milestone Issue + Cheat Sheet #165

Closed leilani-reich closed 1 year ago

leilani-reich commented 1 year ago

We will integrate the low-level functions of DV8, a tool by Archdia, into Kaiaulu.

To do so, we will write wrapper functions for DV8's command line interface (CLI) functions. These functions involve locating anti-patterns, constructing design structure matrices (DSM), detecting hotspots, analyzing roots, quantifying debt, analyzing return on investment, and more.

We will focus on a subset of functions for Kaiaulu, especially ones that can be chained together. In particular, we will focus on creating wrapper functions for architecture flaws and the decoupling level metric.

For each function, we will create a unit test using the testthat package for R.

Finally, we will create at least two R notebooks (one for architecture flaws, one for the decoupling level metric) to showcase the wrapper functions and show step-by-step how a user would conduct a software analysis using these functions.

TASKS:

Pull Request: https://github.com/sailuh/kaiaulu/pull/171

Leilani Reich Milestone 3.3:

[x] dv8_clsxb_to_clsxj() (core:export-cluster)
[x] parse_dependencies() changes ~dv8_depends_to_sdsmj()~ (call Depends command for sdsm.json)
[x] parse_dv8_clusters()

Milestone 3.4:

[x] dependencies_to_sdsmj()
[x] gitlog_to_hdsmj()

Nicholas Lee Milestone 3.3:

[x] dv8_gitlog_to_gitnumstat() (performs git log --numstat --date=iso)
[x] dv8_gitnumstat_to_hdsmb() (scm:history:gittxt:convert-matrix)
[x] dv8_hsdsmb_to_decoupling_level() (metrics:decoupling-level)
[x] dv8_hsdsmb_to_hierclsxb() (dr-hier:dr-hier)
[x] dv8_hsdsmb_drhier_to_excel() (office:export-spreadsheet)
[x] dv8_dsmj_to_dsmb() (core:export-matrix)
[x] parse_dv8_metrics_decoupling_level()

Malia Liu Milestone 3.3:

[x] dv8_hsdsmb_to_flaws() (arch-issue:arch-issue)
[x] dv8_dsmj_to_dsmb() (core:convert-matrix)
[x] parse_dv8_architectural_flaws()

Milestone 3.4:

[x] dv8_hsdsmb_sdsmb_to_hsdsmb() (core:merge-matrix)

Milestone 3.5:

[x] dv8_clsxj_to_clsxb() (core:convert-cluster)

Everyone

[x] DV8 Cheatsheet

dv8_kaiaulu_api_v3_task_assignments

caiyuanfang commented 1 year ago

I am attaching the CLI manual here. Please let us know if you have any questions.

BR/Yuanfang



leilani-reich commented 1 year ago

Hi, I don't see the manual. Is it supposed to be the one below? Thanks. DV8 2.0 Build User Guide.pdf https://github.com/sailuh/kaiaulu/files/10923747/DV8.2.0.Build.User.Guide.pdf

caiyuanfang commented 1 year ago

Yes it is



leilani-reich commented 1 year ago

Hi Carlos,

To start the process of DV8 integration, do you think it would be appropriate to create wrappers for the git commands? The git logs are often what the user starts with, and they can be converted to a dependency matrix (*.dv8-dsm) or another format and then analyzed by DV8, so they are one of the starting points for chaining functions.

Here's what the commands look like, referencing the DV8 2.0 Build User Guide:

git commands:

gittxt-convert-matrix, scm:history:gittxt:convert-matrix
- Convert a Git log file (in txt format)—which can be exported using the git command: git log --numstat --date=iso—to a dependency matrix file.
gittxt-generate-changelist, scm:history:gittxt:generate-changelist
- Generate a CSV-formatted change list file from a git log (git log --numstat --date=iso).
gittxt-generate-targetlist, scm:history:gittxt:generate-targetlist
- Generate a CSV-formatted target list file from a regular expression (used to match issue ids) and a git log (git log --numstat --date=iso).
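The log export that all three descriptions refer to can be sketched as a small shell function. This is only an illustration: the function name `export_gitlog_numstat` and the `GIT` override variable are hypothetical, and only the `git log --numstat --date=iso` invocation comes from the guide text quoted above.

```shell
# Hypothetical helper: write a git log in the form the gittxt commands read.
# GIT can be overridden (e.g. GIT="echo git") to preview the call
# without needing a repository.
GIT="${GIT:-git}"

export_gitlog_numstat() {
  repo_path="$1"   # path to a local git repository
  log_file="$2"    # text file to create
  # -C makes git run as if it were started inside repo_path.
  $GIT -C "$repo_path" log --numstat --date=iso > "$log_file"
}
```

The resulting text file would then be the input to scm:history:gittxt:convert-matrix.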

Also, the converter commands are another start of the process, to get the files into a format DV8 can understand. So I think it would be good to focus on them first as well.

converter commands:

core:convert-cluster
- Convert an input file (JSON/XML) to a clustering file (*.dv8-clsx).
core:convert-matrix
- Convert an input file (JSON/XML) to a dependency matrix file (*.dv8-dsm).
export-cluster, core:export-cluster
- Export a clustering as a separate XML or JSON file.
export-matrix, core:export-matrix
- Export a dependency matrix as a separate XML or JSON file.

Any suggestions on these commands and whether all should be implemented? And in regards to creating wrapper functions, would you want separate ones for each of these commands for modularity, or should they be consolidated?

Thank you, Leilani

carlosparadis commented 1 year ago

Hi,

I think answering this question requires @rnkazman to confirm what analysis he wants this effort to enable. Do you know if this has been decided yet?

Also, were you able to activate your DV8 license? Please let me know so I can assign a short task to replace some of this milestone in between if that is taking a while.

leilani-reich commented 1 year ago

Hi, I know from the last meeting Rick wanted us to create wrapper functions related to the architecture flaws and decoupling metric, and potentially economic analysis and manual inspection. But I believe that is the extent of what we discussed.

Yes, I've got the DV8 license activated. I got DV8 standard with a 1-year license. I can access the dv8-console now and see the commands.

Please let me know what I can do.

Thanks, Leilani

carlosparadis commented 1 year ago

Hi,

I reminded Rick to follow up on this. In the meantime, my suggestion is that you check what sequence of commands is required on dv8-console to compute the Architecture Flaw metrics and the DL metric. Here are my best guesses. You may want to confirm with Yuanfang, Rick or Yi since this dates back to my notes from 2018:

Architecture Flaws

Section 2.1 - Cytoscape: cytoscape:convert-matrix (note: convert what matrix? Cytoscape is an XML)  // Parse .Sdsm - On man is just called "dsm"
Section 5.2 - Git: scm:history:gittxt:convert-matrix (note: convert what matrix? the input is gitlog numstat, a .txt) // Parse .Hdsm - On man is just called "dsm"
Section 12.1 - Matrix Commands: core:merge-matrix // merges sdsm + hdsm => "SHdsm" - On Man is just called "dsm" ; other sections call it "merged-dsm". 
Section 7.1 - Analysis: 1. arch-issue:arch-issue (note: the name repeats itself) // man says output is a "folder", but not what is in it or the format.

Metrics

Section 2.1 - Cytoscape: cytoscape:convert-matrix (note: convert what matrix? Cytoscape is an XML)  // Parse .Sdsm - On man is just called "dsm"
Section 5.2 - Git: scm:history:gittxt:convert-matrix (note: convert what matrix? the input is gitlog numstat, a .txt) // Parse .Hdsm - On man is just called "dsm"
Section 12.1 - Matrix Commands: core:merge-matrix // merges sdsm + hdsm => "SHdsm" - On Man is just called "dsm" ; other sections call it "merged-dsm". 
Section 8.1, 8.2, 8.3 - Metrics: metrics:decoupling-level ; metrics:independence-level, metrics:propagation-cost // the naming of the functions is very clear here!

Manual Inspection of DRSpace

Section 2.1 - Cytoscape: cytoscape:convert-matrix (note: convert what matrix? Cytoscape is an XML)  // Parse .Sdsm - On man is just called "dsm"
Section 5.2 - Git: scm:history:gittxt:convert-matrix (note: convert what matrix? the input is gitlog numstat, a .txt) // Parse .Hdsm - On man is just called "dsm"
Section 12.1 - Matrix Commands: core:merge-matrix // merges sdsm + hdsm => "SHdsm" - On Man is just called "dsm" ; other sections call it "merged-dsm". 
(**) Section 13.2 - Clustering Commands: dr-hier:dr-hier (note: the name repeats itself) // SHdsm => .clsx  // Alternatively, 13.1
Section 6.1 - Office Commands: office:export-spreadsheet // .clsx => .xlsx (the equivalent of titan-gui, with the gray colors but in excel)

1) It is essential you understand that two data sources are being transformed here. One comes from this Cytoscape file (which eventually becomes an SDSM file), and the other comes from the Git Log (which eventually becomes an HDSM file). The git log you are already familiar with. The Cytoscape file is obtained from the source code. Long ago, we used to require a third party tool for that, Scitools Understand. But there is now an open source project that does that, called Depends. For this task, we want to code the functions that use Depends data instead of the proprietary one. I am not certain if cytoscape:convert-matrix is for Depends or Scitools Understand. You will have to check with Rick, Yuanfang or Yi.

2) Once you have the SDSM and HDSM, they are usually merged to compute the metrics. But I believe you could also analyze the files separately without merging them. This, too, has to be checked and documented.

Here's the draft mind map I made just so it is easy for us to find:

I am also posting this screenshot of the DV8 Manual just to facilitate referring, but I can remove it from here if it is an issue:

This nomenclature is very helpful.

3) Item 3 above (Manual Inspection of DRSpace) is also particularly interesting regarding integrating Kaiaulu visualizations. You can see I emphasized one of the commands with (**), the DRSpace command, which generates .clsx files. I believe these can be parsed and represented using Kaiaulu network visualizations and/or combined with other data Kaiaulu already parses. We would need to look at some example .clsx files.

4) If the .clsx files are binary, one of the other commands may need to be wrapped as well. For example, at the top of my mind map (Section 1 node) I noted commands that convert some of these to JSON and XML files. JSON could be of potential interest.

5) The architectural flaws command may require a large number of parameters if memory serves me right. They are thresholds that affect the metrics. Or at least used to. You may need to spend a bit of time on this one, and obtain their definitions, because we would want to include them in the project configuration files, therefore extending them.

6) I want you to consider some of these files are helpful in exchanges with other collaborators, who will not be using the dv8-console, but rather the dv8-gui. For these exchanges, it would be helpful if we know which files in this chain would be helpful for Rick to inspect to understand the metrics.

For example, let's say I calculate architectural flaws for the Thrift project using dv8-console. An excel file will be generated using a chain of the commands above, and I will send it to Rick via e-mail. Something in the metric values may look interesting, so Rick asks me for files to then open on dv8-gui for further inspection.

In this case, it would be ideal that I send to Rick the appropriate file that dv8-gui understands so he sees the merged SHDSM, perhaps already with the DRSpace applied to it. This would be far preferred over me having to send the git log, and depends input, and he now has to re-run everything via the GUI (notwithstanding the risk of doing so with different threshold parameters for the flaws). The details of this back and forth you will need to see with him.

7) Finally, in the past, one thing we wish we could do is use Kaiaulu filter functions before DV8 performed architectural flaws. Since DV8 expects the raw git log and the Depends input, that was not viable for interoperability when we used a global command that did all the above (archreport). However, now that you are looking at the individual blocks, this may be possible. For example, we may be able to either generate the SDSM ourselves or filter an SDSM file.

I would focus on the above until we have more details from Rick. The bottom line is: if we can comprehensively compute Architecture Flaws and the DL metric, use and manipulate intermediate files, and know which files can be easily shared to be loaded on dv8-gui, we will have a good foundation to extend in the future, more flexibility in existing analyses, and much less time spent discussing data back and forth via e-mail.

I hope this helps, but for this milestone you will need to refine the specific commands with Rick, Yi and Yuanfang since they know the tool much better than I do. Try to run some of these commands above on dv8-console directly on the terminal, and see if you can obtain the flaws, DL, and some of the intermediate files too. This should help you familiarize with the commands, the data they generate, and if I missed any command in between. Try to start documenting that in an R notebook in preparation to the functions that will wrap around them.

rnkazman commented 1 year ago

Please start with integrating the calculation of the Decoupling Level (DL) metric as your first product in this suite.

carlosparadis commented 1 year ago

Hi @leilani-reich ,

Here's a more coherent follow-up based on our call, now in order of priority (and so I don't forget what was agreed). Let's break these into sub-milestones.

3.1 Figure the chain of commands

It seems figuring this out will be part of the milestone itself:

On the terminal, using the dv8-console, identify the necessary commands to create the chain for the intended analyses, and generate some example files for each step of the way so you can share them with us. Please annotate on this issue the identified chain of commands and associated files. The intended analyses are a) Architectural Flaws, b) Calculating the DL metric, c) Sense Making of Flaws/DL.

These commands seem to be common to both analyses:

~~2.1 Section 2.1 - Cytoscape: cytoscape:convert-matrix~~ (Rick needs to confirm the other command for Depends)
2.2 Section 5.2 - Git: scm:history:gittxt:convert-matrix
2.3 Section 12.1 - Matrix Commands: core:merge-matrix

Then the analysis commands:

Section 7.1 - Analysis: 1. arch-issue:arch-issue (Arch Flaws)
Section 8.1 - Metrics: metrics:decoupling-level (DL)

(Mindmap Section 13): dr-hier:dr-hier (so you can obtain the .clsx file for the command below)
Section ?? - office:export-spreadsheet (Can export SDSM, HDSM, or SHDSM (the merged file), and optionally + .clsx I think)

As a deliverable for this sub-milestone, please confirm on this issue that the commands above indeed suffice to perform the 3 analyses, and attach the files that we can read to the issue. I believe you are mostly done with this sub-milestone, pending running the Architectural Flaws.

It seems this is all you need to go from start to end. But if this was the sole goal, we would use archreport command. Since we care about intermediate files for interoperability with Kaiaulu, then comes the next sub milestone.

3.2 Understand the files they generate

As you noticed, when running some of these commands, their intermediate files are binary. So attaching the binary files here will be pointless. Identify what commands are necessary to make sense of the files generated at every step of the chain above and run on the command line so we get some example files:

(Mindmap Section 1): core:export-matrix (to export a .dsm to JSON so we can inspect)
(Mindmap Section 1): core:export-cluster (to export the .clsx to JSON, as this file is used on office:export-spreadsheet)

After this sub-milestone, do the same and confirm the commands suffice to inspect all the intermediate files that were binary of sub-milestone 3.1.
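As a rough sketch of this inspection step, a single helper could pick the export command from the file extension. The helper name `dv8_export_json` and the `DV8_CONSOLE` override are hypothetical; the two subcommands and the `-outputFile <output> <input>` argument order follow the invocations reported in this thread.

```shell
# Hypothetical helper: export a binary DV8 file to JSON so it can be inspected.
# DV8_CONSOLE can be overridden (e.g. DV8_CONSOLE="echo dv8-console")
# to print the command instead of running it.
DV8_CONSOLE="${DV8_CONSOLE:-dv8-console}"

dv8_export_json() {
  input_file="$1"    # *.dv8-dsm or *.dv8-clsx
  output_file="$2"   # JSON file to create
  case "$input_file" in
    *.dv8-dsm)  subcommand="core:export-matrix" ;;
    *.dv8-clsx) subcommand="core:export-cluster" ;;
    *) echo "unsupported file type: $input_file" >&2; return 1 ;;
  esac
  $DV8_CONSOLE "$subcommand" -outputFile "$output_file" "$input_file"
}
```

Dispatching on the extension keeps one entry point for both binary formats, at the cost of relying on files being named with the .dv8-dsm and .dv8-clsx suffixes.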

3.3 Wrap them into R functions

Refer to the rstudioapi command to execute Dv8 commands in the R Notebook I sent you. Clear the notebook so it only contains the necessary steps for the # Architectural Flaws session in it + prior steps needed to run it.

Note that because DV8 requires specific files from the git log and Depends, which are not what parse_gitlog() and parse_dependencies() provide, you also need to issue from R the commands to generate these files. The notebook I provided should already have the code to do so, since archreports also needs them.

As agreed on call, create these on R/dv8.R , and prefix the functions as dv8_. Please feel free to choose the nomenclature for the functions. I'd recommend including at least one word represent the "from format" and one word for the "to format" but we can review these later.

Observe that you will also need to write three additional functions here:

parse_dv8_architectural_flaws()
parse_dv8_metrics_DL()
parse_dv8_clusters()

These would go on R/parse.R and would ingest the data generated by the DV8 commands:

arch-issue:arch-issue (Arch Flaws)
Metrics: metrics:decoupling-level (DL)
dr-hier:dr-hier (after core:export-cluster is applied to turn it into JSON).

We also want to load these outputs back into the R session for exploration. Some of this code is already written in the R Notebook I sent you, as ArchReports already generated these files. Without re-ingestion, a notebook like Social Smells would not work well with DV8, because we would end up with a sea of files in a folder for every 3 months of flaws computed. The current R Notebook re-ingests them and combines them into a single table, which makes them much easier to present and inspect.
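To illustrate the "combine into a single table" idea outside of R, here is a minimal shell sketch. It assumes a hypothetical layout with one arch-issue output folder per time window under a common root, each holding a file-measure-report.csv with a single header row. The function name and the added `window` column are made up for this example, and nothing is assumed about the report's actual column names.

```shell
# Hypothetical helper: stack per-window reports into one CSV on stdout.
combine_file_measure_reports() {
  root_dir="$1"   # folder containing one output folder per time window
  header_done=0
  for report in "$root_dir"/*/file-measure-report.csv; do
    [ -f "$report" ] || continue
    window=$(basename "$(dirname "$report")")
    if [ "$header_done" -eq 0 ]; then
      # Keep the header once, prefixed with the new column name.
      head -n 1 "$report" | sed 's/^/window,/'
      header_done=1
    fi
    # Append the data rows, tagging each with its window folder name.
    awk -v w="$window" 'NR > 1 { print w "," $0 }' "$report"
  done
}
```

The parse_dv8_* functions would play this role inside Kaiaulu; the sketch only shows why one combined table is easier to inspect than a folder per window.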

3.4 Create custom DSM functions for Kaiaulu

As noted on call, Kaiaulu has its own set of filter functions in every R Notebook, and other pre-processing options. However, their output currently cannot be provided to DV8, since DV8's scm:history:gittxt:convert-matrix and the Depends load function expect the raw data as input.

Therefore, if Kaiaulu could generate SDSM and HDSM .json files, those could be provided to DV8 instead of the raw data.

For this to work, you would need to create wrappers for the following function:

(Section 1 Mindmap) core:convert-matrix (sdsm.json,hdsm.json ->sdsm.dsm , hdsm.dsm)

Moreover, you would need to write two functions, which would:

Kaiaulu's parse_gitlog() output -> hdsm.json
Kaiaulu's parse_dependencies() output -> sdsm.json

For this, you need to understand how DV8 represents the sdsm and hdsm in its file format. You should reach out to Rick on this a bit ahead of time to try and learn the file organization by the time you get to this milestone. On milestone 3.2, we should be able to figure this out when we look at the .json files too.

3.5 Bonus

I am not optimistic we will get this far, but adding this function would be a bonus.

(Section 1 Mindmap) core:convert-cluster (cluster.json-> cluster.clsx)
core:export-matrix

I believe this covers the entirety of Milestone 3 in more detail, and in the order of priority and sub-tasks. Please give some further thought and see if you have any questions.

leilani-reich commented 1 year ago

Hi Carlos,

in regards to sections 3.1 & 3.2, I have checked on the dv8-console all the commands (excluding "cytoscape:convert-matrix" since we are waiting for @rnkazman to confirm the Depends command version of this, and also excluding "core:merge-matrix" since we would need the .Sdsm from Depends). From what I saw, these commands work for the analyses you specified.

Here are the commands and what I ran for each of them:

scm:history:gittxt:convert-matrix:
- Command: dv8-console scm:history:gittxt:convert-matrix github/ics496/helix/helix-log.txt -outputFile Downloads/test-log.dv8-dsm
- Input: github/ics496/helix/helix-log.txt
- Output: Downloads/test-log.dv8-dsm
arch-issue:arch-issue:
- Command: dv8-console arch-issue:arch-issue -outputFolder Downloads/test-arch-output Downloads/test-log.dv8-dsm
- Input: Downloads/test-log.dv8-dsm
- Output: Downloads/test-arch-output
metrics:decoupling-level:
- Command: dv8-console metrics:decoupling-level -outputFile Downloads/decoupling-helix.json Downloads/test-log.dv8-dsm
- Input: Downloads/test-log.dv8-dsm
- Output: Downloads/decoupling-helix.json
dr-hier:dr-hier
- Command: dv8-console dr-hier:dr-hier -outputFile Downloads/test-DRspace.dv8-clsx Downloads/test-log.dv8-dsm
- Input: Downloads/test-log.dv8-dsm
- Output: Downloads/test-DRspace.dv8-clsx
office:export-spreadsheet
- Command: dv8-console office:export-spreadsheet -cluster Downloads/test-DRspace.dv8-clsx -detail -drhier -namespace -outputFile Downloads/test-output.xlsx Downloads/test-log.dv8-dsm
- Input: Downloads/test-log.dv8-dsm
- Cluster file: Downloads/test-DRspace.dv8-clsx
- Output: Downloads/test-output.xlsx
Exporting .dsm matrix:
- Command: dv8-console core:export-matrix -outputFile Downloads/exported-test-matrix.json Downloads/test-log.dv8-dsm
- Input: Downloads/test-log.dv8-dsm
- Output: Downloads/exported-test-matrix.json
Exporting .clsx cluster:
- Command: dv8-console core:export-cluster -outputFile Downloads/exported-test-cluster.json Downloads/test-DRspace.dv8-clsx
- Input: Downloads/test-DRspace.dv8-clsx
- Output: Downloads/exported-test-cluster.json

I am attaching the zip containing the files produced from running the commands: DV8-Test-Output.zip

EDIT - More Commands

To get file from Depends:
- Command (w/o -m option): java -jar depends.jar java ../../github/ics496/helix depends-0.9.6-output-no-map
- Output: depends-0.9.6-output-no-map.json AND depends-dv8map.mapping
- Command (w/ -m option): java -jar depends.jar java ../../github/ics496/helix depends-0.9.6-output-map -m
- Output: depends-0.9.6-output-map.json
dv8-console core:convert-matrix:
- Command 1: dv8-console core:convert-matrix -outputFile depends-0.9.6-output-map-matrix depends-0.9.6-output-map.json
- Input 1: depends-0.9.6-output-map.json
- Output 1: depends-0.9.6-output-map-matrix
- Command 2: dv8-console core:convert-matrix -dependPath depends-dv8map.mapping -outputFile depends-0.9.6-output-map-option-matrix depends-0.9.6-output-map.json
- Input 2: depends-0.9.6-output-map.json
- Mapping file: depends-dv8map.mapping
- Output 2: depends-0.9.6-output-map-option-matrix
- NOTE: The difference between commands 1 and 2 is that command 1 uses only the output JSON from the Depends command run with the -m (mapping) option. Command 2 uses that same output JSON plus the depends-dv8map.mapping file, which Depends produces only when the -m option is omitted, which is weird.
- Command 3: dv8-console core:convert-matrix -outputFile depends-0.9.6-output-no-map-matrix depends-0.9.6-output-no-map.json
- Input 3: depends-0.9.6-output-no-map.json
- Output 3: depends-0.9.6-output-no-map-matrix
dv8-console core:merge-matrix
Command: dv8-console core:merge-matrix -outputFile merged-0.9.6-matrix ../DV8-Test-Output/test-log.dv8-dsm depends-0.9.6-output-map-matrix.dv8-dsm

- Input matrix 1: ../DV8-Test-Output/test-log.dv8-dsm (hdsm)
- Input matrix 2: depends-0.9.6-output-map-matrix.dv8-dsm (sdsm)
- Output matrix: merged-0.9.6-matrix.dv8-dsm (shdsm)

dv8-console arch-issue:arch-issue
Command: dv8-console arch-issue:arch-issue -outputFolder arch-issues-0.9.6-output merged-0.9.6-matrix.dv8-dsm

- Input: merged-0.9.6-matrix.dv8-dsm (shdsm)
- Output Folder: arch-issues-0.9.6-output

Updated Command Outputs (using depends package 0.9.6a): Depends-Tests-0.96a.zip
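Taken together, the runs above suggest the following chain for architecture flaws. This is a sketch under stated assumptions, not a definitive pipeline: the function name, the `DV8_CONSOLE` override, and the intermediate file names (hdsm, sdsm, shdsm) are illustrative, while the subcommands and flags are the ones used in the commands listed in this comment.

```shell
# Set DV8_CONSOLE="echo dv8-console" to print the chain instead of running it.
DV8_CONSOLE="${DV8_CONSOLE:-dv8-console}"

dv8_arch_flaws_pipeline() {
  gitlog_txt="$1"     # output of: git log --numstat --date=iso
  depends_json="$2"   # JSON file written by Depends
  work_dir="$3"       # folder for intermediate and final files

  # 1. git log -> historical DSM (hdsm)
  $DV8_CONSOLE scm:history:gittxt:convert-matrix "$gitlog_txt" \
    -outputFile "$work_dir/hdsm.dv8-dsm" || return 1
  # 2. Depends JSON -> structural DSM (sdsm)
  $DV8_CONSOLE core:convert-matrix \
    -outputFile "$work_dir/sdsm.dv8-dsm" "$depends_json" || return 1
  # 3. hdsm + sdsm -> merged DSM (shdsm)
  $DV8_CONSOLE core:merge-matrix -outputFile "$work_dir/shdsm.dv8-dsm" \
    "$work_dir/hdsm.dv8-dsm" "$work_dir/sdsm.dv8-dsm" || return 1
  # 4. merged DSM -> folder of architecture flaw reports
  $DV8_CONSOLE arch-issue:arch-issue \
    -outputFolder "$work_dir/arch-issues" "$work_dir/shdsm.dv8-dsm"
}
```

Each step stops the chain on failure, so a missing intermediate file is reported where it occurs rather than at the final analysis.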

carlosparadis commented 1 year ago

Hi Leilani,

Thanks. Could you format a bit the commands you ran for each of them on your post so it is a bit more clear to me what is input to the command and what is output? Is the first parameter after -outputFile the output file, and the second parameter the input?

As far as the command is concerned for Depends, I heard back from Yi earlier today. This is his response:

As Depends outputs dsm in json format, core:convert-matrix is the one needed.

If you need directions on how to use the Depends jar, have a look at its README and/or my parse_dependencies() system2 call:

https://github.com/sailuh/kaiaulu/blob/c8c5e02163c9fada80bdb687d54a9f64be5f5bf9/R/parser.R#L550-L558

Note however, I may not have generated a .dsm that is compatible with DV8's core:convert-matrix. I see that the Depends README mentions using the -m flag for DV8. Could you try running core:convert-matrix on a file generated from Depends both with and without -m to see if both work, or if they lead to different files/can be read?

Also, for nomenclature consistency, let's refer to the .dsm generated from the gitlog as a project's "hdsm.dsm" (i.e. historical DSM) and the one generated from Depends as "sdsm.dsm" (I think the name means structural DSM). For the merged files, let's adopt the convention of calling them shdsm.dsm.

Thanks!

carlosparadis commented 1 year ago

One more note: It seems that your arch-issue run did not generate the main file we wanted. Going by what I wrote in Kaiaulu with archreports, it would have been a file likely at this path:

DV8-Test-Output/test-arch-output/modularity-violation/file-measure-report.csv

I suspect this may be due to you not having provided the hsdsm.dsm due to us not figuring out the Depends command at that point. So, try to give another go on arch-issue:arch-issue with a hsdsm.dsm this time (which results from the merge command of sdsm and hdsm), and see if this file is generated this time around.

leilani-reich commented 1 year ago

Hi Carlos, I've updated my previous comment with more commands (& a new zip for outputs) I tried given we know the Depends command now. Also, do you want us to use a specific version of Depends? I noticed on the Kaiaulu README it says "Depends was last tested on 0.96a", so I used that version for my latest command outputs.

In addition, I wanted to note something interesting I found. In regards to the Depends command (in the README https://github.com/multilang-depends/depends), I found that if I exclude the -m option, then it gives me the file "depends-dv8map.mapping" and the output json for dependencies. However, if I include the -m option, it only gives me the output json file. This is weird; you would expect the command to work the other way around, right? Also, I have included my output for this in my newest zip linked above.

Finally, I did not find any file-measure-report.csv, even after using the shdsm.dsm as input to the arch-issue:arch-issue command. But there were more folders produced in the arch-output, including clique, package-cycle, and unhealthy-inheritance. The first arch-issues output I provided only had modularity-violation.

carlosparadis commented 1 year ago

Hi,

Thank you for doing this exhaustive checking on the commands and possible inputs. I did a minor re-formatting of your message just to help me go back and forth a bit.

I agree the -m flag is weird. Would you mind checking if the current version of Depends also suffers from this issue? If it does not, I can check if the more recent version of Depends breaks anything in Kaiaulu (e.g. due to a new format) so we use a more current version instead. Otherwise, if the problem persists on the more recent Depends version, you can open an issue there and link it here as we did with Perceval. Note: I am only asking you to test whether the -m flag has the reverse behavior on the newer version, not everything you already checked.

I suspect the file we are after in Architecture Flaws in your folder is the anti-pattern-summary.csv. However, we may be missing a flag to this command so it calculates this summary at file granularity so every row is a file. I think the default output of the command is the coarse aggregate we are looking at.

Have you noticed anything suggestive of that? I faintly remember running into this issue when calculating arch reports. I am waiting on Yi to get back to me on two other questions, so I will try looking through my e-mails to see what I can find in the meantime.

I think all the commands you listed and tried so far will be necessary. Using that and the Milestone plan I laid out for you, could you give it a shot and do the task assignment for everyone in the group? You can update this issue first message with the checkboxes, much as we did with milestone 1 and 2. Try to make it so you don't end up blocked on each other tasks this time. I will review once you post to double check that.

Thanks!

leilani-reich commented 1 year ago

Hi Carlos, I tried the Depends command on both depends-0.9.7-package-20221104.zip and depends-0.9.7-package-20221104a.zip from the newest release, and I am still getting the reverse behavior with the -m flag. I will make an issue on Depends for that. Link to issue: https://github.com/multilang-depends/depends/issues/46

In regards to the file-measure-report.csv file, I discovered that it will be produced in the main folder output for arch-issues if we add the -fileStat flag to the command. However, I did not see a file-measure-report.csv within the modularity-violation folder itself. Maybe I'm missing another flag.

Here's what DV8 says about the -fileStat flag:

 -fileStat    Output file statistics as per arch issue analysis result.

Here's the command I ran:

dv8-console arch-issue:arch-issue -fileStat -outputFolder arch-issues-0.9.6-output-again Depends-Tests-0.96a/merged-0.9.6-matrix.dv8-dsm

And the output produced: arch-issues-0.9.6-output-again.zip
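A wrapper around this call could also confirm that the report was actually produced. The sketch below assumes, based on the run above, that `-fileStat` places file-measure-report.csv at the top level of the output folder. The function name and the `DV8_CONSOLE` override are hypothetical.

```shell
# Set DV8_CONSOLE="echo dv8-console" to print the command instead of running it.
DV8_CONSOLE="${DV8_CONSOLE:-dv8-console}"

dv8_arch_issue_with_file_stats() {
  merged_dsm="$1"      # merged matrix (*.dv8-dsm)
  output_folder="$2"   # folder DV8 writes the flaw reports into
  $DV8_CONSOLE arch-issue:arch-issue -fileStat \
    -outputFolder "$output_folder" "$merged_dsm" || return 1
  # Fail loudly if the per-file report is missing, e.g. when a flag was dropped.
  if [ ! -f "$output_folder/file-measure-report.csv" ]; then
    echo "file-measure-report.csv not found in $output_folder" >&2
    return 1
  fi
}
```

Checking for the file right after the call would have surfaced the missing `-fileStat` flag immediately instead of after inspecting the output folders by hand.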

We will work on assigning tasks for milestone 3 with the checkboxes, thank you.

carlosparadis commented 1 year ago

I see... was -fileStat present in your copy of the DV8 Manual? I see it in the dv8-console man output now that I have the license, but my copy of the manual does not appear to include the -fileStat flag.

Thank you for zipping the files again. file-measure-report.csv is what we want, and what I had ArchReports generate in the past. I think there is an inconsistency between the column names from archreports and this file, but we can just adopt the ones in the file you generated, which I am attaching here separately for future reference.

file-measure-report.csv

Don't worry about the files generated inside the folders by arch-issue:arch-issue. I believe they can all be derived from summary statistics of the file-measure-report.csv, but I will double check with Rick.

As you noted, please go ahead when you have a chance and post the issue on Depends. I guess for now let's stick to the version Kaiaulu has on the README then. We have enough to keep tabs on as is :')

Okay, I think we are finally nearing the end with Milestone 3.1 and Milestone 3.2 (mentioned on https://github.com/sailuh/kaiaulu/issues/165#issuecomment-1469074768), phew! Here's my suggestion on how to condense all you have learned in something more digestible, so it can be referred on the subsequent milestones, and back and forth between us.

You can use https://draw.io/ to depict the chain of commands we will convert into functions, and the files they output and input. It doesn't need to be fancy. Just a few boxes with the command names (no parameters to avoid clutter), and the file names should suffice / other elements involved. It should also only include the files we care about. So let's consider the one we just went through for arch-issues. I also took the opportunity to include here Milestone 3.4 so you see how it modifies the pipeline, and some of the final end functions of Milestone 3.3.

dv8_kaiaulu_api_v1 (1)

I will send you the editable file above via e-mail since GitHub will not let me attach them here. Please feel free to correct any additional input I may have missed. Note I have also not included the dl-metric pipeline to this. Please extend it to account for that.

Start reading the image from "project.git" following the right arrow (instead of the one above), and likewise, the "project src" following the right arrow (instead of the one below). These two files follow the pipeline we have been discussing so far to generate the architectural flaws (arch-issues output).

I have marked with red circles the files that have no use for us in Kaiaulu, but that are extremely useful to collaborators, as they can load them in dv8-explorer to inspect. They are also intermediate step files for us.

Green circles are files that are useful to manually inspect. Yellow circles are files that are useful for interoperability with Kaiaulu.

Note there is a purple circle towards the very end. There is some information in that file that we should ingest back into Kaiaulu; however, some of it would be hard to read (e.g. the clusters are defined in the Excel file as rectangles). So, to get that information, as far as I currently understand (pending Yi's follow-up), we need to obtain the rectangles from project-drspace.dv8-clsx instead (which is an input to office:export-spreadsheet). This requires your group to write a wrapper for the associated function. As for the cell values of the purple circle, I believe file-measure-report.csv has that information. Hence, we completely avoid needing a parser function in R for the Excel file.

Last but not least, if you look at project.git and project src again at the very start, but instead follow the arrows above and below, you will see Milestone 3.4. The green R functions are implemented in Kaiaulu. The white ones would need to be implemented by you (dv8_gitlog_to_hdsm and dv8_gitlog_to_sdsm). You can see that if we have them, we can incorporate Kaiaulu filters (among other things not included in this diagram for simplicity). This would be very helpful to ensure consistency with the input to the Social Metrics Notebook. Failing that, DV8 will use one set of files, and Social Smells another.

I took care to define a nomenclature for the files, functions and commands in this diagram. So let's use them for the milestones too and in our exchanges.

I hope this makes sense. Again, please extend this diagram with the DL Metric pipeline and/or any other you feel I missed from our prior discussion. I hope this diagram proves helpful for upcoming meetings and discussion.

leilani-reich commented 1 year ago

Hi Carlos,

I did not see the -fileStat option in the arch-issue:arch-issue command section of the DV8 user guide. It only shows in dv8-console. I am using the DV8 2.0 Build User Guide (linked in the DV8 issue comments near the beginning).

In regards to the diagram, I added a small section for the decoupling level metric. Also, I added the mapping file (automatically called depends-dv8map.mapping) that we get from Depends if we don't use the "-m" option for the Depends command. From what I tried, it is an optional file that can be passed into the core:convert-matrix command as the "-dependPath." If I don't include the mapping file, I still get a dv8-sdsm.

Here is the core:convert-matrix command in detail to show what I mean:

usage: dv8-console core:convert-matrix [-dependPath <MAPPING_FILE>] [-h] [-outputFile <OUTPUTFILE>]
       [-xml] INPUT_FILE
Convert an input file (JSON/XML) to a dependency matrix file
 -dependPath <MAPPING_FILE>   Dependency type mapping file location for user customized dependency
                              type
 -h,--help                    Print usage information for a command
 -outputFile <OUTPUTFILE>     The output file to create (*.dv8-dsm)
 -xml                         Convert to XML (instead of JSON) format
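To make the optional mapping file concrete, here is a minimal Python sketch (for illustration only; the Kaiaulu wrappers themselves are written in R) that assembles the argument list for the command above. It only uses the flags shown in the usage text. The helper name, its parameter names, and the example file names other than depends-dv8map.mapping are hypothetical.

```python
from typing import List, Optional


def build_convert_matrix_args(input_file: str,
                              output_file: str,
                              mapping_file: Optional[str] = None) -> List[str]:
    """Assemble the argument list for `dv8-console core:convert-matrix`.

    Hypothetical helper: only flags from the usage text are used, and
    -dependPath is added only when a mapping file is given.
    """
    args = ["dv8-console", "core:convert-matrix"]
    if mapping_file is not None:
        args += ["-dependPath", mapping_file]
    args += ["-outputFile", output_file, input_file]
    return args


# Same conversion with and without the optional mapping file.
with_mapping = build_convert_matrix_args(
    "depends.json", "project-sdsm.dv8-dsm", "depends-dv8map.mapping")
without_mapping = build_convert_matrix_args(
    "depends.json", "project-sdsm.dv8-dsm")
```

The list form can be handed to a process-spawning call as is, which avoids quoting problems with file paths that contain spaces.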

Here is the updated diagram. I will continue to update if something else comes up.

dv8_kaiaulu_api_v1 drawio

carlosparadis commented 1 year ago

Sounds good! In regards to:

it is an optional file that can be passed into the core:convert-matrix command as the "-dependPath." If I don't include the mapping file, I still get a dv8-sdsm.

Does that mean the mapping file is redundant? Do both JSON look the same / have the same size?

leilani-reich commented 1 year ago

The Depends json I get w/o using the -m option is 1,776,469 bytes while the one I get using the -m option is 1,776,278 bytes, although the difference may just be because I named the files differently ("depends-0.9.6-output-no-map.json" and "depends-0.9.6-output-map.json"). But the number of cells in the json is also slightly different: 12,852 cells in "depends-0.9.6-output-map.json" versus 12,854 cells in "depends-0.9.6-output-no-map.json."

Also, when I look at the dv8-sdsm files generated by each different json when running core:convert-matrix, they are slightly different (103KB vs 106KB), but I still think it's just the file names. It's hard to tell.
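One way to rule out the file-name explanation would be to compare the dependency cells themselves rather than the byte sizes. The Python sketch below is only an illustration under an assumption: that the parsed json has a "variables" list of file paths and a "cells" list whose entries carry integer "src" and "dest" indices into it. The function names and the tiny example inputs are made up.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def cell_pairs(dsm: Dict) -> Set[Pair]:
    """Return the (source file, destination file) pairs of a parsed DSM json.

    Assumed layout: dsm["variables"] lists file paths and every cell has
    integer "src" and "dest" indices into that list.
    """
    names = dsm["variables"]
    return {(names[cell["src"]], names[cell["dest"]]) for cell in dsm["cells"]}


def diff_cells(a: Dict, b: Dict) -> Tuple[Set[Pair], Set[Pair]]:
    """Return the pairs found only in `a` and the pairs found only in `b`."""
    pairs_a, pairs_b = cell_pairs(a), cell_pairs(b)
    return pairs_a - pairs_b, pairs_b - pairs_a


# Made-up inputs standing in for the two parsed Depends outputs.
with_map = {
    "variables": ["a.java", "b.java", "c.java"],
    "cells": [{"src": 0, "dest": 1, "values": {"Call": 1.0}}],
}
no_map = {
    "variables": ["a.java", "b.java", "c.java"],
    "cells": [{"src": 0, "dest": 1, "values": {"Call": 1.0}},
              {"src": 1, "dest": 2, "values": {"Use": 1.0}}],
}
only_with_map, only_no_map = diff_cells(with_map, no_map)
```

Listing the pairs that appear in only one file would show exactly which two cells account for the 12,852 versus 12,854 difference.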

carlosparadis commented 1 year ago

So basically, we have no idea what is going on then with these -m flags I guess.

Could you compile a list of questions so we check with DV8 group? I will handle the back and forth, while you focus on the milestones.

From memory what I recall is:

fileStat being missing on their docs
the strange -m flag on Depends (which you also opened an issue)

The pipeline I think we have cleared up now, and I saw you extended the diagram with dl-metrics. Was there anything else that needs checking with them? I already e-mailed about the purple circle of the diagram too.

p.s.: Please send me your editable via e-mail of the final diagram. I don't think I received it, or if I did I missed it!

leilani-reich commented 1 year ago

Yes, I can make a list of questions. I think the ones you mentioned are all the issues we saw so far.

Also, I did notice that the core:export-matrix wasn't in the diagram. Do we need another wrapper for this command? Or is that basically what the dv8_gitlog_to_hdsm() and dv8_gitlog_to_sdsm() are doing?

I will make sure to email you the final diagram once I get confirmation on the core:export-matrix. Thanks!

carlosparadis commented 1 year ago

Also, I did notice that the core:export-matrix wasn't in the diagram. Do we need another wrapper for this command? Or is that basically what the dv8_gitlog_to_hdsm() and dv8_gitlog_to_sdsm() are doing?

No, they have different intents.

core:export-matrix | dsm -> json

dv8_gitlog_to_sdsm() and dv8_gitlog_to_hdsm() are in essence one function, save for different column names. But here, you are taking an R table and converting it to json. This function requires you to understand the format dv8.json uses, so it is not a simple wrapper.

Since we already parse Depends input in parse_dependencies() and gitlog via Perceval, I am considering core:export-matrix lower priority. I will edit it to be part of Milestone 3.5 as a bonus.

https://github.com/sailuh/kaiaulu/issues/165#issuecomment-1469074768

I also want you to modify one thing on your diagram. parse_dv8_metrics_DL() should not be linked to the cloud icon. That cloud is trying just to represent the equivalent of the purple circle icon, which is the .xlsx. The DL metric is not included in the purple circle.

Lastly, there will be no time for unit tests of the DV8 wrappers (nor do I think they are applicable, since they are primarily wrappers). But I feel a cheat sheet for the DV8 side, based on all we discussed on this issue, would be extremely helpful. Do you think you could fit that in? It would also probably condense much information for your final presentation, which would otherwise be lost.

Please let me know. I plan to talk with Rick in a few minutes, so I will go over the task division with him and let you know.

Edit: For clarity, the Cheat Sheets in order of priority would replace Milestone 3.5, and 3.5 would be pushed to 3.6 as bonus bonus.

leilani-reich commented 1 year ago

Hi,

Adding the DV8 cheatsheet sounds good. I will add that to the task list. Should there be multiple? Like one for architecture flaws and one for the dl metric, since these are the main notebooks we will create?

I've updated the dl metric so that the parser function outputs a table in R memory. Also, should we exclude the core:export-matrix and core:convert-cluster from Milestone 3.5 from the diagram for now? If not, I was thinking the core:export-matrix would be linked to each of the red circles that are dsms and would create an output json file. Also, I was thinking to link the core:convert-matrix to the project-drspace.json. But I'm not sure that is a good idea.

Here is the updated diagram so far: dv8_kaiaulu_api_v1 drawio (1)

carlosparadis commented 1 year ago

Hi,

I edited your original post to a more recent version of the diagram, which includes your suggested task assignments by color. I am also pasting here again just to minimize scrolling back and forth, and I am sending the editable file soon after by e-mail:

dv8_kaiaulu_api_v3_task_assignments

The functions dv8_gitlog_to_hdsm() and dv8_gitlog_to_sdsm() are fundamentally the same. I would assign them to just one person. This is because, in memory, the tables returned from parse_dependencies() and parse_gitlog() have basically the same structure. The challenge of Milestone 3.4 is figuring out the template of the .json that core:convert-matrix will take, which you can see from the diagram is the same for both sdsm and hdsm json files. If that is not too clear, we can iterate.

Also, I realize I typoed the function name. dv8_gitlog_to_sdsm() -> dv8_depends_to_sdsm() is the more correct name. There is no gitlog involved on sdsm, just the hdsm.

I have also added function name suggestions to all commands unless I forgot something, and emphasized the functions you are coding as intermediates for dv8 (e.g. where you typed "Depends" or "git --numstat --date=iso"). They mostly follow this convention:

somedata_to_newdata
the function name includes a suffix "j" if the output is json, and "b" if it is a binary.
the function prefix indicates the variation (e.g. sdsm, or hdsm).
some functions do not have a prefix, e.g. dsmj_to_dsmb. This is to emphasize the command can convert either sdsm or hdsm.
functions that are wrapping around dv8 or exist to serve dv8 pipeline are prefixed as dv8. Functions that extend Kaiaulu interoperability, however, do not have the dv8 prefix.
Some file names were slightly modified (e.g. the .clsx is no longer called project-drspace).
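As a quick check that a name can be read mechanically under this convention, here is a small Python sketch (illustration only, not part of Kaiaulu). The helper names are hypothetical, and it assumes the "j"/"b" suffix is only meaningful on tokens whose stem ends in "dsm" or "clsx".

```python
from typing import Dict, Optional

FORMATS = {"j": "json", "b": "binary"}


def _format_of(token: str) -> Optional[str]:
    """Return "json"/"binary" when the token carries a j/b suffix, else None."""
    stem, suffix = token[:-1], token[-1:]
    if suffix in FORMATS and stem.endswith(("dsm", "clsx")):
        return FORMATS[suffix]
    return None


def describe(function_name: str) -> Dict[str, object]:
    """Split a somedata_to_newdata function name into its readable parts."""
    is_dv8 = function_name.startswith("dv8_")
    body = function_name[len("dv8_"):] if is_dv8 else function_name
    source, _, target = body.partition("_to_")
    return {
        "dv8_pipeline": is_dv8,
        "input": source,
        "output": target,
        "input_format": _format_of(source),
        "output_format": _format_of(target),
    }


wrapper = describe("dv8_dsmj_to_dsmb")      # wraps a DV8 command
interop = describe("gitlog_to_hdsmj")       # Kaiaulu interoperability
```

Names without a "_to_" part, such as the parse_* functions, fall outside this pattern and are not handled by the sketch.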

Please take a look and see if the function names are consistent with what is described above and if it makes sense to your group. The intent is to look at the function name and be able to identify the input, the output, whether it is json or binary, and whether it is part of the dv8 pipeline or not. If you agree, please rename the commands in the first message to their new associated function names.

As far as task assignment is concerned for the other functions, this looks fine, but the call was too short to assess with Rick. I suggest you go ahead since they are mostly system calls. The functions I expect to be harder, and that you may want to make sure the group understands, are:

Leilani: gitlog_to_hdsmj()/dependencies_to_sdsmj() --- again, coding one means the other is basically done, but this is likely the hardest of them all.
Leilani: parse_dv8_clusters() -- because it requires understanding the format of the clsx.json to parse it.
Malia: parse_dv8_architectural_flaws() -- because it requires understanding which files are generated so they can be read back into memory. The file itself is easy, since it is a .csv.
Nico: parse_dv8_metrics_decoupling_level() is simple since it is a 1 line json with a few values.

The remainder is basically system2 calls around the command and should be identical to one another. Remember to rely on tools.yml.

Lastly, if you agree with all the above, this time around let's create one issue for each of your tasks so I can iterate separately with you 3. I expect there may be more questions, and this issue already runs very long and is best left for the Cheat Sheet.

leilani-reich commented 1 year ago

Hi Carlos, that sounds good. Also, do you mean we should each have separate issues for each of our tasks (an issue for all my tasks, an issue for all Malia's tasks, an issue for all Nico's tasks), or did you mean a separate issue for each function we will implement? Thanks.

carlosparadis commented 1 year ago

Hi Carlos, that sounds good. Also, do you mean we should each have separate issues for each of our tasks (an issue for all my tasks, an issue for all Malia's tasks, an issue for all Nico's tasks)

This. So you each would create one issue, for a total of 3 more issues, and refer in your comments it is a follow-up to this issue #165 so GitHub links them.

carlosparadis commented 1 year ago

@malialiu @nicoelee123 @leilani-reich

Overview

Hi, I got more information from Yi today. This concerns @malialiu's parse_dv8_architectural_flaws(), but it also relates to @leilani-reich's dv8_clsxb_to_clsxj(). It also raises the effort for this function above all the others, so your group may want to discuss re-balancing the effort.

This also promotes the function dv8_dsmb_to_dsmj(), which wraps around the command core:export-matrix, from the 3.5 Bonus milestone to the 3.3 Milestone, as it is now needed to fully realize parse_dv8_architectural_flaws(). There is a good chance parse_dependencies() may work on the output of the command. So, I suggest manually generating a file first out of core:export-matrix and then testing if parse_dependencies() can parse it. This will save you a lot of time.

The Issue

Here's why and how the new implementation has to be. Originally, the hopes were that by using arch-issue:arch-issue with the -fileStat we could compute the metrics as an aggregate.

file-measure-report.csv

If that was the case, then parse_dv8_architectural_flaws() would be a very simple function: execute the arch-issue:arch-issue command and then proceed to read the table into R. However, this will not work.

The reason for that is that most architectural issues (aka flaws/anti-patterns) are not a file metric by definition, but a metric that involves multiple files. Quoting Yi (data statistics here refers to the file-measure-report.csv):

Specifically, the data statistics is counted for each file, and in most cases an anti-pattern could happen to multiple files, so they all get +1 for the same issue.

And therefore, doing a group by over this table, will lead to counting the architectural flaws incorrectly several times for where we intend to use them... which is alongside social smells.

If you recollect the social-smells notebook you executed, the final table is an aggregate:

http://itm0.shidler.hawaii.edu/kaiaulu/articles/social_smell_showcase.html#merge-churn-smells-and-line-metrics-1

As you can see from the first column, we calculate metrics over a period of time for every row. This, therefore, requires any metric to be grouped by across the files over that period. From what Yi says above, we can't do so using file-measure-report.csv.

How to

So how do we obtain a file we can aggregate? This is how, and where the effort of writing this function spikes:

arch-issue:arch-issue generates a number of folders:

You will notice most of these folder names match the column names from file-measure-report.csv

So for example, in this project below, DV8 identified 19 Cliques (which is one type of architectural issue/flaws/anti-pattern):

Inside each of the 19 clique folders, you will find the DSM files. The one we care about here is the 19-merge.dv8-dsm. This file is a binary, and needs to be converted into a json so we can parse it (this is where the milestone 3.5 function now becomes milestone 3.3 priority-wise). By parsing 19-merge.dv8-dsm, we now know the files which are associated with architectural issue 19. Assume for the sake of example we find that "file 1" and "file 3" (which are represented as filepaths) are contained in the .json file converted from 19-merge.dv8-dsm. This information could be represented as follows in a .csv:

file1, clique19
file3, clique19

We can now generalize the process above, to obtain all ?-merge.dv8-dsm from the Clique folder, and annotate them in a table like so:

file1, clique1 
file3, clique1
.
.
.
file1, clique19
file2,clique19

And yes, this would require this function to convert on the fly every single ?-merge.dv8-dsm to a json, to subsequently be parsed to construct this table.

Finally, since all folders output by arch-issue:arch-issue follow this standard:

The file could be generalized to something like this:

file1, modularity_violation1
file2, modularity_violation1
file1, modularity_violation2 
file3, modularity_violation2
file1, clique1 
file3, clique1
.
.
.
file1, clique19
file2,clique19

This completes the new function implementation.
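The table construction above can be sketched in a few lines of Python (illustration only; the real function would be written in R and would first convert every ?-merge.dv8-dsm to json). The sketch starts after that step: it assumes the file paths of each flaw instance have already been read, and the function name and input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def flaws_to_rows(flaw_files: Dict[str, Dict[int, List[str]]]) -> List[Tuple[str, str]]:
    """Flatten {flaw type: {instance number: [file paths]}} into (file, flaw id) rows.

    Hypothetical input layout: one entry per folder output by
    arch-issue:arch-issue, one sub-entry per numbered instance folder.
    """
    rows = []
    for flaw_type, instances in flaw_files.items():
        for instance_id in sorted(instances):
            for filepath in instances[instance_id]:
                rows.append((filepath, f"{flaw_type}{instance_id}"))
    return rows


# Made-up example mirroring the table above.
flaw_files = {
    "modularity_violation": {1: ["file1", "file2"], 2: ["file1", "file3"]},
    "clique": {1: ["file1", "file3"]},
}
rows = flaws_to_rows(flaw_files)
```

Each row keeps the flaw instance id next to the file, which is what later allows counting flaws instead of counting files.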

In a file like this, the parsed file by parse_dv8_clusters() can be inner joined via filepath to it (this should not be part of this function responsibility, I am including here to finish the clarification in the example and it would be done in the DV8 Notebook instead as part of the showcase). It would now look like something like this:

file1, modularity_violation1,M0,L0
file2, modularity_violation1,M0,L0
file1, modularity_violation2,M1,L1
file3, modularity_violation2,M1,L1
file1, clique1, M2,L2
file3, clique1,M2,L1
.
.
.
file1, clique19, M1, L1
file2,clique19, M3,L0

This file is what we really need, and unfortunately it cannot be obtained directly from DV8. With a file like this, we can now aggregate over the Social Smells time window. The catch is that we should group by the second column within a given period of time. E.g. we would count how many cliques were identified (in this case 19), not how many files participate in a clique. The latter counts the same clique once per file involved, which incorrectly inflates the metric (and is what using file-measure-report.csv would give us).

If the group by is now per module (M0,M1) etc, we again need to do so via this column, instead of the values in file-measure-report.csv.
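To make the difference between the two counts concrete, here is a small Python sketch (illustration only; the notebook itself would do this in R). The rows are made up, the function names are hypothetical, and the flaw type is assumed to be the flaw id with its trailing digits removed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, str, str]  # (filepath, flaw id, module, layer)


def flaw_type(flaw_id: str) -> str:
    """Assumed convention: "clique19" -> "clique"."""
    return flaw_id.rstrip("0123456789")


def count_distinct_flaws(rows: Iterable[Row]) -> Dict[str, int]:
    """Intended metric: number of distinct flaw instances per flaw type."""
    ids = defaultdict(set)
    for _, flaw_id, _, _ in rows:
        ids[flaw_type(flaw_id)].add(flaw_id)
    return {kind: len(found) for kind, found in ids.items()}


def count_file_rows(rows: Iterable[Row]) -> Dict[str, int]:
    """Inflated count: one per participating file, as a per-file report gives."""
    counts = defaultdict(int)
    for _, flaw_id, _, _ in rows:
        counts[flaw_type(flaw_id)] += 1
    return dict(counts)


def count_distinct_flaws_by_module(rows: Iterable[Row]) -> Dict[Tuple[str, str], int]:
    """Distinct flaw instances per (module, flaw type)."""
    ids = defaultdict(set)
    for _, flaw_id, module, _ in rows:
        ids[(module, flaw_type(flaw_id))].add(flaw_id)
    return {key: len(found) for key, found in ids.items()}


rows = [
    ("file1", "modularity_violation1", "M0", "L0"),
    ("file2", "modularity_violation1", "M0", "L0"),
    ("file1", "modularity_violation2", "M0", "L0"),
    ("file3", "modularity_violation2", "M1", "L1"),
    ("file1", "clique1", "M0", "L0"),
    ("file3", "clique1", "M1", "L1"),
]
distinct = count_distinct_flaws(rows)
inflated = count_file_rows(rows)
per_module = count_distinct_flaws_by_module(rows)
```

In this made-up data there are two modularity violations and one clique, while the per-file count reports four and two. Grouping by module counts a flaw once in every module it touches, so a flaw that spans two modules appears in both.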

I hope this makes sense. Please discuss this with Rick at the meeting today if you are able to catch up on this in time. I know it is fairly lengthy, but it would probably be too much to explain on a call just by voice. Please let me know if you have questions. I believe Rick is also due a meeting with Yi, so hopefully, this can be reiterated before you proceed so we are certain this is the only path forward or really needed.

carlosparadis commented 1 year ago

Minor edit:

In a file like this, the parsed file by parse_dv8_clusters() can be inner joined via filepath to it (this should be part of this function responsibility, I am including here to finish the clarification in the example and it would be done in the DV8 Notebook instead as part of the showcase). It would now look like something like this:

This should not be part of this function responsibility, I include here to finish the clarification in the example, and it would be done in the DV8 Notebook instead, as part of the showcase.

Again, inner joining clustering is not part of the parse_dv8_architectural_flaws() responsibility. It would be done only in the R Notebook.

carlosparadis commented 1 year ago

@leilani-reich @nicoelee123 @malialiu

Hi all,

It seems at least one of the functions was never assigned to anyone, despite being assigned on the diagram.

dv8_kaiaulu_api_v3_task_assignments

The dv8_depends_to_sdsmj() is missing in the code base. @malialiu this is also missing on your issue checklist.

Please double-check the diagram and ensure all the functions in it are accounted for. I need this function ASAP to finish testing the pipeline on #182.

leilani-reich commented 1 year ago

Hi Carlos, the dv8_depends_to_sdsmj() was my function before, and you said here the parse_dependencies() function in Kaiaulu already implemented the functionality we wanted so we did not need this function, if I'm not mistaken.

carlosparadis commented 1 year ago

No, you are correct in your interpretation, I forgot the modification to /tmp/. However, I guess if I am getting confused one month later, the function will likely be misleading to others. I already reused part of said function to code one in the interest of finishing the review of DV8 today, so I will just paste my function to Malia's PR once I am done.

Please carry on with the suggested task list sent by e-mail.

Thanks!

leilani-reich commented 1 year ago

Hi Carlos, for the DV8 cheatsheet, do you want it to be formatted similarly to the one for Kaiaulu in https://github.com/sailuh/kaiaulu/issues/159, with a section for introducing the config file and the dv8 functions?

Or should the focus be the dv8 diagram illustrating how all the functions connect?

carlosparadis commented 1 year ago

Similar to Social Smells, so project configuration file should be included. However, you should still mention the major functions in the order one would need to execute them.

I should note I ended up renaming the "hsdsm" to "mdsm" nomenclature-wise (it will be pushed to the Kaiaulu repo hopefully tonight if I can finish in time; the docs on itm0 are not using that nomenclature yet). So for the combined h+s dsm, just refer to it on your cheat sheet as mdsm. If this is not clear, ignore this paragraph and proceed, as the corrections would be very quick.

But onto your question: To give you some reference point, the end of your cheat sheet would be the dv8_mdsmb_to_flaws and dv8_mdsmb_to_decoupling_level, instead of smell_organizational_silo, etc.

Since both functions require mdsmb, we have then to explain in the cheat sheet what this means. Hence, dv8_hdsmb_sdsmb_to_mdsmb has to be included.

In turn, note you need the hdsmb and sdsmb functions, whose files tie to the git log and Depends. For simplicity's sake, use your functions here from milestone 3.4, combined with parse_gitlog() and parse_dependencies(), and if possible include the filter functions used in Social Smells too, since they should be used in most of the analysis.

Minimally, this should be included alongside the project configuration file. If there is space, we can mention the user can export sdsm, hdsm and mdsm to excel for inspection too using dv8_mdsm_drhier_to_excel, or that the files can be loaded in DV8-GUI (a separate program DV8 also offers).

As for recommended notebooks, suggest the Gitlog and Depends ones:

The project configuration file I suggested somewhere in this sea of comments...I will try to find it.

Let me know if the above is not clear. Also, please make sure this cheat sheet has a disclaimer to ArchDia website, and that executing this functionality requires a paid license from DV8. :) Please discuss with Rick how he prefers this wording to happen. It has to be very clear Kaiaulu is a facilitator of the functionality as a DV8 API Wrapper, rather than offering DV8 functionality (which differs from the Social Smells Cheat Sheet).

Thanks!

leilani-reich commented 1 year ago

Sounds good. Thank you!

carlosparadis commented 1 year ago

Here you go: https://github.com/sailuh/kaiaulu/pull/171#issuecomment-1492953371

More specifically:

tool:
  # Depends allow to parse file-file static dependencies.
   dv8:
      # the project folder path files in function calls will be used. 
      # The folder name will be used in the file names.
      folder_path: ../../rawdata/dv8_thrift
      # the architectural flaws thresholds that should be used
      architectural_flaws:
         clique_depends: call,use
         crossing_cochange: 2
         .
         .
         .
         change_cover: ?

It should look like this. The R notebook then uses folder_path and creates the files following the *-hdsm.dv8-dsm etc. conventions accordingly. Meanwhile, the function parameters do not assume anything, and give full freedom to the user to specify the file path, name and extension they wish. I noticed some functions had a project_name parameter, but I am fixing them right now too.
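As a rough illustration of how a notebook could derive its file names from folder_path, here is a short Python sketch (illustration only; the notebook itself is R). It assumes each file lives inside folder_path and is named after the folder plus one of the suffixes used in this thread. That exact layout, the dictionary keys, and the function name are assumptions.

```python
from pathlib import PurePosixPath
from typing import Dict

# Suffixes as used in this thread; the keys are hypothetical labels.
SUFFIXES = {
    "hdsmj": "-hdsm.json",
    "sdsmj": "-sdsm.json",
    "hdsmb": "-hdsm.dv8-dsm",
    "sdsmb": "-sdsm.dv8-dsm",
    "mdsmb": "-merge.dv8-dsm",
    "clsxb": "-clsx.dv8-clsx",
    "dl": "-dl.json",
}


def dv8_file_paths(folder_path: str) -> Dict[str, str]:
    """Derive one path per pipeline file from the configured folder_path.

    Assumed layout: <folder_path>/<folder name><suffix>.
    """
    folder = PurePosixPath(folder_path)
    project = folder.name  # the folder name is reused in the file names
    return {key: str(folder / f"{project}{suffix}")
            for key, suffix in SUFFIXES.items()}


paths = dv8_file_paths("../../rawdata/dv8_thrift")
```

The wrapper functions would still receive these as plain file paths, so a user remains free to pass any other path, name or extension.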

For the list of parameters in the example above, refer to:

https://github.com/sailuh/kaiaulu/blob/ba4da7cac66b3b767f358f0be37e63fbefdb7c66/R/dv8.R#L301-L313

For the optional parameters, use the ones from Depends:

https://github.com/sailuh/kaiaulu/blob/ba4da7cac66b3b767f358f0be37e63fbefdb7c66/conf/apr.yml#L97-L117

I believe these are the major parameters someone doing data analysis may want to "play with" when investigating architectural issues. And thus, they may have multiple "apr.yml" conf files testing different parameters to run against the DV8 Notebook. These are usually the ones we want to surface to the config file. Depends affects what function dependencies DV8 sees, and in turn, DV8 thresholds affect the sensitivity of what it will consider specific patterns of them to be a problem. So they go hand in hand.

One last thing: Include the social smells in the "See other Notebooks" section. Since ultimately, many of the future analyses will combine both :)

Thanks!

leilani-reich commented 1 year ago

Ok, thank you.

leilani-reich commented 1 year ago

Hi Carlos, here's what I have for the DV8 cheatsheet.

One thing I noticed in the config file was it seems like for the dv8 architectural_flaws, the parameter "historyImpact" should actually be called "uiHistoryImpact", checking the dv8-console arch-issue:arch-issue command. So I used "uiHistoryImpact" as my parameter in the config file in the cheatsheet.

Also, is dv8_dsmj_to_dsmb() (which is core:convert-matrix) technically only ever used for the sdsm files? Since we use dv8_gitnumstat_to_hdsmb() (which is scm:history:gittxt:convert-matrix) for the hdsm files.

dv8-cheatsheet.pptx

carlosparadis commented 1 year ago

One thing I noticed in the config file was it seems like for the dv8 architectural_flaws, the parameter "historyImpact" should actually be called "uiHistoryImpact", checking the dv8-console arch-issue:arch-issue command. So I used "uiHistoryImpact" as my parameter in the config file in the cheatsheet.

Thank you for catching the parameter. It seems the incorrect parameter was specified in the original PR:

https://github.com/sailuh/kaiaulu/commit/302e26cdbb9ea6ec84afdd16d38697a2d0f6ce63#diff-f7557267e8cb3399084da6146186558b93329baa8fa5a87496e78935ac5e51eaR94

So I guess DV8 ignores the parameter. Strangely enough, I remember encountering issues on another function parameter that was incorrect: uiHistoryImpact was sent on the PR as uihHistoryImpact (which is very easy to miss since all parameters are "uih", the H overlap caused the slip as it is inconsistent).

Also, is dv8_dsmj_to_dsmb() (which is core:convert-matrix) technically only ever used for the sdsm files? Since we use dv8_gitnumstat_to_hdsmb() (which is scm:history:gittxt:convert-matrix) for the hdsm files.

In the case of the main pipeline for dv8 yes, but in the case of your milestone 3.4 functions, we need it for both sdsm and hdsm. One thing we never tried was to export a merged dsm into json, which I think is possible, to see what format is used. With your function being generalized to R/graph.R and R/transform.R, I strongly believe we could do the "merge.dsm" ourselves in R memory, and then use your function to export a mdsm.json instead! Once more, we would be relying on dv8_dsmj_to_dsmb(). To be clear, I am not suggesting you do this, just that the function has a lot of untapped potential thanks to your Milestone 3.4 and the final revision to refactor it. Going the mdsm.json route would replace two command calls to dv8_dsmj_to_dsmb() (sdsmj and hdsmj) with one (mdsmj) :)

Sending a few comments on the file in a subsequent comment in a moment. And I will reply to your other comment for Malia's 3 Ask in a moment too.

carlosparadis commented 1 year ago

DV8 Disclaimer

For DV8 downloads and license information see ArchDia.com.

This is fine, and I saw you got it from Rick (thanks). But this is too intrusive. Could you move this below the following statement:

CC BY SA Carlos Paradis • Learn more with the dv8_showcase.Rmd vignette • Kaiaulu package version 0.0.0.9600 (in development) • Updated: 2023-04

With same font, size, color, etc?

About

The dv8_showcase.Rmd vignette introduces DV8 integrated functions inside Kaiaulu. Some are wrapper functions while others allow interoperability between Kaiaulu and DV8.

The dv8_showcase.Rmd vignette introduces ArchDia's DV8 design structural matrices, architectural flaws, and decoupling level in Kaiaulu.

DV8 : A third-party tool suite specializing in analyzing code projects and returning software metrics, component visualization, and more. Check out the user guide here.

Let's omit DV8 definition (which is too much to cover in a sentence). Instead, let's focus on the analysis definition / features we provide via DV8 instead:

Decoupling Level : Measures how well a design is separated into modules based on the DRH clustering.

Yes, this is good.

A definition of Architectural Flaws is needed here. Use this and include the hyperlink as you did above: DV8 can detect 6 types of architectural flaws: Clique, Package Cycle, Improper Inheritance, Unstable Interface, Crossing and Modularity Violation.

We also need a definition for the DSM. Use this and hyperlink: Design Structure Matrix (DSM) is a visual model for expressing dependencies.

With the above you have the 3 main definitions for your last 3 functions.

Project Configuration File

For the project configuration file, I would rename from "Optional Fields" on Depends to "Indirect Fields". The truth is that Depends is also a required field, because you run Depends to get a sdsm for DV8. But they are not directly related as they are not part of DV8 parameters.

Required Fields

Required Fields: Mention DV8 as DV8 4.0-20210630.025325+

Functions (Middle Column of the .pptx)

Maybe there was some miscommunication between us somewhere, but it suffices to include only your 3.4 milestone functions. I.e. do not include dv8_gitnumstat_to_hdsmb() to save space.

filter_by_file_extension() & filter_by_filepath_substring()

Remove this and diagram but:

parse_gitlog() & gitlog_to_hdsmj()

Replace to:

parse_gitlog(), filterby*(), gitlog_to_hdsmj()

And also

parse_ dependencies() and depends_to_sdsmj()

Replace to:

parse_dependencies(), filterby*(), dependencies_to_sdsmj()

I.e.: I want you to remove the filter function definition to have some more space. Mentioning as above should suffice, especially since their diagram suggests a filtering step.

Additionally, note there is something wrong with the icon .depends. This should be a folder icon with the name src. Depends takes as input the folder with the source code of the project of interest.

dv8_gitnumstat_to_hdsmb()

You can remove this. One method is enough.

core:convert-matrix()

I am not sure why you used the DV8 command nomenclature here or its definition...remove this too...

Related Vignettes

Yes, this is good.

Functions (Right Column of the .pptx)

dv8_hdsmb_sdsmb_to_mdsmb()

You should have enough space to move this to the middle column now.

Merges matrices in .dv8-dsm files (historical DSM and structural DSM) to one new matrix in a *.dv8-mdsm file (merged DSM file).

Replace this with:

Merges matrices in .dv8-dsm files (historical DSM and structural DSM) to one new matrix in a *-merge.dv8-dsm file (merged DSM file).

Observe there is no such a thing as dv8-mdsm. The suffix is always .dv8-dsm. What changes is the -hdsm/-sdsm/-merge.dv8-dsm. Please also correct the icons, as they have the wrong extension. In fact, now that I noticed, all icons are wrong. Make sure you abide by the terminology we discussed previously and fix all of the icons. Here's the reference once more:

The icons for this function are also a bit confusing. The symbol of equivalence can be misleading. Instead, put the -sdsm.dv8-dsm icon below the -hdsm.dv8-dsm icon, then have them both have arrows pointing to *-merge.dv8-dsm.

dv8_mdsmb_to_flaws()

This is fine, save for the icon name for mdsm being wrong, as mentioned above.

dv8_mdsmb_to_decoupling_level()

Fine, again, save for the extension issue.

dv8_mdsm_drhier_to_excel()

Fine, again, save for the extension issue.

Final remarks

You should have enough space to add one more function. Try to include dv8_mdsmb_to_hierclsxb() in the ppt. Then you can also add one icon to dv8_mdsm_drhier_to_excel(). More specifically, above the *-merge.dv8-dsm icon you would include a *-clsx.dv8-clsx icon. Its arrow would point to mdsm.json, and it would be dashed. (I.e. this is a file you can optionally pass to this function as a parameter, so that the Excel file also shows the clusters as black rectangles.)

leilani-reich commented 1 year ago

Hi Carlos, here's an updated cheatsheet based on your feedback.

dv8-cheatsheet.pptx

carlosparadis commented 1 year ago

Great!

I think we have one last pass here. Sorry I couldn't catch everything in the first revision, but the remaining changes are minor:

parse_gitlog(), filter_by_*(), gitlog_to_hdsmj()

Replace hdsm.json with *-hdsm.json (i.e. just add *- as a prefix to the file name).

parse_dependencies(), filter_by_*(), dependencies_to_sdsmj()

Same as above.

dv8_hdsmb_sdsmb_to_mdsmb()

We want to add one more function here, side by side. I.e. replace the heading dv8_hdsmb_sdsmb_to_mdsmb() with dv8_dsmj_to_dsmb(), dv8_hdsmb_sdsmb_to_mdsmb()

As far as the figure goes, just add two extra files: *-sdsm.json to the left of *-sdsm.dv8-dsm and *-hdsm.json to the left of *-hdsm.dv8-dsm. Then include an outgoing arrow from *-sdsm.json to *-sdsm.dv8-dsm, and an outgoing arrow from *-hdsm.json to *-hdsm.dv8-dsm.

Related Vignettes

Add a new item 1, dv8_showcase.Rmd. The existing item 1 then becomes 2, 2 becomes 3, and 3 becomes 4.

dv8_mdsmb_to_decoupling_level()

The output icon here says just .json. Let's call it *-dl.json.

dv8_mdsmb_to_hierclsxb()

The output icon says -*merge.dv8-clsx. It should be instead *-clsx.dv8-clsx according to my last image above.

dv8_mdsmb_drhier_to_excel()

The rightmost "-merge.dv8-xlsx" should be named `-merge.xlsx` instead. Why: the ".dv8-dsm" and ".dv8-clsx" suffixes are only used in DV8 for binaries, so we want to be consistent to avoid user confusion. DV8 doesn't actually define a notation for json and xlsx files, but I think this is the sanest thing we can use for them.

One last change: The cluster file name here is correct (*-clsx.dv8-clsx). However, its dashed arrow points the wrong way. It should point to *-merge.xlsx instead.

Lastly, include a short note in the empty space below and to the right of dv8_mdsmb_drhier_to_excel() stating: parser functions are also available on Kaiaulu API for various DV8 json files

This should fully cover the pipeline, include the DV8 Notebook for details, and supplemental notebooks for further reading, without any gap in between the functions, while reminding the user they don't need to figure out the json readers.
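To make the chain concrete, the files each step reads and writes can be sketched as an edge table in R. The input/output pairs below are inferred from this thread and from the function names, not taken from the Kaiaulu documentation, and dv8_mdsmb_to_flaws() is left out because its output file is not named here:

```r
# Each row is one "file in -> file out" edge of the cheat sheet pipeline.
# NA marks an input that this thread does not name as a file.
# The pairs are assumptions inferred from the discussion above.
pipeline <- data.frame(
  fn = c(
    "gitlog_to_hdsmj", "dependencies_to_sdsmj",
    "dv8_dsmj_to_dsmb", "dv8_dsmj_to_dsmb",
    "dv8_hdsmb_sdsmb_to_mdsmb", "dv8_hdsmb_sdsmb_to_mdsmb",
    "dv8_mdsmb_to_decoupling_level", "dv8_mdsmb_to_hierclsxb",
    "dv8_mdsmb_drhier_to_excel", "dv8_mdsmb_drhier_to_excel"
  ),
  input = c(
    NA, NA,
    "*-hdsm.json", "*-sdsm.json",
    "*-hdsm.dv8-dsm", "*-sdsm.dv8-dsm",
    "*-merge.dv8-dsm", "*-merge.dv8-dsm",
    "*-merge.dv8-dsm", "*-clsx.dv8-clsx"  # the cluster file is optional
  ),
  output = c(
    "*-hdsm.json", "*-sdsm.json",
    "*-hdsm.dv8-dsm", "*-sdsm.dv8-dsm",
    "*-merge.dv8-dsm", "*-merge.dv8-dsm",
    "*-dl.json", "*-clsx.dv8-clsx",
    "*-merge.xlsx", "*-merge.xlsx"
  ),
  stringsAsFactors = FALSE
)

# "No gap between the functions": every named input file is the output
# of some step listed in the table.
inputs <- pipeline$input[!is.na(pipeline$input)]
no_gaps <- all(inputs %in% pipeline$output)
print(pipeline)
print(no_gaps)
```

Writing the pipeline as a table makes it easy to check that every file an icon shows as an input is produced by a function that also appears on the sheet.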

This should be it.

Thanks!

carlosparadis commented 1 year ago

Oops,

One last thing: I am not sure if it is ok to use the ArchDia logo beside the Kaiaulu one on the top right. You may want to keep just the Kaiaulu logo, unless Rick explicitly asked you to include it.

Edit: (And I just realized: You typed "Architectural flaws". Capitalize flaws -> "Architectural Flaws").

leilani-reich commented 1 year ago

Hi Carlos, made the revisions. dv8-cheatsheet.pptx

carlosparadis commented 1 year ago

That's it :)

Since this and the Social Smell Cheat Sheet were also your work, if not primarily your work, unless you wish not to, you should include your name to the left of mine in:

CC BY SA Carlos Paradis • Learn more with the dv8_showcase.Rmd vignette • Kaiaulu package version 0.0.0.9600 (in development) • Updated: 2023-04

I.e.

CC BY SA Leilani Reich, Carlos Paradis • Learn more with the dv8_showcase.Rmd vignette • Kaiaulu package version 0.0.0.9600 (in development) • Updated: 2023-04

and the equivalent for the Social Smell cheatsheet. Otherwise, there is virtually no trace that you made and applied the revisions to both of them. Additionally, I made a separate repository for them, so it is also recorded there that this was your contribution to the project. Could you create an issue in https://github.com/sailuh/kaiaulu_cheatsheet and send a Pull Request to it using the usual i #1 format for Kaiaulu?

Thanks for all the work on this.

leilani-reich commented 1 year ago

Should the social smells cheatsheet and the dv8 cheatsheet be separate issues?

Also, did you want me to add the version of the social smells cheatsheet with or without the Bugzilla additions here? I wasn't too sure about what I added, especially for the configuration fields, since we haven't made a Bugzilla conf section.

carlosparadis commented 1 year ago

One issue is fine. Seems I completely missed your version with Bugzilla.

For now, PR the one without Bugzilla, since we agree on the format for that one. I will review the Bugzilla later.

Also, when you PR, include the .pptx of both cheat sheets as well as their .pdf versions. I will link to the PDFs from a Cheat Sheet section in the new README, borrowing from how R data.table presents it.

leilani-reich commented 1 year ago

I don't think I can add to an empty repo. It doesn't let me pull request. Can you add a readme perhaps? Also, it doesn't let me assign myself to the issue.

And do you want these cheatsheets in a folder like "cheatsheets/"?

carlosparadis commented 1 year ago

Sure, that works! Also, I just pushed a README to the repo so you can add them. Apparently GitHub now calls the default branch main instead of master.

leilani-reich commented 1 year ago

I just made a pull request for this here.

carlosparadis commented 1 year ago

I am closing this since the last commit associated with this issue for Milestone 3.4 was pushed: cd742b348f7f185be163e7412be8cd6e18f1a981

For the record, the cheat sheet can be found in the other repo. It was pushed under issue https://github.com/sailuh/kaiaulu_cheatsheet/issues/1 in commit 973ea6d691353d58b4ab2549154250a52a85273a
