sailuh / kaiaulu

An R package for mining software repositories
http://itm0.shidler.hawaii.edu/kaiaulu
Mozilla Public License 2.0

Cheat Sheet of Fake Data Generator #239

carlosparadis closed this issue 10 months ago

carlosparadis commented 11 months ago

Should make one showcasing your efforts. Guidance on how to write one is here: https://github.com/rstudio/cheatsheets/tree/main#posit-cheatsheets

Please refer to the ones already in Kaiaulu README as an example.

What you want to showcase is a) the fake data generators git.R #227, issue.R #228, mail.R #238, b) the example.R using them, and c) the unit tests that use example.R. You are in essence showcasing how to do more advanced unit testing for Kaiaulu.

Consider the questions you had in grasping the concept of making fake data when you write this document, so that others can understand it more easily. Pay close attention to the level of detail you put in this cheatsheet in light of the guidelines in the GitHub repository linked above.

Rubegen commented 10 months ago

Fake-data-generator-first-draft.pdf

This is the first draft of the cheatsheet that I drew up. I chose to break it up into three steps. The first section includes the functions we used to customize the fake data, such as git_add() and git_commit(). The second section is all about example.R and how we used the functions from section 1 to create fake repositories, mailboxes, and Jira issues. The final section is about the unit tests and testing the three parsing functions. I plan to add visual aids in the boxes I marked, but this is just a rough general layout of how things are going to be placed. LMK if this looks okay or what improvements I can make for the next draft. Thanks!

carlosparadis commented 10 months ago

This looks pretty great :) Here's some change requests:

Assuming the cheatsheet can be observed as 3 columns, and using the dv8 cheatsheet pptx as reference on aesthetics:

Left Column

You can then use the remaining space to include the remaining information on the left, as there is no use of project configuration files here.

I am not sure you can fit it in the left column, but we should also include something to this effect somewhere:

parse_gitlog(data=example, tool = "Perceval")
parse_gitlog(data=example, tool = "Codeface")

Then emphasize that the same example can be used to compare the output of different tools and verify underlying assumptions. This use is not obvious to most readers, so I would repeat it here in the corner.

For our discussion reference if you have questions (but not to put on the cheatsheet), let's call this the "Utility of Fake Data Generator to Compare Tool Behavior".
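To make the tool-comparison idea concrete, here is a minimal base R sketch. It does not use Kaiaulu's actual API: the fake log format and the two parser functions (parse_with_tool_a, parse_with_tool_b) are hypothetical stand-ins for two third-party tools parsing the same fake data.

```r
# Hypothetical fake git log: one record per line, "hash|author|branch".
fake_log <- c(
  "a1|alice|master",
  "b2|bob|feature",
  "c3|alice|feature"
)

# Stand-in for "tool A" (assumption): splits each record with strsplit().
parse_with_tool_a <- function(log_lines) {
  fields <- strsplit(log_lines, "|", fixed = TRUE)
  data.frame(
    hash   = vapply(fields, `[`, character(1), 1),
    author = vapply(fields, `[`, character(1), 2),
    branch = vapply(fields, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

# Stand-in for "tool B" (assumption): reads the same records with read.table().
parse_with_tool_b <- function(log_lines) {
  read.table(
    text = log_lines, sep = "|",
    col.names = c("hash", "author", "branch"),
    colClasses = "character", stringsAsFactors = FALSE
  )
}

out_a <- parse_with_tool_a(fake_log)
out_b <- parse_with_tool_b(fake_log)

# The input is known in advance, so any column that differs between the
# two outputs points at an assumption made by one of the tools.
tools_agree <- all(mapply(identical, out_a, out_b))
```

Because the fake log is small enough to read by hand, a disagreement between the two outputs can be traced to a specific record rather than to noise in real project data.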

Middle Column

Choosing one example of each of the APIs is a good start, but I think we would lose some interesting opportunities to highlight your efforts even further. For the first example, I would consider parse_gitlog(example=example_two_branches,tool="Perceval") and parse_jira(example=two_issues_with_replies). In the unit test section, you could then mention that the unit tests of these two examples check not only that Kaiaulu's built-in functions behave as intended, but also that the tools Kaiaulu depends on behave as intended.

For your second example, I would emphasize that your fake data generator lets unit tests warn users when a different version of a tool changes the expected behavior. E.g., suppose you used parse_gitlog(example_two_branches,tool="perceval_1.0") and Waylon used parse_gitlog(example_two_branches,tool="perceval_2.0"): same example, but different versions of Perceval. If Perceval 2.0 changes how it parses branches, then Waylon's unit test would fail. This would immediately keep you both from spending time on separate analyses and finding inconsistent output without knowing why.
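The version scenario can be sketched in base R as follows. Everything here is hypothetical: parse_v1 and parse_v2 are stand-ins for two versions of a tool, not Perceval itself, and the "2.0" behavior of dropping non-default branches is invented purely for illustration.

```r
# Hypothetical example: two commits on two branches.
example_two_branches <- data.frame(
  hash   = c("a1", "b2"),
  branch = c("master", "feature"),
  stringsAsFactors = FALSE
)

# Stand-in for version 1.0 of a tool (assumption): reports every branch.
parse_v1 <- function(example) example

# Stand-in for version 2.0 (assumption): silently keeps only "master".
parse_v2 <- function(example) {
  example[example$branch == "master", , drop = FALSE]
}

# The unit test encodes the behavior the analysis relies on:
# both branches of the fake repository must appear in the parsed output.
test_two_branches <- function(parser) {
  parsed <- parser(example_two_branches)
  identical(sort(unique(parsed$branch)), c("feature", "master"))
}

v1_passes <- test_two_branches(parse_v1)
v2_passes <- test_two_branches(parse_v2)
```

The same test and the same fake data are run against both stand-ins, so a failure under one version isolates the tool as the source of the difference.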

The third example should be an integration test. In this case, your example function will call git.R, mail.R and jira.R to simulate an actual project that has development, communication and issue tracking. For this one, you can refer to a simulation.Rmd notebook that runs various analyses on this fictitious project.
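As a rough sketch of what such an integration test might check, the base R snippet below builds a tiny fictitious project by hand instead of calling git.R, mail.R or jira.R. All table names, column names and values are assumptions made for illustration only.

```r
# Hypothetical fake project: each table is small enough to verify by hand.
fake_commits <- data.frame(
  author = c("alice", "bob", "alice"),
  file   = c("a.R", "b.R", "b.R"),
  stringsAsFactors = FALSE
)
fake_mail <- data.frame(
  sender  = c("alice", "carol"),
  subject = c("Release plan", "Re: Release plan"),
  stringsAsFactors = FALSE
)
fake_issues <- data.frame(
  reporter = c("bob", "carol"),
  key      = c("PROJ-1", "PROJ-2"),
  stringsAsFactors = FALSE
)

# Integration step: combine the three sources into one activity table.
activity <- rbind(
  data.frame(person = fake_commits$author, source = "git",
             stringsAsFactors = FALSE),
  data.frame(person = fake_mail$sender, source = "mail",
             stringsAsFactors = FALSE),
  data.frame(person = fake_issues$reporter, source = "issue",
             stringsAsFactors = FALSE)
)

# Count in how many distinct sources each person appears.
sources_per_person <- tapply(
  activity$source, activity$person,
  function(s) length(unique(s))
)
```

Since the fake project is designed so that every person appears in exactly two sources, an analysis that reports anything else signals a problem in how the pieces fit together rather than in any single parser.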

Right Column

It may be worth making one more doodle before the ppt, or you can make the draft ppt and we can iterate from there.

Rubegen commented 10 months ago

Hi Carlos, thanks for the feedback! I've implemented the changes you suggested on this second draft. I will start working on the ppt version now as the third draft and implement any further changes on that version. Cheatsheet_second_draft.pdf

Rubegen commented 10 months ago

I just realized I misnamed some of the functions again in my second draft. I'll be sure to use their correct names in the next version.

carlosparadis commented 10 months ago

This looks good! Just remember that in everything besides #3 you are unit testing (the parse git log functions) and, indirectly, third-party tools when the function depends on them. So make sure you include Unit Tests there. I think it is OK, if you run out of space, not to include an Example section and only demonstrate the examples when discussing Unit and Integration tests. After all, their purpose is clearer when used alongside the unit tests. Your "Green text" also explains them at the start of the poster.

Consider the use of color when you type the names of these functions in the poster. For example, you may want to color the example variables one color and the tool parameter a different color. This will make it easier for the reader to see the separation of the 3 modules you created. Of course, the functions on the bottom left would also have their own color.

The green text area could also be recolored to match them: E.g. git.R, mail.R, jira.R could be blue, and then all function calls on the bottom left of your poster are blue.

The example.R green line could be dark red, and then the example_two_branches parameters could be dark red, etc.

For extra attention to detail: Please check your poster is color-blind friendly. There should be pages online that could check that for you. You could add a tiny disclaimer to your poster saying it is color blind friendly too if you like. I don't think the previous group checked for that.

For the next iteration I'd get going on the .pptx!

Thanks for sending an update today!

carlosparadis commented 10 months ago

Correction to my own message above: * you are unit testing the parse_gitlog functions (and the parse_mbox, and the parse_jira functions).

Rubegen commented 10 months ago

Fake_data-cheatsheet-FINAL.pptx

carlosparadis commented 10 months ago

Give me a few to review. Also, see my .DS_Store comment on cheatsheet repo!

Rubegen commented 10 months ago

Yes, I will resolve the .DS issue in a bit. LMK what final tweaks I can include in the cheatsheet so that it can be included in the poster presentation. You can view the semi-final poster presentation here: https://docs.google.com/presentation/d/1c_RUYPla9HuHn1dCyHfVVasji4WUiwo2/edit?usp=sharing&ouid=106873323762810267472&rtpof=true&sd=true

carlosparadis commented 10 months ago

Fantastic, yes I will get back to you on this today. I think it is nice if you can include it there too :)

carlosparadis commented 10 months ago

@Rubegen can you enable comment permissions for me on the poster slides? That way I can give direct feedback. But something already caught my attention:

Could you use "Fake Data Generators" as the title? Every semester everyone wants to use Software Analytics Insights as the title, but if everyone does that, then the poster titles are all the same!

Rubegen commented 10 months ago

Yes, I updated your permissions to allow for comments. If it still doesn't work you can try a right click on a part of the presentation and then choose "comment"

carlosparadis commented 10 months ago

@Rubegen

I think a few posters last semester included QR codes (I may be imagining it). Could you include this on your poster along with the URL to the project's GitHub? https://github.com/sailuh/kaiaulu qrcode_github.com

Rubegen commented 10 months ago

OK, I'll find a place to include these 👍

carlosparadis commented 10 months ago

@Rubegen

I sent the revised cheatsheet via e-mail alongside feedback. As noted there: Please PR it (.pptx and pdf), the poster and the final presentation (also .pptx and pdf) to the cheatsheet repo when ready. Double check the .pptx loads fine and is properly formatted since it is done on Google Slides. Thanks!

carlosparadis commented 10 months ago

Also as stated on e-mail: Please add the footnote with our names, and the copyleft statement as it was on the other cheat sheets.

carlosparadis commented 10 months ago

All the documentation deliverables were added to the cheat sheet archive for Sailuh, thank you!