sailuh / kaiaulu

An R package for mining software repositories
http://itm0.shidler.hawaii.edu/kaiaulu
Mozilla Public License 2.0

Parallel git log entity analysis #231

nicolehoess closed this issue 1 year ago

nicolehoess commented 1 year ago

Evolutionary analyses often examine the git history over several time windows. To speed up such analyses, I would like to propose a parallel version of the git log entity analysis notebook, located at vignettes/gitlog_entity_parallel_showcase.Rmd. For demonstration purposes, it uses the Apache Spark git repository configured in conf/spark.yml.

The notebook depends on the R packages doParallel and doSNOW to parallelize the iteration over multiple time windows using foreach loops.
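To illustrate the general pattern, here is a minimal sketch of a foreach loop over time windows using doParallel. It is not the notebook's actual code: the window dates and the `analyze_window` function are hypothetical placeholders for the per-window git log entity analysis, and the two-worker cluster is an arbitrary choice.

```r
library(foreach)
library(doParallel)

# Hypothetical time windows: four consecutive 3-month intervals (illustrative dates)
window_starts <- seq(as.Date("2020-01-01"), as.Date("2020-10-01"), by = "3 months")
windows <- data.frame(
  start = window_starts,
  end   = c(window_starts[-1], as.Date("2021-01-01"))
)

# Hypothetical stand-in for the per-window analysis; the real notebook
# would parse and analyze the git log restricted to [start, end) here
analyze_window <- function(start, end) {
  data.frame(start = start, end = end, n_days = as.integer(end - start))
}

# Start a small cluster of worker processes and register it with foreach
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Each iteration handles one window on a worker; results are row-bound
# in window order (foreach's default .inorder = TRUE)
results <- foreach(i = seq_len(nrow(windows)), .combine = rbind) %dopar% {
  analyze_window(windows$start[i], windows$end[i])
}

# Always release the workers when done
parallel::stopCluster(cl)

print(results)
```

Because the windows are independent of each other, each one can be processed by a separate worker, and the number of workers passed to `makeCluster` would normally be chosen based on the available cores.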

Note: Both R packages work for Unix-like systems only. So far, the notebook has been tested on machines with up to 16 cores.

carlosparadis commented 1 year ago

Thank you @nicolehoess! Please feel free to PR the code and I can offer more feedback :)