As @unclebob (I think) said well: "What is used together, goes together. What changes together, goes together."
Use the Chameleon clustering algorithm (or possibly BIRCH or spectral clustering) to identify what goes together, based on both what is used together and what changes together. Use the following inputs for the proximity and interconnectedness calculations:
- proximity / closeness: the number of times two files have been committed together
- interconnectedness: the edge weight between two classes
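A minimal sketch of how the two inputs could be gathered, assuming each commit is available as a list of changed file paths and the class-level edge weights are already known. `co_change_counts` and `combined_weights` are hypothetical helpers, and blending both signals into one score with an `alpha` knob is an illustrative assumption, not part of the proposal (Chameleon itself could consume the two measures separately).

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how many commits touch each unordered pair of files (proximity)."""
    counts = Counter()
    for files in commits:
        # Dedupe and sort so (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return counts


def combined_weights(co_change, dependencies, alpha=0.5):
    """Blend normalized co-change counts with normalized dependency edge weights.

    Hypothetical: `dependencies` must be keyed by sorted (a, b) pairs, like
    `co_change`. alpha=1.0 uses only co-change, alpha=0.0 only dependencies.
    """
    max_cc = max(co_change.values(), default=0) or 1
    max_dep = max(dependencies.values(), default=0) or 1
    weights = {}
    for pair in set(co_change) | set(dependencies):
        cc = co_change.get(pair, 0) / max_cc
        dep = dependencies.get(pair, 0) / max_dep
        weights[pair] = alpha * cc + (1 - alpha) * dep
    return weights
```

For example, `co_change_counts([["A.java", "B.java"], ["B.java", "A.java", "C.java"]])` counts the pair `("A.java", "B.java")` twice. Normalizing each signal by its maximum keeps raw commit counts from swamping the dependency weights, whatever their scale.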
https://github.com/lbehnke/hierarchical-clustering-java may serve as a starting point.
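As a rough illustration of the hierarchical approach, here is a sketch of plain average-linkage agglomerative clustering over pairwise similarities. This is a simpler stand-in, not Chameleon (which first partitions a k-nearest-neighbour graph and then merges sub-clusters using relative interconnectedness and relative closeness), and it does not use the linked library's API. The `agglomerate` function, its `threshold` stopping rule, and the assumption that `weights` is keyed by sorted `(file, file)` pairs (for example, scores derived from co-change counts and class edge weights) are all hypothetical.

```python
def agglomerate(items, weights, threshold):
    """Greedy average-linkage agglomerative clustering on pairwise similarities.

    Repeatedly merges the two most similar clusters until no pair's average
    similarity reaches `threshold`. Pairs missing from `weights` count as 0.
    """
    clusters = [frozenset([x]) for x in sorted(items)]

    def sim(a, b):
        # Weights are assumed to be keyed by sorted (a, b) pairs.
        pair = (a, b) if a < b else (b, a)
        return weights.get(pair, 0.0)

    def linkage(c1, c2):
        # Average similarity over all cross-cluster pairs.
        total = sum(sim(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = linkage(clusters[i], clusters[j])
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score < threshold:
            break  # Nothing left that is similar enough to merge.
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]
```

The threshold plays the role of cutting the dendrogram: lowering it yields fewer, larger groups of files that "go together". The naive pair search is quadratic per merge, which is fine for a sketch but would need a priority queue or a library for a large codebase.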