Refactor Git modules

daveaglick commented 8 years ago

The Git modules could use a little refactoring before release. Here's what I have in mind:

1. Combine them all into one module, Git, with fluent methods for each type of information: WithAuthors(), WithCommits(), etc. The output documents would be an aggregate of the different types of requested information (with a metadata key to identify which type of information the document contains). This will allow consolidation of logic and should be more efficient when getting multiple types of information (for example, the commit history for two different files).
2. Specify the location of the Git repository in the constructor of the module instead of assuming InputFolder is where we want to look. This will let us work the Git module into a bigger generation that isn't just about Git data. You can still use the InputFolder by passing it to the constructor using a ContextConfig delegate.
3. Flatten the metadata so that instead of using the CommitInformation and Author container objects we place the metadata directly in the output documents as standard .NET primitives. This will make it easier to consume from downstream modules and templates.
4. Modify the behavior for getting commits from a specific file so that we specify the file using a relative string path from the root of the repo or using a DocumentConfig delegate to supply the path for each input (which would still allow the current behavior of looking at IDocument.Source).
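
To make the first three points concrete, here is a rough, language-neutral sketch in Python (not Wyam code). Everything in it is an assumption made for illustration: the `Document` class, the `with_authors()`/`with_commits()` method names, the `GitKind` metadata key, and the shape of the `history` records standing in for real repository reads.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    # Flattened metadata: plain primitives only, no container objects.
    metadata: Dict[str, object] = field(default_factory=dict)


class Git:
    """Hypothetical single module; the repository location is passed in explicitly."""

    def __init__(self, repo_path: str):
        self.repo_path = repo_path
        self._requests: List[str] = []

    def with_authors(self) -> "Git":
        self._requests.append("Author")
        return self  # returning self is what makes the calls chainable

    def with_commits(self) -> "Git":
        self._requests.append("Commit")
        return self

    def execute(self, history: List[Dict[str, str]]) -> List[Document]:
        # `history` is a stand-in for data that would be read from the repo.
        docs: List[Document] = []
        for kind in self._requests:
            if kind == "Commit":
                for c in history:
                    docs.append(Document({
                        "GitKind": "Commit",  # hypothetical key naming the document type
                        "Sha": c["sha"],
                        "AuthorName": c["author"],
                        "Message": c["message"],
                    }))
            elif kind == "Author":
                for name in sorted({c["author"] for c in history}):
                    docs.append(Document({"GitKind": "Author", "AuthorName": name}))
        return docs
```

A downstream consumer would then filter the aggregate output on the `GitKind` value to pick out only authors or only commits.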

@LokiMidgard - I'm prepared to go ahead and make the changes, they shouldn't take long (you've already done the heavy lifting). Just wanted to get your thoughts before I do.

LokiMidgard commented 8 years ago

Currently there are four different modules. Two of them use input documents and add metadata to those; the other two just create documents from the repo information, ignoring the input. I'm not sure if we should mix those two behaviors. Maybe we could make two modules instead of one: one module to add metadata to existing documents, the other to create new documents.

For the second point, I think we should add an overload where you can specify the folder, falling back to a default otherwise. If we used two modules, the default for the module that creates new documents could be InputFolder, and for the module that adds metadata to files it could be the folder of the current document. Maybe I'm missing a use case, but I don't think you could get any information from a Git repo about a file that is not in a subfolder of that repo.

daveaglick commented 8 years ago

Ah, I see. GitFileCommits and GitContributor both add metadata with a collection to the input documents, with the Source of each input document serving as the indication of which file commits/contributors to get. I'm not opposed to having two modules as you suggested, but what if all four operations cloned and output the input documents?

GitFileCommits and GitContributor would be the same as they are now and GitAuthors and GitCommits would add a metadata value that contains the collection of authors/commits to each input document (the same collection if we use the default source of InputFolder or a different collection if we use a DocumentConfig and change the Git repo we look at per-document). We'd no longer be able to "flatten" the CommitInformation and Author objects since they're going into a collection in an existing document, but I like this new idea better anyway. It would let us use the input documents as a control if we want different Git repos for example, and would cleanly associate aggregate information about those repos with the input document(s).

I'm fine keeping the default as InputFolder - the overloads can be used if someone wants an alternative. Will this work if the root of my repo is one level up (for example, I store my entire site in Git including my wyam.config and I want to get the history for a specific Markdown file in my "Input" folder)? I.e., is libgit2sharp smart enough to open a repo if you give it a folder that's nested inside the repo?

LokiMidgard commented 8 years ago

GitAuthors currently creates a document for each author containing the name and a collection of all files they contributed to, in the form of CommitInformation. The reason was to create a detail page for every contributor that lists the files they contributed to. If we restructure the data so that it is more navigable, this would still be possible with just the collection of authors.

Maybe something like this: [image: class-02]

With all relationships between these classes being bidirectional.

For the last part you are right. That's how I use it currently. It starts the search at InputFolder and moves up until it finds a Git repo.
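
The upward search can be sketched roughly as follows, in Python purely for illustration. The function name `find_repo_root` is hypothetical, and the check for a `.git` entry is a simplification; LibGit2Sharp's own `Repository.Discover` is understood to perform this kind of lookup, so this only shows the behaviour and is not how the module is implemented.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Walk from `start` up through its parents and return the first
    directory that contains a `.git` entry, or None if there is none."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        # `.git` is usually a directory, but can be a file (e.g. worktrees),
        # so only existence is checked here.
        if (candidate / ".git").exists():
            return candidate
    return None
```

With a site stored one level below the repo root, calling `find_repo_root` on the "Input" folder would return the parent folder that holds `.git`.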

daveaglick commented 8 years ago

So I finally got some time to look at this and ended up doing a pretty big refactoring. Hope you don't mind! I started out with some small changes, but new ideas kept coming to me about ways to make it more generally applicable and one thing led to another. There are now two modules, one for getting commits and another for getting contributors (which can be configured to get committers, authors, or both). For each module the user has the option to get all commits/contributors for the repository (which will generate new documents) or get ones specific to the input document (which will add metadata to the input documents). I also removed the special classes and relied on nested documents to contain the data - this approach has worked really well in the code analysis module because you can then pass metadata of one of the top-level documents (such as the list of commits for a specific file) to other modules since it's just a sequence of documents itself.
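
As a rough illustration of the two modes and the nested-document idea (a Python sketch, not the actual Wyam code), assume a hypothetical `Document` type with a `source` and a metadata dictionary, and a `history` list standing in for what would be read from the repository. The `Commits` key and the function names are likewise made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    source: str = ""
    metadata: Dict[str, object] = field(default_factory=dict)


def commit_document(commit: dict) -> Document:
    # Each commit becomes an ordinary document rather than a special class.
    return Document(metadata={"Sha": commit["sha"], "Author": commit["author"]})


def all_commits(history: List[dict]) -> List[Document]:
    """Repository-wide mode: ignore the inputs, emit one new document per commit."""
    return [commit_document(c) for c in history]


def commits_for_inputs(inputs: List[Document], history: List[dict],
                       key: str = "Commits") -> List[Document]:
    """Per-input mode: clone each input and attach the commits touching its
    source file as a nested sequence of documents under `key`."""
    outputs = []
    for doc in inputs:
        nested = [commit_document(c) for c in history if doc.source in c["files"]]
        outputs.append(Document(doc.source, {**doc.metadata, key: nested}))
    return outputs
```

Because the value stored under `Commits` is itself just a list of documents, it can be handed to any other step that accepts documents, which is the property described above.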

LokiMidgard commented 8 years ago

Can't wait to try it out :)

statiqdev / Statiq.Web

Refactor Git modules #130