Open kvzn opened 4 years ago
This might be best to ask libgit2 itself, since this library only wraps libgit2. I'm not personally familiar with an API to do this, but I don't have encyclopedic knowledge of the API.
@kevinzheng I was looking for something similar, and it turns out revwalk is the way to go; it's how TortoiseGit does it as well: https://github.com/TortoiseGit/TortoiseGit/blob/master/src/TortoiseShell/GITPropertyPage.cpp#L369
here is a good reference issue in libgit2: https://github.com/libgit2/libgit2/issues/495
@kevinzheng although I am kind of intrigued to benchmark this approach against using the blame functions: https://libgit2.org/libgit2/#HEAD/type/git_blame. Each blame_hunk then contains the commit and a git_signature, which in turn contains a git_time.
@extrawurst @alexcrichton would you please take a look at my implementation? It seems to work, but I haven't tested the performance, and I don't know how to handle commits with multiple parents. Thank you!
use chrono::NaiveDateTime;
use serde::{Deserialize, Serialize};

#[derive(Debug, Deserialize, Serialize, PartialEq, Clone)]
pub struct Commit {
    pub commit_id: String,
    pub message: String,
    pub time: NaiveDateTime,
    pub author: Signature,
    pub committer: Signature,
}

// Not shown: AppError, crate::beans::Signature, git2_time_to_chrono_time.
use git2::{DiffOptions, Oid, Repository};
use std::path::Path;

pub fn last_commit_of_file_or_dir(
    repo: &Repository,
    file_path: &str,
    from_commit_id: Option<&str>,
) -> Result<crate::beans::Commit, AppError> {
    let mut revwalk = repo.revwalk()?;
    revwalk.set_sorting(git2::Sort::TIME)?;
    // Start from the given commit, or from HEAD if none was given.
    match from_commit_id {
        Some(from_cid) => match Oid::from_str(from_cid) {
            Ok(oid) => revwalk.push(oid)?,
            Err(e) => return Err(AppError::Git2Error(e)),
        },
        None => revwalk.push_head()?,
    }
    for oid in revwalk {
        let oid = oid?;
        let cmt = repo.find_commit(oid)?;
        let tree = cmt.tree()?;
        // A root commit has no parent; `None` is then diffed as an empty tree.
        let old_tree = if cmt.parent_count() > 0 {
            // TODO: multiple parents???
            let parent_commit = cmt.parent(0)?;
            Some(parent_commit.tree()?)
        } else {
            None
        };
        let mut opts = DiffOptions::new();
        let diff = repo.diff_tree_to_tree(old_tree.as_ref(), Some(&tree), Some(&mut opts))?;
        let mut deltas = diff.deltas();
        let contains = deltas.any(|dd| {
            let new_file_path = dd.new_file().path().unwrap();
            // File || Dir
            new_file_path.eq(Path::new(file_path)) || new_file_path.starts_with(file_path)
        });
        if contains {
            let c = git2_commit_to_our_commit(&cmt)?;
            return Ok(c);
        }
    }
    Err(AppError::CommandError(format!(
        "Failed to get last commit of file {}!",
        &file_path
    )))
}

fn git2_commit_to_our_commit(commit: &git2::Commit) -> Result<crate::beans::Commit, AppError> {
    let message = commit.message().unwrap_or("").to_string();
    let author = crate::beans::Signature {
        user_id: None,
        name: commit.author().name().unwrap_or("").to_string(),
        email: commit.author().email().unwrap_or("").to_string(),
    };
    let committer = crate::beans::Signature {
        user_id: None,
        name: commit.committer().name().unwrap_or("").to_string(),
        email: commit.committer().email().unwrap_or("").to_string(),
    };
    let time = git2_time_to_chrono_time(commit.time());
    Ok(crate::beans::Commit {
        commit_id: commit.id().to_string(),
        message,
        time,
        committer,
        author,
    })
}
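One possible answer to the multiple-parents TODO above: count a merge as having changed a path only if its entry for that path differs from every parent, which is roughly what git log does by default when it is limited to a path. The sketch below is a toy in-memory model, not git2 code; `ToyCommit`, `touches`, `tree` and `sample_history` are made-up names, and blob contents are stood in for by plain integers.

```rust
use std::collections::HashMap;

/// Toy stand-in for a commit: parent indices plus a path -> blob-id snapshot.
struct ToyCommit {
    parents: Vec<usize>,
    tree: HashMap<&'static str, u32>,
}

/// True if `commits[idx]` changed `path` relative to *every* parent.
/// A root commit is compared against an empty tree.
fn touches(commits: &[ToyCommit], idx: usize, path: &str) -> bool {
    let commit = &commits[idx];
    let entry = commit.tree.get(path);
    if commit.parents.is_empty() {
        return entry.is_some();
    }
    commit
        .parents
        .iter()
        .all(|&p| commits[p].tree.get(path) != entry)
}

/// Small helper to build a snapshot from (path, blob-id) pairs.
fn tree(entries: &[(&'static str, u32)]) -> HashMap<&'static str, u32> {
    entries.iter().copied().collect()
}

/// Hypothetical history: a root, two branches, and two alternative merges.
fn sample_history() -> Vec<ToyCommit> {
    vec![
        // 0: root commit adds a.txt
        ToyCommit { parents: vec![], tree: tree(&[("a.txt", 1)]) },
        // 1: first branch edits a.txt
        ToyCommit { parents: vec![0], tree: tree(&[("a.txt", 2)]) },
        // 2: second branch adds b.txt
        ToyCommit { parents: vec![0], tree: tree(&[("a.txt", 1), ("b.txt", 7)]) },
        // 3: clean merge of 1 and 2 (every entry matches one of the parents)
        ToyCommit { parents: vec![1, 2], tree: tree(&[("a.txt", 2), ("b.txt", 7)]) },
        // 4: merge of 1 and 2 that also hand-edits a.txt
        ToyCommit { parents: vec![1, 2], tree: tree(&[("a.txt", 3), ("b.txt", 7)]) },
    ]
}

fn main() {
    let history = sample_history();
    for idx in 0..history.len() {
        println!("commit {} touches a.txt: {}", idx, touches(&history, idx, "a.txt"));
    }
}
```

With git2, the same idea would presumably mean diffing the commit's tree against the tree of each `cmt.parent(i)` for `i` in `0..cmt.parent_count()` instead of only `cmt.parent(0)`.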
It appears that this is a widely requested feature - nearly every language wrapper has a feature request for it - e.g. https://github.com/libgit2/pygit2/issues/231. However, it's not implemented in git2 - here's the upstream feature request: https://github.com/libgit2/libgit2/issues/495.
Someone has contributed a custom implementation for the C# bindings, although I haven't looked at it in detail: https://github.com/libgit2/libgit2sharp/pull/963
I've rolled my own implementation, but it reports different timestamps compared to git log for half of the files in the repo I care about.
For ease of testing I list the timestamps for all the files that ever existed in the repository, rather than attempting to filter further. Here's my code:
// Copyright 2021 Google, inc.
// SPDX-License-identifier: Apache-2.0
use std::{collections::HashMap, path::PathBuf};
use git2::{Commit, Repository, Tree, Error};

fn main() -> Result<(), Error> {
    let mut mtimes: HashMap<PathBuf, i64> = HashMap::new();
    let repo = Repository::open(".")?;
    let mut revwalk = repo.revwalk()?;
    revwalk.set_sorting(git2::Sort::TIME)?;
    revwalk.push_head()?;
    let mut newer_commit: Option<Commit> = None;
    let mut newer_commit_tree: Option<Tree> = None;
    for commit_id in revwalk {
        let commit_id = commit_id?;
        let commit = repo.find_commit(commit_id)?;
        if commit.parent_count() > 1 {
            // ignore merge commits because they touch lots of files
            // without any of them being actually modified
            continue;
        }
        let tree = commit.tree()?;
        // check if this is not the very first commit, then we have nothing to diff
        if let Some(newer_commit_tree) = newer_commit_tree {
            let diff = repo.diff_tree_to_tree(Some(&tree), Some(&newer_commit_tree), None)?;
            for delta in diff.deltas() {
                let file_path = delta.new_file().path().unwrap();
                let file_mod_time = newer_commit.as_ref().unwrap().time();
                let unix_time = file_mod_time.seconds();
                mtimes.entry(file_path.to_owned()).or_insert(unix_time);
            }
        }
        newer_commit = Some(commit);
        newer_commit_tree = Some(tree);
    }
    for (path, time) in mtimes.iter() {
        println!("{:?}: {}", path, time);
    }
    Ok(())
}
Here's a (slower) reference BASH implementation using git log that outputs the data in the same format for ease of comparison:
#!/bin/bash
# IFS= and -r keep leading whitespace and backslashes in file names intact.
git ls-files | while IFS= read -r FILENAME; do
    TIME=$( git log -1 --format="%ct" -- "$FILENAME" )
    echo "\"${FILENAME#./}\": $TIME"
done
The BASH version aligns with the output of git whatchanged --pretty='%ct', but my git2-based impl does not. The git2-based implementation tends to report newer dates than those in git whatchanged.
Fixes I've attempted:
- %at (author time, not committer time) in the BASH version, which made a slight difference for the worse.
- I've tried filtering out merge commits, but that didn't seem to make any difference.
¯\_(ツ)_/¯
Edit: Ah, that's probably because I'm walking the commit log chronologically using git2::Sort::TIME. If I instead walk them by parent links, it should work better.
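To illustrate why diffing against the chronologically adjacent commit can report dates that are too new, here is a small stdlib-only sketch. It is a toy model under stated assumptions, not git2 code: `ToyCommit`, `changed_paths`, `mtimes_by_time_neighbor`, `mtimes_by_parent` and `sample_history` are made-up names, and the history is a hypothetical one where two branches fork from the same root and their commits interleave in time.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// Snapshot of a tree: path -> stand-in blob id.
type Tree = HashMap<&'static str, u32>;

struct ToyCommit {
    time: i64,
    parent: Option<usize>, // index of the single parent, None for the root
    tree: Tree,
}

/// Paths whose entry differs between two snapshots.
fn changed_paths(old: &Tree, new: &Tree) -> Vec<&'static str> {
    let mut paths: Vec<&'static str> = old.keys().chain(new.keys()).copied().collect();
    paths.sort_unstable();
    paths.dedup();
    paths.retain(|p| old.get(p) != new.get(p));
    paths
}

/// First attempt: diff each commit against the next-newer one in time order.
fn mtimes_by_time_neighbor(commits: &[ToyCommit]) -> HashMap<&'static str, i64> {
    let mut order: Vec<&ToyCommit> = commits.iter().collect();
    order.sort_by_key(|c| Reverse(c.time)); // newest first
    let mut mtimes = HashMap::new();
    for pair in order.windows(2) {
        let (newer, older) = (pair[0], pair[1]);
        for path in changed_paths(&older.tree, &newer.tree) {
            mtimes.entry(path).or_insert(newer.time);
        }
    }
    mtimes
}

/// Second attempt: diff each commit against its actual parent, keep the max time.
fn mtimes_by_parent(commits: &[ToyCommit]) -> HashMap<&'static str, i64> {
    let mut mtimes: HashMap<&'static str, i64> = HashMap::new();
    for commit in commits {
        if let Some(p) = commit.parent {
            for path in changed_paths(&commits[p].tree, &commit.tree) {
                let t = mtimes.entry(path).or_insert(commit.time);
                *t = (*t).max(commit.time);
            }
        }
    }
    mtimes
}

/// Root at t=1, then two sibling commits on different branches at t=2 and t=3.
fn sample_history() -> Vec<ToyCommit> {
    let tree = |f: u32, g: u32| -> Tree { [("f.txt", f), ("g.txt", g)].into_iter().collect() };
    vec![
        ToyCommit { time: 1, parent: None, tree: tree(1, 1) },
        ToyCommit { time: 2, parent: Some(0), tree: tree(2, 1) }, // edits f.txt
        ToyCommit { time: 3, parent: Some(0), tree: tree(1, 2) }, // edits g.txt
    ]
}

fn main() {
    let history = sample_history();
    println!("by time neighbor: {:?}", mtimes_by_time_neighbor(&history));
    println!("by parent:        {:?}", mtimes_by_parent(&history));
}
```

In this toy history the t=3 commit never touched f.txt, but its tree still differs from the t=2 commit's tree at that path because the two sit on different branches, so the neighbor-based diff credits f.txt to t=3 while the parent-based diff credits it to t=2.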
Okay, this works:
// Copyright 2021 Google, inc.
// SPDX-License-identifier: Apache-2.0
use std::{cmp::max, collections::HashMap, path::PathBuf};
use git2::{Repository, Error};

fn main() -> Result<(), Error> {
    let mut mtimes: HashMap<PathBuf, i64> = HashMap::new();
    let repo = Repository::open(".")?;
    let mut revwalk = repo.revwalk()?;
    revwalk.set_sorting(git2::Sort::TIME)?;
    revwalk.push_head()?;
    for commit_id in revwalk {
        let commit_id = commit_id?;
        let commit = repo.find_commit(commit_id)?;
        // Ignore merge commits (2+ parents) because that's what 'git whatchanged' does.
        // Ignore commit with 0 parents (initial commit) because there's nothing to diff against
        if commit.parent_count() == 1 {
            let prev_commit = commit.parent(0)?;
            let tree = commit.tree()?;
            let prev_tree = prev_commit.tree()?;
            let diff = repo.diff_tree_to_tree(Some(&prev_tree), Some(&tree), None)?;
            for delta in diff.deltas() {
                let file_path = delta.new_file().path().unwrap();
                let file_mod_time = commit.time();
                let unix_time = file_mod_time.seconds();
                mtimes.entry(file_path.to_owned())
                    .and_modify(|t| *t = max(*t, unix_time))
                    .or_insert(unix_time);
            }
        }
    }
    for (path, time) in mtimes.iter() {
        println!("{:?}: {}", path, time);
    }
    Ok(())
}
An MIT/Apache-licensed version can be found here.
Edit: although it looks like this code will miss files only touched in the initial commit. A solution can be found here.
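One way to cover files that were only touched in the initial commit is to diff the root commit against an empty tree rather than skipping it; the first implementation earlier in this thread already does this by passing `None` as the old tree to `diff_tree_to_tree`. The sketch below shows the idea on a toy in-memory model only. `ToyCommit`, `changed_paths`, `last_modified` and `sample_history` are made-up names, and it is not necessarily what the linked solution does.

```rust
use std::collections::HashMap;

/// Snapshot of a tree: path -> stand-in blob id.
type Tree = HashMap<&'static str, u32>;

struct ToyCommit {
    time: i64,
    parent: Option<usize>, // None marks the initial (root) commit
    tree: Tree,
}

/// Paths that differ between `old` and `new`; `None` stands for an empty tree.
fn changed_paths(old: Option<&Tree>, new: &Tree) -> Vec<&'static str> {
    let empty = Tree::new();
    let old = old.unwrap_or(&empty);
    let mut paths: Vec<&'static str> = old.keys().chain(new.keys()).copied().collect();
    paths.sort_unstable();
    paths.dedup();
    paths.retain(|p| old.get(p) != new.get(p));
    paths
}

/// Newest commit time per path, including paths only touched by the root commit.
fn last_modified(commits: &[ToyCommit]) -> HashMap<&'static str, i64> {
    let mut mtimes: HashMap<&'static str, i64> = HashMap::new();
    for commit in commits {
        // For the root commit this is None, so every file it contains counts as changed.
        let parent_tree = commit.parent.map(|p| &commits[p].tree);
        for path in changed_paths(parent_tree, &commit.tree) {
            let t = mtimes.entry(path).or_insert(commit.time);
            *t = (*t).max(commit.time);
        }
    }
    mtimes
}

/// Root commit adds two files; the second commit edits only main.rs.
fn sample_history() -> Vec<ToyCommit> {
    let tree = |readme: u32, main_rs: u32| -> Tree {
        [("README", readme), ("main.rs", main_rs)].into_iter().collect()
    };
    vec![
        ToyCommit { time: 10, parent: None, tree: tree(1, 1) },
        ToyCommit { time: 20, parent: Some(0), tree: tree(1, 2) },
    ]
}

fn main() {
    for (path, time) in last_modified(&sample_history()) {
        println!("{:?}: {}", path, time);
    }
}
```

Here README keeps the root commit's time because no later commit changes it, which is the case a `parent_count() == 1` filter would drop.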
It should be something like the command git log --follow FILENAME. revwalk might work, but it would require a lot of computation. Is there another way? Thank you!