simonw / git-history

Tools for analyzing Git history using SQLite
Apache License 2.0
191 stars 18 forks source link

For some repos the file is not correctly read from each commit version #64

Closed simonw closed 1 year ago

simonw commented 1 year ago

https://pinafore.social/statuses/109671158910886602 reports problems loading the history for data/stats.json in https://github.com/Ambrosiani/GLAM_stats

Sure enough, I tried this:

cd /tmp
git clone https://github.com/Ambrosiani/GLAM_stats
cd GLAM_stats
git-history file stats.db data/stats.json --id name --branch master

Output:

  [####################################]  176/176  100%%                        GLAM_stats % 

But:

% sqlite-utils dump stats.db
BEGIN TRANSACTION;
CREATE TABLE [namespaces] (
   [id] INTEGER PRIMARY KEY,
   [name] TEXT
);
INSERT INTO "namespaces" VALUES(1,'item');
CREATE UNIQUE INDEX [idx_namespaces_name]
    ON [namespaces] ([name]);
COMMIT;

So it didn't actually work.

On a bit of debugging it turns out this code was at fault:

https://github.com/simonw/git-history/blob/8d1d6b708b0d7f99dd50b12b0c10d458855064b6/git_history/cli.py#L26-L34

The "This commit doesn't have a copy of the requested file" branch was being hit for every commit.

simonw commented 1 year ago

https://github.com/Ambrosiani/GLAM_stats/commit/7d4da65f4642a84e99c88d58af98a6352b32eb73#diff-e4d1dd595ec5e957bb51733508be30d458d23c324784c4078db4c17d15c92c84 is an example of a commit that does have a change to data/stats.json but for which the [b for b in commit.tree.blobs if b.name == relative_path] idiom is not finding that file for some reason.

I found an alternative code pattern that DOES read the file though:

commit.tree["data/stats.json"].data_stream.read()

It raises KeyError if the file cannot be found.

simonw commented 1 year ago

That change fixed the problem!

I'm going to skip adding a new test for this because I don't know how to build a repo that exhibits the problem as part of the tests, and the fix seems very safe to me.

simonw commented 1 year ago

Shipping this as an alpha release because I haven't yet got a good fix for: