simonw / git-history

Tools for analyzing Git history using SQLite
Apache License 2.0

Tables not being created, only `namespaces` #57

Open lassebenni opened 2 years ago

lassebenni commented 2 years ago

Hi!

I am trying to run git-history on a repository containing a JSON file that has multiple versions over time (hundreds of commits to the same file).

When I run `git-history file some_data.db data/latest.json --branch master`, it creates `some_data.db`, but no commit tables are created:

image

Only a single `namespaces` table with a single item is created.

image

Not sure what I am missing here?

Kind regards, Lasse

simonw commented 2 years ago

Can you share the repository, or a cut-down copy of it with just a few commits?

Is it possible that you need to use --branch main instead?

lassebenni commented 2 years ago

Hi @simonw ! Thanks for the quick reply.

This is the repository: https://github.com/lassebenni/shortbet. I am trying to get the commits for the relative path `data/latest.json`.

I added a print statement to the commit checking code in https://github.com/simonw/git-history/blob/b2f0274ea7135e1bb1dd366b059d4ab08c09c713/git_history/cli.py#L26


    for commit in commits:
        if progress_bar:
            progress_bar.update(1)
        try:
            for b in commit.tree.blobs:
                print(b.name)
            blob = [b for b in commit.tree.blobs if b.name == relative_path][0]
            yield commit.committed_datetime, commit.hexsha, blob.data_stream.read()
        except IndexError:
            # This commit doesn't have a copy of the requested file
            pass

And got:

image

The issue seems to be that the commit tree blobs do not contain the name `data/latest.json`, even though the file is updated every time. Also, the number of commits found seems to be 215, while the actual number of commits is:

image

Not exactly sure what the issue is here.

lassebenni commented 2 years ago

So this ended up being due to my data living in a subfolder, in this case data/results.json. The commit blobs were simply not found because git-history currently only looks at the root folder and not at nested folders.

#52 will solve this issue. I didn't notice the PR before, so I ended up with a similar fix locally. =(
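To illustrate the root-folder-only lookup described above, here is a minimal sketch using a toy model of a commit tree rather than GitPython itself. The `Blob` and `Tree` classes and the `find_top_level` / `find_nested` helpers are hypothetical names made up for this example; they only mimic the `.blobs`, `.trees` and `.name` attributes used in the debugging snippet earlier in this thread, and this is not the code from the PR.

```python
from dataclasses import dataclass, field


@dataclass
class Blob:
    """Toy stand-in for a file in a commit (hypothetical, not GitPython)."""
    name: str
    data: bytes


@dataclass
class Tree:
    """Toy stand-in for a folder in a commit (hypothetical, not GitPython)."""
    name: str
    blobs: list = field(default_factory=list)  # files directly in this folder
    trees: list = field(default_factory=list)  # subfolders


def find_top_level(tree, relative_path):
    """Compare the full path against names of direct children only,
    like the list comprehension in the debugging snippet."""
    matches = [b for b in tree.blobs if b.name == relative_path]
    return matches[0] if matches else None


def find_nested(tree, relative_path):
    """Walk the path one component at a time, descending into subfolders."""
    *folders, filename = relative_path.split("/")
    for folder in folders:
        subtree = next((t for t in tree.trees if t.name == folder), None)
        if subtree is None:
            return None  # an intermediate folder is missing in this commit
        tree = subtree
    return next((b for b in tree.blobs if b.name == filename), None)


# A commit with README.md at the root and latest.json inside data/
root = Tree(
    "",
    blobs=[Blob("README.md", b"# demo")],
    trees=[Tree("data", blobs=[Blob("latest.json", b"[]")])],
)

print(find_top_level(root, "data/latest.json"))  # no match at the root level
print(find_nested(root, "data/latest.json"))     # found inside data/
```

The top-level lookup never matches because a blob's name is just `latest.json`, while the requested path is `data/latest.json`. In GitPython, I believe a path-style lookup such as `commit.tree / relative_path` serves the same purpose as `find_nested`, but treat that as an assumption to check against the GitPython documentation.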

jhoogeboom commented 11 months ago

I just ran into the same problem and can reproduce the behaviour when having data in a subfolder.

pax commented 1 month ago

I still get the same behaviour (empty db) with a fresh install, with multiple JSON files in different directories.

https://github.com/gov2-ro/prometeu/

https://flatgithub.com/gov2-ro/prometeu