ssc-oscar / lookup

A mirror of bitbucket.org/swcs/lookup

Blobs are missing? #14

Closed: k----n closed this issue 3 years ago

k----n commented 3 years ago

For example:

Running echo 0001f9fcc01ec8019657cfbc249e1a99f087908a | showCnt blob gave this output:

no blob 0001f9fcc01ec8019657cfbc249e1a99f087908a in 0

However, there is a b2c (blob-to-commit) mapping:

> echo 007966799d926dc238dca635e2fe80df6e0b48e0 | getValues b2c

007966799d926dc238dca635e2fe80df6e0b48e0;d1f2d24001d1e42a0f926e4f8555015caba07848

The commit d1f2d24001d1e42a0f926e4f8555015caba07848 links to these 2 commits:

audrism commented 3 years ago

Only text blobs are currently stored. What counts as "text" is defined by git.

A small fraction of text blobs may also be missing because of the processing pipeline (as with commits).
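For reference, core git's default rule for diffs is simple: unless a .gitattributes entry for the path (for example the binary or -diff attribute) says otherwise, a blob is treated as binary if a NUL byte appears in its first 8000 bytes. The minimal Python sketch below reproduces only that content check. The function name git_core_is_binary is hypothetical, and whether the lookup pipeline uses exactly this rule is not stated in this thread.

```python
# Sketch of core git's default text/binary check for diffs.
# Assumption: no .gitattributes rule (e.g. "binary" or "-diff") applies to the path.
FIRST_FEW_BYTES = 8000  # git only inspects this many leading bytes


def git_core_is_binary(content: bytes) -> bool:
    """Return True if git's default heuristic would treat `content` as binary.

    The content counts as binary when a NUL byte occurs within its first
    FIRST_FEW_BYTES bytes; anything after that prefix is ignored.
    """
    return b"\0" in content[:FIRST_FEW_BYTES]


if __name__ == "__main__":
    print(git_core_is_binary(b"hello\nworld\n"))       # False
    print(git_core_is_binary(b"\x89PNG\r\n\x1a\n\0"))  # True
```

Because only the first 8000 bytes are checked, a large file whose first NUL byte comes later is still reported as text.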

k----n commented 3 years ago

Thanks.

Are "text" files the ones for which line changes can be determined?

It seems that git treats each file as either text or binary? https://stackoverflow.com/questions/6855712/why-does-git-treat-this-text-file-as-a-binary-file

Is there a better way to determine the type of blob instead of changing the file and running git diff?

audrism commented 3 years ago

In a fork of libgit2 I have created a classify command that determines the type of an object, but you need the original git repo to use it. libgit2 probably contains the actual algorithm (which probably relies on the file name and content) for determining git's definition of "binary".
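libgit2 exposes this check as git_blob_is_binary(), which takes only the blob, so it looks at content alone; any dependence on the file name comes from .gitattributes when diffing. Based on a reading of the libgit2 source, it goes further than core git's NUL check: within the first 8000 bytes it also counts non-whitespace control characters and reports binary when there are more than roughly one per 128 printable characters. The Python sketch below is an approximation under that assumption (it omits libgit2's byte-order-mark handling), and libgit2_like_is_binary is a hypothetical name, not a libgit2 API.

```python
# Approximate sketch of the heuristic behind libgit2's git_blob_is_binary().
# Assumption: based on a reading of libgit2's source; BOM handling is omitted.
BYTES_TO_CHECK = 8000           # only the first 8000 bytes are inspected
_WHITESPACE = b" \t\n\f\r\v"    # whitespace is neither printable nor "bad"


def libgit2_like_is_binary(content: bytes) -> bool:
    """Return True if `content` looks binary under a libgit2-style rule."""
    printable = 0
    nonprintable = 0
    for c in content[:BYTES_TO_CHECK]:  # iterating over bytes yields ints
        if (c > 0x1F and c != 0x7F) or c in (0x08, 0x1B, 0x0C):
            printable += 1      # includes backspace, ESC and form feed
        elif c == 0:
            return True         # any NUL byte means binary
        elif c not in _WHITESPACE:
            nonprintable += 1   # other control characters
    # Binary if control characters exceed printable characters / 128.
    return (printable >> 7) < nonprintable


if __name__ == "__main__":
    print(libgit2_like_is_binary(b"hello\n"))   # False
    print(libgit2_like_is_binary(b"\x01\x02"))  # True: control chars, no NUL
```

With this rule, b"\x01\x02" counts as binary even though it contains no NUL byte, whereas core git's default diff check would treat it as text.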

k----n commented 3 years ago

Neat, it looks like it's a function in the libgit2 library: https://libgit2.org/libgit2/#HEAD/group/blob/git_blob_is_binary

It is called in places like: