Closed SoniEx2 closed 5 months ago
This happens because the whole file is read.
I captured an strace log to make sure of this:
access(".../objects/41/043eaf7a378789fc54d5e7ccd5f7b878a9dba7", F_OK) = 0
newfstatat(AT_FDCWD, ".../objects/41/043eaf7a378789fc54d5e7ccd5f7b878a9dba7", {st_mode=S_IFREG|0444, st_size=1822988139, ...}, 0) = 0
openat(AT_FDCWD, ".../objects/41/043eaf7a378789fc54d5e7ccd5f7b878a9dba7", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 1822990336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffa38b2e000
read(3, "x\1Tzct\245M\320m\214\311\304\2669\261}b\333\266m\333\2661\261&\236dbs2\261"..., 1822988139) = 1822988139
close(3) = 0
The read syscall shows that all 1822988139 bytes were read!
The same problem occurs with the .is_binary attribute of the Blob object type.
Please don't ignore this problem, @jdavid.
In the sample code above try replacing:
blobmeta = todoblobs.setdefault(obj, [])
With:
blobmeta = todoblobs.setdefault(obj.id, [])
What happens is that, to get obj.size, the libgit2 object is loaded, and pygit2 keeps a reference to it in obj. This reference is freed when obj is destroyed, but because obj is kept in todoblobs, that won't happen until the end of the program.
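The effect can be reproduced without pygit2. The sketch below is only an illustration under stated assumptions: FakeBlob, make_blobs and scan are hypothetical stand-ins, not pygit2 API, and the cached payload merely stands for the libgit2 object that gets loaded when size is accessed. It counts how many objects are still alive after the loop, depending on whether the dict is keyed by obj or by obj.id.

```python
import gc
import weakref


class FakeBlob:
    """Hypothetical stand-in for a pygit2 object (not the real class)."""

    def __init__(self, oid):
        self.id = oid          # small identifier, cheap to keep around
        self._payload = None   # stands for the loaded libgit2 object

    @property
    def size(self):
        # Assumption for this sketch: the first access loads the data
        # and caches it on the object for as long as the object lives.
        if self._payload is None:
            self._payload = bytes(1024)
        return len(self._payload)


def make_blobs(n):
    # A generator, so only the caller holds on to each object.
    for i in range(n):
        yield FakeBlob(f"{i:040x}")


def scan(key_by_id, n=5):
    """Return how many blobs are still alive once the loop is done."""
    todoblobs = {}
    refs = []
    obj = key = None
    for obj in make_blobs(n):
        refs.append(weakref.ref(obj))  # weak: does not keep obj alive
        key = obj.id if key_by_id else obj
        blobmeta = todoblobs.setdefault(key, [])
        blobmeta.append(obj.size)
    del obj, key  # drop the loop variables; todoblobs is what matters
    gc.collect()
    return sum(1 for r in refs if r() is not None)


if __name__ == "__main__":
    print("keyed by obj:   ", scan(key_by_id=False), "objects still alive")
    print("keyed by obj.id:", scan(key_by_id=True), "objects still alive")
```

Keyed by obj, every object (and its cached payload) stays reachable through the dict; keyed by obj.id, each one can be released as soon as the loop moves on to the next.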
When scanning a repo through such means, visiting obj.size at the given point is the difference between getting killed by the oom_killer and using only about 160MiB of peak RAM.