Closed by KayGau 4 years ago
Can you be more specific?
.idx files record offsets, but they are useful for sweeps, not random access.
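To illustrate what a sweep means here, a minimal sketch follows. It assumes a hypothetical text layout for the index, one `offset;length;sha1` record per line, and uncompressed payloads in the .bin file; the actual WoC layout and compression may differ. The function name `sweep` and the toy data are made up for illustration.

```python
import io

def sweep(idx_lines, bin_file):
    """Yield (sha1, payload) for every record, in index order.

    Assumes each index line looks like 'offset;length;sha1' (hypothetical layout).
    """
    for line in idx_lines:
        offset, length, sha1 = line.strip().split(";")
        bin_file.seek(int(offset))
        yield sha1, bin_file.read(int(length))

# Toy stand-ins for a .bin file and its .idx file (not real WoC data).
bin_file = io.BytesIO(b"helloworld!")
idx_lines = ["0;5;aaa", "5;6;bbb"]

records = list(sweep(idx_lines, bin_file))
```

Reading every record in order is cheap this way, but finding one particular SHA would mean scanning the index from the start, which is why a separate key-value lookup is needed for random access.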
And yes, the py library has very limited functionality.
Have you looked at lookup/README.md? bin/idx is the initial storage; from it either the offset (in All.sha1o), the content (in All.sha1c), or both are extracted.
Also, a recent commit introduced the ability to set the locations of the databases through the environment. Perhaps you can create a script that generates these settings for the WoC servers.
I am reading through lookup/README.md now. Maybe I missed something important? I will check it. Thank you!
This is the commit I mentioned, related to specifying WoC paths in the API: https://github.com/ssc-oscar/oscar.py/commit/7dcf54a1948b6413668bf32cbcfb1b8448d0e1de
Just keep in mind that py is used only to access data, not to create any of the databases
I am deploying WoC in Pengcheng, but I'm confused about the .idx files and the sha1.tch files (such as blob_0.idx and sha1.blob_0.tch). According to the MSR paper (World of Code: An Infrastructure for Mining the Universe of Open Source VCS Data), Section III-D, Data Storage, I think the sha1.tch files use a git object's SHA as the key and the object's offset in the .idx file as the value. The .idx file in turn records the object's offset and size in the .bin file. But after reading the code in oscar.py, it seems that it doesn't use the .idx file at all? So I am confused about what the .idx files and the sha1.tch files actually do. Thank you!
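The two-step lookup described in this question can be sketched as follows. This only models the reading given above, not a confirmed WoC layout: a plain dict stands in for the sha1.tch key-value store, a list of `(offset, size)` pairs stands in for the .idx file, and the payloads are uncompressed toy bytes. All names (`sha_to_idx`, `idx_records`, `fetch`) are hypothetical.

```python
import io

# Stand-in for the .idx file: record i holds (offset, size) into the .bin file.
idx_records = [(0, 5), (5, 6)]

# Stand-in for sha1.tch: SHA -> position of the object's record in the .idx file.
sha_to_idx = {"aaa": 0, "bbb": 1}

# Stand-in for the .bin file (toy data, uncompressed).
bin_file = io.BytesIO(b"helloworld!")

def fetch(sha1, sha_to_idx, idx_records, bin_file):
    """Resolve SHA -> idx record -> (offset, size) -> bytes in the .bin file."""
    offset, size = idx_records[sha_to_idx[sha1]]
    bin_file.seek(offset)
    return bin_file.read(size)

blob = fetch("bbb", sha_to_idx, idx_records, bin_file)
```

If the key-value store held the `(offset, size)` pair directly, the .idx file would not be needed at read time, which would be consistent with a reader that never opens it.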