Open wiseman opened 11 years ago
FYI I also changed the scripts to use gflags.py, in order to share flags like --index_path between executables.
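As a rough illustration of sharing a flag such as --index_path between executables, here is a minimal sketch. It uses the standard library's argparse (parent parsers) rather than gflags, purely so it runs without extra dependencies. The helper names, the default path, and the `--output_dir` flag are hypothetical and not taken from the fork.

```python
import argparse


def common_flags():
    """Flags shared by every executable (hypothetical helper)."""
    parent = argparse.ArgumentParser(add_help=False)
    # Assumed default; the real scripts may use a different one.
    parent.add_argument("--index_path", default="index.bin",
                        help="s3:// URI or local path of the index")
    return parent


def lookup_parser():
    # A lookup-style tool: shared flags plus its own positional argument.
    parser = argparse.ArgumentParser(prog="cci_lookup",
                                     parents=[common_flags()])
    parser.add_argument("url_prefix")
    return parser


def fetch_parser():
    # A fetch-style tool reuses the same shared flags.
    parser = argparse.ArgumentParser(prog="cci_fetch",
                                     parents=[common_flags()])
    parser.add_argument("url")
    parser.add_argument("--output_dir", default=".")  # hypothetical flag
    return parser
```

With gflags the shared definition would presumably live in one module imported by each script; the parent-parser pattern above is only the argparse analogue of that idea.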
This looks really promising, and I'll be happy to merge. At the very least, it will spur me to do the work I was already planning: merging the code that generates the index into this repo, and separating the pbtree stuff into a standalone PyPI package independent of the common_crawl_index.
Let me sort that out, and then we can revisit this.
On Mon, Mar 25, 2013 at 3:15 PM, John Wiseman notifications@github.com wrote:
FYI I also changed the scripts to use gflags.py, in order to share flags like --index_path between executables.
Reply to this email directly or view it on GitHub: https://github.com/trivio/common_crawl_index/pull/16#issuecomment-15416247
-- Scott
"There was a time when the internet answered all my questions. Now it just repeats them. - SDR"
This isn't really a pull request, but more of a heads up in case you are interested in merging.
My fork has the following changes:
- It can now be installed with `python setup.py install`. I've also added it to PyPI so you can do `pip install commoncrawlindex` or `easy_install commoncrawlindex`.
- The `read` and `remote_read` scripts are merged into a single `cci_lookup` script that can be given an `--index_path` argument pointing to an s3:// URI or a local file path.
- There is a `cci_fetch` script that can download the contents of URLs using the index.

I don't have a local copy of the index so I wasn't able to test `cci_lookup` in local mode.

I used the `cci_` prefix for the executables to prevent name collisions if they're installed globally with `pip` etc.
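To make the s3:// versus local-path distinction for --index_path concrete, here is a minimal sketch of how such an argument could be classified. The function name `parse_index_path` and the example bucket are hypothetical; this is not the fork's actual implementation, and it only inspects the string without opening anything.

```python
from urllib.parse import urlparse


def parse_index_path(index_path):
    """Classify an index path as S3 or local (hypothetical helper).

    Returns a (kind, bucket, path) tuple.
    """
    parsed = urlparse(index_path)
    if parsed.scheme == "s3":
        # s3://bucket/key -> bucket name and key without the leading slash
        return ("s3", parsed.netloc, parsed.path.lstrip("/"))
    # Anything else is treated as a local file path in this sketch.
    return ("local", None, index_path)


if __name__ == "__main__":
    print(parse_index_path("s3://example-bucket/some/index"))
    print(parse_index_path("/data/index.bin"))
```

A caller could then pick a remote reader or a plain file reader based on the first element of the tuple.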