trivio / common_crawl_index

Index URLs in Common Crawl

Cleanup, re-structuring and installation #16

Open wiseman opened 11 years ago

wiseman commented 11 years ago

This isn't really a pull request, but more of a heads up in case you are interested in merging.

My fork has the following changes:

I don't have a local copy of the index so I wasn't able to test cci_lookup in local mode.

I used the cci_ prefix for the executables to prevent name collisions if they're installed globally with pip etc.

wiseman commented 11 years ago

FYI I also changed the scripts to use gflags.py, in order to share flags like --index_path between executables.
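For readers unfamiliar with the idea, sharing a flag such as --index_path between several executables can be sketched roughly as below. This is only an illustration: it uses the standard library's argparse (parent parsers) rather than gflags so that it runs without extra dependencies, and the default path, the `prefix` argument, the `--limit` flag, and the `cci_fetch` tool name are all hypothetical and not taken from the fork.

```python
import argparse


def shared_flags():
    """Parent parser holding the flags common to every cci_* executable."""
    parent = argparse.ArgumentParser(add_help=False)
    # --index_path is the flag named in the thread; the default is a placeholder.
    parent.add_argument("--index_path", default="index.bin",
                        help="location of the URL index (hypothetical default)")
    return parent


def lookup_parser():
    """Parser for a cci_lookup-style tool: shared flags plus its own argument."""
    parser = argparse.ArgumentParser(prog="cci_lookup", parents=[shared_flags()])
    parser.add_argument("prefix", help="URL prefix to look up (hypothetical)")
    return parser


def fetch_parser():
    """Parser for a second, hypothetical tool that reuses the same shared flags."""
    parser = argparse.ArgumentParser(prog="cci_fetch", parents=[shared_flags()])
    parser.add_argument("--limit", type=int, default=10,
                        help="maximum number of results (hypothetical)")
    return parser
```

Each tool defines only its own arguments and inherits --index_path from the shared parent, which is the same effect the gflags change is described as achieving.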

srobertson commented 11 years ago

This looks really promising, and I'll be happy to merge. At the very least, it will spur me to do the work I was already planning: merging the code that generates the index into this repo, and splitting the pbtree code out into a standalone PyPI package independent of common_crawl_index.

Let me sort that out, and then we can revisit this.

On Mon, Mar 25, 2013 at 3:15 PM, John Wiseman notifications@github.com wrote:

FYI I also changed the scripts to use gflags.py, in order to share flags like --index_path between executables.


-- Scott

"There was a time when the internet answered all my questions. Now it just repeats them. - SDR"