Open brittainhard opened 9 years ago
I would suggest namespacing the URL too.
If the crawl space is foo and the crawler is called bar, then you could access it at
http://explorer.continuum.io/explore/foo/bar
or something like that.
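A rough sketch of that URL scheme, just to make the idea concrete. The helper name `crawl_url` is hypothetical and not part of the codebase; the base URL is the one from the example above.

```python
from urllib.parse import quote

# Base taken from the example URL in this comment.
BASE_URL = "http://explorer.continuum.io/explore"


def crawl_url(crawl_space, crawler):
    """Build a URL namespaced by crawl space, then crawler (hypothetical helper)."""
    # safe="" also escapes "/", so a name cannot introduce extra path segments.
    return "%s/%s/%s" % (
        BASE_URL,
        quote(crawl_space, safe=""),
        quote(crawler, safe=""),
    )


print(crawl_url("foo", "bar"))
```

Escaping the two names keeps the path at exactly two segments under `/explore/`, whatever the crawl space or crawler is called.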
The easiest thing here is to control the name of the index as it's created. For both Nutch and ACHE we can supply a custom name, so the name of the index can reflect its related project.
After this, it seems to me that I can append some key/values to the index indicating creation date, index type (crawl or dataset), and crawler type. This seems like a fairly straightforward and quick way to do it. What do you guys think? @tonyfast @ahmadia @kriehl
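Something like the following is what I have in mind for the key/values. This is only a sketch: the helper `index_metadata` and the exact key names are assumptions, and how the dict gets attached to the index in Elasticsearch is left open here.

```python
from datetime import datetime, timezone

VALID_INDEX_TYPES = ("crawl", "dataset")


def index_metadata(index_type, crawler_type=None, created=None):
    """Return the key/values describing an index (hypothetical helper)."""
    if index_type not in VALID_INDEX_TYPES:
        raise ValueError("index_type must be one of %r" % (VALID_INDEX_TYPES,))
    # Default to the current UTC time when no creation date is supplied.
    created = created or datetime.now(timezone.utc)
    meta = {
        "creation_date": created.isoformat(),
        "index_type": index_type,
    }
    # Datasets have no crawler, so the key is only present for crawls.
    if crawler_type is not None:
        meta["crawler_type"] = crawler_type
    return meta


print(index_metadata("crawl", crawler_type="nutch"))
```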
As for changing the foreign key relations of crawl, crawlmodel, etc, that seems like a separate issue.
@kriehl @ahmadia
So I'm working on this PR: https://github.com/memex-explorer/memex-explorer/pull/647
I realized that the simplest solution would be to just add the crawler type into the name, as well as the project name. Right now it looks like this:
    @property
    def index_name(self):
        # crawl slug, then project slug, then crawler type
        return "%s_%s_%s" % (self.slug, self.project.slug, self.crawler)
This is added when the index is created. Aron had the idea of creating a separate index that contains info about each index we create and its associated project and crawl.
This is really the simplest fix I can come up with to this problem. Let me know if this is sufficient.
My only comment on this was the danger of being unable to filter properly due to an incomplete separation of fields.
I don't fully understand how filters work in ES, but my initial idea is that it would be easier to put this information into a separate "meta-index" of crawl information. If the information is only contained in the index name, filtering becomes sloppier, because projects and crawls can be named anything. If somebody had "ache", "nutch", or "dataset" in their project or crawl name, it would be harder to filter these types of indices.
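To illustrate the concern, here is a small sketch comparing name-based filtering with a lookup on explicit fields. The "meta-index" is just a plain dict standing in for a real Elasticsearch index, and `register` plus the field names are hypothetical.

```python
def index_name(crawl_slug, project_slug, crawler):
    # Same "%s_%s_%s" format as the index_name property in the PR.
    return "%s_%s_%s" % (crawl_slug, project_slug, crawler)


# Stand-in for the meta-index: index name -> explicit fields.
meta_index = {}


def register(crawl_slug, project_slug, crawler):
    """Record an index and its fields in the stand-in meta-index."""
    name = index_name(crawl_slug, project_slug, crawler)
    meta_index[name] = {
        "crawl": crawl_slug,
        "project": project_slug,
        "crawler": crawler,
    }
    return name


# An ACHE crawl whose own name happens to contain "nutch".
a = register("nutch-news", "demo", "ache")
b = register("news", "demo", "nutch")

# Filtering on the name alone also matches the ACHE crawl.
by_name = [n for n in meta_index if "nutch" in n]
# Filtering on the explicit field matches only the real Nutch crawl.
by_field = [n for n, m in meta_index.items() if m["crawler"] == "nutch"]

print(by_name)
print(by_field)
```

The name-based filter returns both indices, while the field-based filter returns only the Nutch one, which is the separation of fields I was worried about.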
Right now every crawl and dataset name must be unique because the name is not associated with any project. We can allow crawls and datasets with the same name in different projects by basing the namespace on the project name.
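A minimal sketch of what per-project uniqueness could look like, assuming names are checked as (project, name) pairs. `NameRegistry` is a hypothetical stand-in; in the real app this would presumably be a constraint on the models rather than an in-memory set.

```python
class NameRegistry:
    """Track crawl/dataset names per project (hypothetical stand-in)."""

    def __init__(self):
        self._taken = set()

    def claim(self, project_slug, name):
        # Uniqueness is enforced on the pair, not on the name alone.
        key = (project_slug, name)
        if key in self._taken:
            raise ValueError("%r already exists in project %r" % (name, project_slug))
        self._taken.add(key)
        # Namespaced identifier: name first, then project slug.
        return "%s_%s" % (name, project_slug)


registry = NameRegistry()
print(registry.claim("project-a", "news"))
print(registry.claim("project-b", "news"))  # same name, different project: allowed
```

Claiming "news" a second time inside "project-a" would raise a `ValueError`, while the same name in another project goes through.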