hunt-framework / hunt

A flexible, lightweight search platform

Semantic: Insert Update Delete #81

Open chrisreu opened 10 years ago

chrisreu commented 10 years ago

There was some confusion about the semantics of insert, update and delete with respect to the index and the DocTable.

I guess we've never really defined this, at least not written it down.

The idea of this ticket is to discuss the current state and define how it should work in the future.

Here are my thoughts for the DocTable:

DocTable

I think the document table should get a REST-like interface. This interface should expose the document table in such a way that single documents can be retrieved without the need for a query. Think: GetDocumentByUri

Insert would still happen with index inserts, but the interface could provide operations for updating in a RESTful way:

update: Updates the whole resource
patch: Updates parts of the resource

So with update it would be possible to actually remove properties from the document (which is not possible with the current API). Patch could be used by Hayoo to update the Weights, by just patching the Weight field.
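To make the difference between update and patch concrete, here is a minimal sketch of what such a document table interface could look like. It is not Hunt's actual API: the `DocTable` class, its method names, and the `example://` URI are all hypothetical and only illustrate the proposed semantics.

```python
from typing import Any, Dict, Optional

Description = Dict[str, Any]


class DocTable:
    """Toy in-memory document table keyed by URI (hypothetical, not Hunt's API)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Description] = {}

    def insert(self, uri: str, description: Description) -> None:
        self._docs[uri] = dict(description)

    def get(self, uri: str) -> Optional[Description]:
        # Direct lookup by URI, no query needed (the "GetDocumentByUri" idea).
        doc = self._docs.get(uri)
        return dict(doc) if doc is not None else None

    def update(self, uri: str, description: Description) -> None:
        # Replaces the whole resource: properties that are left out disappear.
        if uri not in self._docs:
            raise KeyError(uri)
        self._docs[uri] = dict(description)

    def patch(self, uri: str, fields: Description) -> None:
        # Changes only the given properties and keeps everything else.
        if uri not in self._docs:
            raise KeyError(uri)
        self._docs[uri].update(fields)


table = DocTable()
table.insert("example://doc/1", {"name": "lookup", "module": "Data.Map", "weight": 1.0})

table.patch("example://doc/1", {"weight": 2.5})
patched = table.get("example://doc/1")

table.update("example://doc/1", {"name": "lookup", "weight": 2.5})
updated = table.get("example://doc/1")
```

After the patch only the weight has changed; after the update the `module` property is gone, because the new description did not mention it.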

sebastian-philipp commented 10 years ago

Another idea that doesn't introduce a new API would be a virtual uri: context that is accessible like any other context and can be integrated into queries.

Such a query would look like

/search/uri:document-uri

This looks about as clean as a REST API. Replacing the document description could be done by a new command, replace.

On 18.06.2014 at 17:28, chrisreu notifications@github.com wrote:

There was some confusion about the semantics of insert, update and delete with respect to the index and the DocTable.

I guess we've never really defined this, at least not written it down.

The idea of this issue is to discuss the current state and define how it should work in the future.

Here are my thoughts:

DocTable

I think the document table should get a REST-like interface. This interface should expose the document table in a way that it is possible to retrieve single documents without the need of a query.

Insert would still happen with index inserts, but the interface could provide operations for update in a RESTful way:

update: Updates the whole resource
patch: Updates parts of the resource

So with update it would be possible to actually remove properties from the document (which is not possible with the current API). Patch could be used by Hayoo to update the Score, by just patching the Score field.


chrisreu commented 10 years ago

That may be possible, but in my opinion it would not be a clean solution, neither on the implementation side nor from a user's point of view.

It would not only duplicate the URI in memory, but also create a whole map, basically from URI to URI, containing redundant data.

Also, looking up a single document would be less efficient than necessary. Instead of a literal lookup in the document table, the engine would need to run through all the steps required for a search (query parsing, query processing, computation of the hits).

There is only one scenario where this would make sense: if we store all the documents in the context as well and get rid of the current DocTable abstractions. The key would be the URI, the value would be the document. I think that might be something to consider after a first release.

UweSchmidt commented 10 years ago

I think a single update command is sufficient. When the ApiDocument does not contain an index part, the index update can be skipped and the operation becomes cheap.

The description fields from the ApiDocument simply overwrite existing fields, or they are added. The values of the description fields are generalized from Text to JSON values. Deletion of a field can be implemented by associating that field in the ApiDocument with the JSON Null value. Null can be used as the indicator for deletion, since Null values in the doc descriptions seem to be redundant anyway.

With this approach we don't need any change to the interface and don't need any new commands.
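The merge rule described here can be sketched in a few lines. This is only an illustration under assumptions: the description is modelled as a plain dictionary, the function name `merge_description` is made up, and JSON Null is represented by Python's `None` (which is what `json.loads` produces for `null`).

```python
import json
from typing import Any, Dict


def merge_description(existing: Dict[str, Any], incoming: Dict[str, Any]) -> Dict[str, Any]:
    """Merge incoming description fields into an existing description.

    Given fields overwrite or get added, fields that are not mentioned stay
    untouched, and a field set to None (JSON null) is deleted.
    """
    merged = dict(existing)
    for field, value in incoming.items():
        if value is None:
            merged.pop(field, None)  # null marks the field for deletion
        else:
            merged[field] = value    # overwrite an existing field or add a new one
    return merged


stored = {"name": "lookup", "module": "Data.Map", "weight": 1.0}
incoming = json.loads('{"weight": 2.5, "module": null, "package": "containers"}')
result = merge_description(stored, incoming)
```

Here `weight` is overwritten, `package` is added, `module` is removed, and `name` is left alone because the incoming document does not mention it.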

chrisreu commented 10 years ago

I like this idea.

The update command pretty much behaves like this right now, doesn't it? I'm not familiar with the new DocDesc structures yet, but might it be possible to integrate the "delete on NULL" directly into this structure?

If we want a RESTful interface in the future that supports insert, update and patch with RESTful semantics, we could still build it on top of the interpreter interface.

chrisreu commented 10 years ago

I'm still not 100% satisfied with the current semantics. We've now got three operations for manipulation. Insert and Delete pretty much do what everyone would expect. Update still feels a little inconsistent, in my opinion. Here is why:

| Current Update | DocTable | Index |
| --- | --- | --- |
| Attribute/Context is given | Attribute gets overwritten with new Value | New values get appended to Context |
| Attribute/Context is not given | Nothing happens | Nothing happens |
| Attribute/Context is NULL | Attribute gets removed | Nothing happens |

Update on the Index is more like an append operation than an update. Here is how I think Update should work:

| Proposed Update | DocTable | Index |
| --- | --- | --- |
| Attribute/Context is given | Attribute gets overwritten with new Value | All words indexed for this document get removed. New words get indexed. Basically the Context gets overwritten in regard of this particular Document |
| Attribute/Context is not given | Nothing happens | Nothing happens |
| Attribute/Context is NULL | Attribute gets removed | All words for this Document get removed from this particular Context. So basically the whole Document would be removed from this one Context |
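The difference between the current and the proposed Update on the Index side can be sketched as follows. This is a toy model, not Hunt's implementation: a context is assumed to be a map from word to the set of document URIs containing it, "not given" is modelled as a missing key, NULL as `None`, and all function names are hypothetical.

```python
from copy import deepcopy
from typing import Dict, List, Optional, Set

Context = Dict[str, Set[str]]   # word -> URIs of documents containing it
Index = Dict[str, Context]      # context name -> context


def remove_doc(context: Context, uri: str) -> None:
    # Drop every posting of this document; forget words that become empty.
    for word in list(context):
        context[word].discard(uri)
        if not context[word]:
            del context[word]


def add_words(context: Context, uri: str, words: List[str]) -> None:
    for word in words:
        context.setdefault(word, set()).add(uri)


def update_current(index: Index, uri: str, contexts: Dict[str, Optional[List[str]]]) -> None:
    # Current behaviour: new words are appended, NULL does nothing.
    for name, words in contexts.items():
        if words is not None:
            add_words(index.setdefault(name, {}), uri, words)


def update_proposed(index: Index, uri: str, contexts: Dict[str, Optional[List[str]]]) -> None:
    # Proposed behaviour: a given context is overwritten for this document,
    # NULL removes the document from that context entirely.
    for name, words in contexts.items():
        context = index.setdefault(name, {})
        remove_doc(context, uri)
        if words is not None:
            add_words(context, uri, words)


base: Index = {"title": {"old": {"doc1"}, "shared": {"doc1", "doc2"}}}

current = deepcopy(base)
update_current(current, "doc1", {"title": ["new"]})

proposed = deepcopy(base)
update_proposed(proposed, "doc1", {"title": ["new"]})

cleared = deepcopy(base)
update_proposed(cleared, "doc1", {"title": None})
```

With the current semantics `doc1` stays indexed under "old" and "shared" in addition to "new"; with the proposed semantics it is indexed under "new" only, and passing NULL removes it from the context while `doc2` is untouched.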

If the current behavior for the Index is still needed somewhere, we could easily keep it by adding an Append command. This new command could consistently append things to DocTable and Index like so:

| Proposed optional Append | DocTable | Index |
| --- | --- | --- |
| Attribute/Context is given | Value gets appended to Attribute | New values get appended to Context, while old words and positions are preserved |
| Attribute/Context is not given | Nothing happens | Nothing happens |
| Attribute/Context is NULL | Attribute gets removed | All words for this Document get removed from this particular Context. So basically the whole Document would be removed from this one Context |
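A rough sketch of how such an Append command could treat DocTable and Index consistently is shown below. Several things are assumptions made only for illustration: attributes are plain text and "appending" joins the values with a space, a context maps each word to per-document position lists, and the function names are invented.

```python
from typing import Dict, List, Optional

Description = Dict[str, str]
Postings = Dict[str, Dict[str, List[int]]]  # word -> URI -> positions


def append_attributes(description: Description, attrs: Dict[str, Optional[str]]) -> None:
    for name, value in attrs.items():
        if value is None:
            description.pop(name, None)                 # NULL removes the attribute
        elif name in description:
            description[name] = description[name] + " " + value  # assumed: join with a space
        else:
            description[name] = value


def append_words(context: Postings, uri: str, words: Optional[Dict[str, List[int]]]) -> None:
    if words is None:
        # NULL removes the document from this context.
        for word in list(context):
            context[word].pop(uri, None)
            if not context[word]:
                del context[word]
        return
    for word, positions in words.items():
        # Old words and positions are preserved; new positions are added.
        context.setdefault(word, {}).setdefault(uri, []).extend(positions)


description: Description = {"title": "hello", "draft": "yes"}
append_attributes(description, {"title": "world", "draft": None})

context: Postings = {"hello": {"doc1": [0]}}
append_words(context, "doc1", {"hello": [5], "world": [1]})
after_append = {word: {u: list(p) for u, p in docs.items()} for word, docs in context.items()}

append_words(context, "doc1", None)
```

Leaving an attribute or context out of the arguments changes nothing, which matches the "not given" row of the table.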

What are your opinions on this?