hunt-framework / hunt

A flexible, lightweight search platform

Updating document descriptions doesn't work as expected #70

Closed sebastian-philipp closed 10 years ago

sebastian-philipp commented 10 years ago
$ cat original.json 
{
  "cmd": "insert",
  "document": {
    "uri": "http://first-article",
    "index": {
      "title": "First Article",
      "content": "This in the first indexed article in hunt."
    },
    "description": {
      "title": "First Article"
    }
  }
} 
$ cat content.json 
{
  "cmd": "update",
  "document": {
    "uri": "http://first-article",
    "content": "This in the first indexed article in hunt."
  }
} 
$ ./server-cli.py make-schema original.json  | ./server-cli.py eval -
{"msg":"ok","code":0}
$ curl -X POST -d @original.json http://localhost:3000/eval
{"msg":"ok","code":0}
$ curl -X POST -d @content.json http://localhost:3000/eval
{"msg":"ok","code":0}
$ python ./server-cli.py search this | jq '.'
{
  "code": 0,
  "msg": {
    "result": [
      {
        "description": {
          "title": "First Article"
        },
        "uri": "http://first-article",
        "weight": 1
      }
    ],
    "count": 1,
    "offset": 0,
    "max": 20
  }
}
UweSchmidt commented 10 years ago
$ cat content.json 
{
  "cmd": "update",
  "document": {
    "uri": "http://first-article",
    "content": "This in the first indexed article in hunt."
  }
}

The document should have a key index or description, but it has a key content instead. That key is ignored in fromJSON of ApiDocument, so nothing really changes.
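
To make the no-op concrete, here is a minimal Python sketch of a lenient parser. It is a model of the behavior described in this thread, not hunt's actual Haskell fromJSON instance; the function name parse_lenient is hypothetical, and the defaults assume that only uri is required and that missing fields become empty.

```python
def parse_lenient(obj):
    """Model of a lenient parse: unknown keys are dropped without an error."""
    return {
        "uri": obj["uri"],                           # the only required key
        "index": obj.get("index", {}),               # defaults to empty
        "description": obj.get("description", {}),   # defaults to empty
    }


# The document from content.json: "content" sits at the top level,
# not inside "index" or "description".
update = parse_lenient({
    "uri": "http://first-article",
    "content": "This in the first indexed article in hunt.",
})

# Nothing of the intended change survives parsing.
print(update)
```

Under these assumptions the parsed update carries only the uri plus an empty index and an empty description, which is why the server answers "ok" and the stored document stays the same.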

chrisreu commented 10 years ago

Correct. Update takes a regular ApiDocument. But that behavior is strange nevertheless.

First of all, as a user, I would expect the update command to fail in some form, because I provided invalid input. That does not happen because, despite being invalid, the ApiDocument can be parsed by Aeson.

That is because only the URI property is required in an ApiDocument. All other properties are filled with empty defaults if they are missing, and any unrecognized parameters just seem to be ignored.

So - is there an easy way to let fromJSON fail if there are 'too many' parameters?

Okay - let's say the ApiDocument is parsed with empty description and then the update is executed.

So - why is the description still the same as before? Shouldn't the description be empty after an update with an empty description?

UweSchmidt commented 10 years ago

The ApiDocument is parsed from a JSON object. When parsing that object, one can check the set of keys in it and raise an error if there are illegal keys.
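
The key check can be sketched as follows. This is an illustrative Python model under stated assumptions, not the Aeson code in hunt: the function name parse_strict is hypothetical, and the allowed key set contains only the keys that appear in this thread (uri, index, description), so the real ApiDocument may accept more.

```python
# Assumed set of legal keys, taken from the documents shown in this thread.
ALLOWED_KEYS = {"uri", "index", "description"}


def parse_strict(obj):
    """Parse a document object and fail on any key outside ALLOWED_KEYS."""
    if not isinstance(obj, dict):
        raise ValueError("document must be a JSON object")
    unknown = sorted(set(obj) - ALLOWED_KEYS)
    if unknown:
        raise ValueError("illegal keys in document: " + ", ".join(unknown))
    if "uri" not in obj:
        raise ValueError("document requires a uri")
    return {
        "uri": obj["uri"],
        "index": obj.get("index", {}),
        "description": obj.get("description", {}),
    }
```

With this check, the document from content.json would be rejected because of its top-level content key, instead of being accepted as an empty update.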

I think that for an update command we use the rule that fields are added or overwritten, while untouched fields remain as they are. This is at least useful, e.g. when adding a document rank.

UweSchmidt commented 10 years ago

Document update reworked. A description attribute can now be deleted by associating the JSON null value with that key. Example:

original document:

{
    "max": 20,
    "offset": 0,
    "count": 1,
    "result": [
        {
            "score": 0.75,
            "uri": "http://first-article",
            "description": {
                "content": "This is the first article with modified content.",
                "title": "First Article"
            }
        }
    ]
}

update command:

{
    "cmd": "update",
    "document": {
        "uri": "http://first-article",
        "description": {
            "comment": "the content should have disappeared",
            "content": null
        }
    }
} 

result:

{
    "max": 20,
    "offset": 0,
    "count": 1,
    "result": [
        {
            "score": 0.75,
            "uri": "http://first-article",
            "description": {
                "title": "First Article",
                "comment": "the content should have disappeared"
            }
        }
    ]
}

As we can see, the content has disappeared and the new field comment has been added. Any changed fields would have been updated as well.

This is sound as long as we ensure that no null values occur as attribute values. So the insert command has also been modified to filter out these now-illegal null values. And we can live without null values in document descriptions.
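
The reworked semantics can be modeled with two small Python functions. This is a sketch of the rules stated above, not hunt's implementation; the names apply_update and strip_nulls are hypothetical, and JSON null is represented by Python's None.

```python
def strip_nulls(description):
    """Insert rule: drop attributes whose value is null."""
    return {key: value for key, value in description.items() if value is not None}


def apply_update(stored, update):
    """Update rule: null deletes a key, any other value adds or overwrites it.

    Keys that the update does not mention are left untouched.
    """
    merged = dict(stored)  # copy, so the stored description is not mutated
    for key, value in update.items():
        if value is None:
            merged.pop(key, None)  # deleting a missing key is not an error
        else:
            merged[key] = value
    return merged


# The example from this comment.
stored = strip_nulls({
    "content": "This is the first article with modified content.",
    "title": "First Article",
})
result = apply_update(stored, {
    "comment": "the content should have disappeared",
    "content": None,
})
print(result)
```

Because strip_nulls guarantees that a stored description never contains null, a null in an update can only mean "delete this key", which is the property the last paragraph relies on.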