jur9526 / couchdb-python

Automatically exported from code.google.com/p/couchdb-python

Reduced documents lose their id with schema views #109

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Use a reduce function that reduces to a single document

We have a schema view with the following map/reduce functions (Python view server):

    def map(doc):
        if doc['type'] == "Result":
            yield (doc['user_id'], doc['event']), doc

    def reduce(keys, values, rereduce):
        return min(values, key=lambda doc: doc['value'])

Thus we want to find the single document with the minimum value for each key.
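To make the intent concrete, here is a minimal stand-alone simulation of that map/reduce pair. It needs neither a CouchDB server nor couchdb-python; the sample documents and the names `map_fn` and `reduce_fn` are made up for illustration, and the single reduce call only approximates what the view server does.

```python
def map_fn(doc):
    # Emit the whole document as the value, keyed by (user_id, event).
    if doc.get('type') == 'Result':
        yield (doc['user_id'], doc['event']), doc


def reduce_fn(keys, values, rereduce):
    # The values are whole documents in both the reduce and the rereduce
    # pass, so the same expression works for either.
    return min(values, key=lambda doc: doc['value'])


# Hypothetical sample data.
docs = [
    {'_id': 'a', 'type': 'Result', 'user_id': 'u1', 'event': 'myevent', 'value': 7},
    {'_id': 'b', 'type': 'Result', 'user_id': 'u1', 'event': 'myevent', 'value': 3},
    {'_id': 'c', 'type': 'Other'},
]

emitted = [pair for doc in docs for pair in map_fn(doc)]
keys = [(key, value['_id']) for key, value in emitted]  # (key, doc id) pairs
values = [value for _, value in emitted]

best = reduce_fn(keys, values, False)
print(best['_id'])  # b
```

The reduced value is a complete document, so its `_id` is still there even though the reduce row itself has no id.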

2. The id of the reduced document gets lost even if it's included in the result

    {
        "key": ["3df2a23a4c0549265b6e808f3b61b30e", "myevent"],
        "value": {
            ....
            "_id": "dbfdaa255af4050533fbed33ad43bb1f"
        }
    }

The row does not have an id because it's a reduce result, but the document value still contains the id as _id.

3. Schema View sets id to None

Since we use a schema view to wrap the resulting document, the wrapper function is used (line 273 in schema.py):

            def wrapper(row):
                if row.doc is not None:
                    return cls.wrap(row.doc)
                data = row.value
                data['_id'] = row.id  # <--- overwrites the existing _id with None
                return cls.wrap(data)

The document value contains the _id of the document, but the wrapper function overwrites it with None because the row is a reduce result.

What is the expected output? What do you see instead?

The wrapper function should not overwrite the _id if it exists in the document (row.value). Something like this should do the trick:

                data['_id'] = row.id or data.get('_id')

Thus we trust row.id if it exists, and fall back to the _id in the data when there is no row.id.
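A rough sketch of the wrapper with that one-line change, runnable without a database. The `Row` class and the `wrap=dict` parameter are hypothetical stand-ins for the library's row object and `cls.wrap`; they are not part of couchdb-python.

```python
class Row(object):
    """Hypothetical stand-in for a view result row."""

    def __init__(self, id=None, value=None, doc=None):
        self.id = id
        self.value = value
        self.doc = doc


def wrapper(row, wrap=dict):
    # `wrap` stands in for cls.wrap in this sketch.
    if row.doc is not None:
        return wrap(row.doc)
    data = row.value
    # Prefer the row id; fall back to the _id already present in the value.
    data['_id'] = row.id or data.get('_id')
    return wrap(data)


# Reduce row: no row id, but the value is a full document.
reduced = Row(value={'_id': 'dbfdaa255af4050533fbed33ad43bb1f', 'value': 3})
print(wrapper(reduced)['_id'])  # dbfdaa255af4050533fbed33ad43bb1f

# Map row: the row id is used as before.
mapped = Row(id='abc', value={'value': 1})
print(wrapper(mapped)['_id'])  # abc
```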

What version of the product are you using? On what operating system?

0.6.1

Please provide any additional information below.

With the current implementation the id gets lost. To get around it I have resorted to the very bad hack of modifying the doc within the map function (doc['id'] = doc['_id']), so I can still find the id in the result. I think it's a mistake to assume that reduce results never have an id. In most cases they don't, but sometimes you want to reduce not to a single value but to a single document, and in that case it's nice to know the id of the document.

Original issue reported on code.google.com by dbrat...@gmail.com on 20 Dec 2009 at 8:41

GoogleCodeExporter commented 9 years ago
Can you also provide a test case?

Original comment by djc.ochtman on 24 Dec 2009 at 9:56

GoogleCodeExporter commented 9 years ago
As the original poster stated, reduce result rows don't have a doc id. So, I don't think wrapper() should ever overwrite anything in the value, i.e. it shouldn't update with _id=None either.

Does this version of the patch have the same effect:

                if row.id:
                    data['_id'] = row.id
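One way to check is to run both variants side by side on plain dicts. This is only a sketch: `patch_or` and `patch_if` are hypothetical helper names, and they copy the value instead of mutating it so the cases stay independent.

```python
def patch_or(row_id, value):
    # Variant from the original report.
    data = dict(value)
    data['_id'] = row_id or data.get('_id')
    return data


def patch_if(row_id, value):
    # Variant that only touches _id when the row has an id.
    data = dict(value)
    if row_id:
        data['_id'] = row_id
    return data


cases = [
    ('r1', {'_id': 'v1'}),   # row id present
    (None, {'_id': 'v1'}),   # reduce row, value is a full document
    (None, {'count': 2}),    # reduce row, value has no _id at all
]
for row_id, value in cases:
    print(patch_or(row_id, value) == patch_if(row_id, value))
# True
# True
# False
```

The two agree whenever an id is available from either source. They differ only when neither the row nor the value has one: the `or` variant adds `'_id': None`, while the `if` variant leaves the value untouched.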

This might be better as a different ticket, but I'm not sure wrapper() should try to construct a schema document from the value of a reduce or view row at all. Shouldn't a schema view ensure include_docs=True is set and then fail for reduce views (because row.doc is None)?
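A sketch of what such a strict wrapper could look like. This is not how couchdb-python behaves; `Row`, `strict_wrapper`, the `wrap=dict` parameter, and the choice of `ValueError` are all assumptions made for illustration.

```python
class Row(object):
    """Hypothetical stand-in for a view result row."""

    def __init__(self, id=None, value=None, doc=None):
        self.id = id
        self.value = value
        self.doc = doc


def strict_wrapper(row, wrap=dict):
    # Only wrap full documents; refuse rows that carry none, which is
    # what a reduce row (or a query without include_docs) looks like.
    if row.doc is None:
        raise ValueError('row has no document; query with include_docs=True')
    return wrap(row.doc)


with_doc = Row(id='abc', value=1, doc={'_id': 'abc', 'value': 1})
print(strict_wrapper(with_doc)['_id'])  # abc

try:
    strict_wrapper(Row(value={'value': 3}))
except ValueError as exc:
    print('rejected: %s' % exc)
```

The trade-off is that reducing to a full document, as in the original report, would no longer work through a schema view.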

Original comment by matt.goo...@gmail.com on 24 Dec 2009 at 11:55

GoogleCodeExporter commented 9 years ago
I find it hard to write a test case since I don't yet understand why the wrapper needs to set/overwrite the row value. I see 3 cases. 1) With include_docs=True, row.doc is used instead of row.value, so there is no problem. 2) A full doc is included as the value (my case), and there is no need to overwrite the id in the value. 3) A partial doc without the id is included as the value. When is case 3 actually needed?

Original comment by dbrat...@gmail.com on 25 Dec 2009 at 8:26

GoogleCodeExporter commented 9 years ago
I suppose this issue is still relevant?

For example, take this view map function:

    function(doc){
      emit(doc['_id'], 1);
    }

On execution, there will be an error at mapping.py@414:

    data = row['value']
    data['_id'] = row['id']  # TypeError: 'int' object does not support item assignment

because row['value'] is not a document object, and cannot be assumed to be one in every case.

A reduce view raises a different error at the same line:

    KeyError: 'id' in mapping.py@414

because there is no `id` key in a reduce view row.
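Both failures can be reproduced with plain dicts, without a server. In this sketch `naive_wrap` is a hypothetical function that mirrors only the two quoted lines, and the sample rows are made up.

```python
def naive_wrap(row):
    # Mirrors the two lines quoted above.
    data = row['value']
    data['_id'] = row['id']
    return data


map_row = {'id': 'doc1', 'key': 'doc1', 'value': 1}  # value is an int
reduce_row = {'key': None, 'value': 42}              # no 'id' key at all

errors = []
for row in (map_row, reduce_row):
    try:
        naive_wrap(row)
    except (TypeError, KeyError) as exc:
        errors.append(type(exc).__name__)

print(errors)  # ['TypeError', 'KeyError']
```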

I suppose reduce view results should not be mapped to a Document schema by default. The only safe way is to set include_docs to True: that option guarantees we always get the full document, not just some parts of it.

Original comment by kxepal on 27 Jun 2010 at 1:56

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub. Please continue discussion here:

https://github.com/djc/couchdb-python/issues/109

Original comment by djc.ochtman on 15 Jul 2014 at 7:17