rethinkdb / docs

RethinkDB documentation
http://rethinkdb.com/docs
Apache License 2.0

Clarify Append Dissociated from DB #1097

Open cefn opened 8 years ago

cefn commented 8 years ago

If I understand correctly, and from my testing, then... https://www.rethinkdb.com/api/javascript/append/ ...does nothing to manipulate the database at all.

This behaviour should be made explicit. At present it is only implicitly demonstrated, e.g. in the https://www.rethinkdb.com/api/javascript/update/ example, where the result of append has to be reassigned to the member field. The dissociation between e.g. r.table("users")("arrayChild") and the backing database when using array manipulation is not really stated in the documentation.

Does r.table("users")("arrayChild") sometimes map to the backing data structure (e.g. it can sometimes be used to make changes in the database) and sometimes not? Alternatively, is it always 'read only' until it is used as part of a merge specification to update()?

Good to get this stated, and if there's a central place where it's stated there could be links from the pages of the relevant operations (e.g. append, insertAt etc)

chipotle commented 8 years ago

None of the commands under "Document manipulation" in the API index directly manipulate the database; only the commands under "Writing data" do that. You can think of this as being analogous to SQL -- you can manipulate the data being returned by the query to get things that aren't literally in the database, but those things don't get put back in the database unless you use UPDATE or INSERT.
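As a rough illustration of that split, here is a plain-JavaScript analogy. It does not use the RethinkDB driver or a server; the `users` map and the `append` and `update` helpers are hypothetical stand-ins that only mimic the behaviour described above, where a document-manipulation command returns a new value and only a write command changes what is stored.

```javascript
// Plain-JavaScript analogy only: no RethinkDB driver or server is involved.
// `users` stands in for a table, keyed by primary key.
const users = new Map([["alice", { id: "alice", arrayChild: ["a", "b"] }]]);

// Hypothetical stand-in for a "Document manipulation" command:
// it returns a new array and leaves its input untouched.
function append(array, value) {
  return [...array, value];
}

// Hypothetical stand-in for a "Writing data" command:
// it merges a patch object into the stored document.
function update(id, patch) {
  users.set(id, { ...users.get(id), ...patch });
}

// Computing the appended array does not change the stored document.
const appended = append(users.get("alice").arrayChild, "c");
const storedBefore = [...users.get("alice").arrayChild];

// Only the explicit write puts the new array back into the "table".
update("alice", { arrayChild: appended });
const storedAfter = users.get("alice").arrayChild;

console.log(storedBefore, storedAfter);
```

In this model the stored array changes only after the `update` call, which mirrors the UPDATE/INSERT comparison with SQL.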

@danielmewes - I'm not sure whether this needs to be more explicitly stated or not, and if so, what would be a good way to do it. The signatures and descriptions of commands like merge and append show them returning arrays or objects or such. Is this an issue that's come up before? (I think append and prepend and a few other very early documentation pages do need some better examples than they have.)

danielmewes commented 8 years ago

This has definitely come up before, and I can understand where the confusion comes from.

Some commands are more prone to this confusion than others, especially append, prepend, insertAt, deleteAt and changeAt I think.

Could we have a note on these commands that states that the commands only return the changed value, but don't write it back to the table unless used with update or replace?

cefn commented 8 years ago

I'd be glad of the clarification, but that wording is already misleading: it suggests that the operations might do some updating or inserting when embedded in update, and they don't, although potentially they could. I found myself experimentally writing, for example...

r.table("christmassongs").get("Twelve Days").update(r.row("sequence").append("Two Turtle Doves")).run(conn);

...as would be suggested by the idea that when 'used with update or replace' they manipulate the row fields, which I don't think is true in this form.
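For contrast, here is a plain-JavaScript sketch of the shape that the update documentation example uses, where the appended array is assigned to a named field. It is only a model: the `songs` map and the `append` and `update` helpers are hypothetical stand-ins rather than driver calls, and the starting contents of `sequence` are made up for illustration.

```javascript
// Plain-JavaScript model only: no RethinkDB driver or server is involved.
// The initial contents of `sequence` are invented for this example.
const songs = new Map([
  ["Twelve Days", { id: "Twelve Days", sequence: ["A Partridge in a Pear Tree"] }],
]);

// Hypothetical stand-in for append: returns a new array.
const append = (array, value) => [...array, value];

// Hypothetical stand-in for update: takes a function from the current
// document to a patch object, then merges that patch into the stored document.
function update(table, id, patchFn) {
  const current = table.get(id);
  const patch = patchFn(current);
  table.set(id, { ...current, ...patch });
}

// The patch names the field that receives the appended array.
update(songs, "Twelve Days", (row) => ({
  sequence: append(row.sequence, "Two Turtle Doves"),
}));

const sequence = songs.get("Twelve Days").sequence;
console.log(sequence);
```

The point of the sketch is that the write is driven by the patch object's keys; the appended array on its own carries no information about which field it belongs to.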

Of course it would be feasible to add explicit support for this by overloading update(...), causing document values to be manipulated, but I understand you want to keep r.row() as a retrieval-only mechanism.

Is the efficiency of the following as bad as one might fear...

r.table("sesame street").get("the count").update({
   pencils:r.row("pencils").append("One Million"),
}).run(conn);

...or is it doing some smart work in the planner to avoid literally retrieving a million array entries and writing them back?

danielmewes commented 8 years ago

@cefn Good points, we need to be careful about how to phrase this.

RethinkDB updates always operate at per-document granularity, so it will definitely rewrite the whole document. This is kind of hard to do differently in a document database without any schema, so we haven't gotten around to it so far. The issue to follow for this functionality is https://github.com/rethinkdb/rethinkdb/issues/5369 . For the time being, our recommendation is not to use embedded arrays that are too large if they will be frequently updated. Instead we recommend using a separate table and a join, as you would do with a relational database.
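A minimal plain-JavaScript sketch of that separate-table layout follows. It is an in-memory model, not driver code, and the `characters` and `pencils` collections, the `ownerId` field, and the `addPencil` and `withPencils` helpers are all hypothetical names chosen for illustration.

```javascript
// Plain-JavaScript model only: no RethinkDB driver or server is involved.
// All table, field, and helper names here are hypothetical.
const characters = [{ id: "the count" }];

// One row per pencil instead of one large embedded array on the owner.
const pencils = [
  { id: 1, ownerId: "the count", label: "One" },
  { id: 2, ownerId: "the count", label: "Two" },
];

// Adding a pencil inserts one small row; the owner's document is not rewritten.
function addPencil(ownerId, label) {
  const row = { id: pencils.length + 1, ownerId, label };
  pencils.push(row);
  return row;
}

// A simple join: attach a character's pencils by matching ownerId.
function withPencils(character) {
  return {
    ...character,
    pencils: pencils
      .filter((p) => p.ownerId === character.id)
      .map((p) => p.label),
  };
}

addPencil("the count", "One Million");
const joined = withPencils(characters[0]);
console.log(joined);
```

The trade-off is the usual relational one: each write stays small, and the full list is assembled at read time by the join.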