thorwhalen / umpyre

Materials for python coaching
MIT License

Bulk operations in collections #39

Open thorwhalen opened 3 years ago

thorwhalen commented 3 years ago

The original question was: how to make a py2store interface for mongo update_many (see this or that)?

This translates to: what would be the most natural built-in (as in collections.abc) Python way, or the most intuitive way for a data scientist, to do this?

What will be used in the backend, for mongo, is the method:

>>> from inspect import signature
>>> from pymongo.collection import Collection
>>> signature(Collection.update_many)
<Signature (self, filter, update, upsert=False, array_filters=None, bypass_document_validation=False, collation=None, hint=None, session=None)>

Let's write that as:

update_many(self, query, update_specs_mapping, ...)

to not clash with other names (filter and update).

We are thinking of mongo here, but want a mechanism to solve the problem in general (for local files, sql, s3 backends).

If s is an instance of the store, here are some proposals for the interface:

assignment interface

s[query].update(update_specs_mapping)
s[query] = update_specs_mapping
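To make the two forms above concrete, here is a minimal sketch of how both could route to a single update_many(query, update_specs_mapping) method. Everything in it is an assumption made for illustration: QueryUpdatableStore and _QueryView are hypothetical names, the backend is a plain list of dicts rather than a mongo collection, a query matches a document when all of its key/value pairs are present, and dict.update stands in for whatever update semantics the real backend has.

```python
class QueryUpdatableStore:
    """Hypothetical in-memory stand-in for a collection of documents (dicts)."""

    def __init__(self, docs):
        self.docs = list(docs)

    def _matches(self, doc, query):
        # Assumed matching rule: the doc contains every key/value pair of the query.
        return all(doc.get(k) == v for k, v in query.items())

    def update_many(self, query, update_specs_mapping):
        """Update every matching document in place; return the number matched."""
        n_matched = 0
        for doc in self.docs:
            if self._matches(doc, query):
                doc.update(update_specs_mapping)
                n_matched += 1
        return n_matched

    def __getitem__(self, query):
        # s[query] returns a lazy view so that s[query].update(...) can work.
        return _QueryView(self, query)

    def __setitem__(self, query, update_specs_mapping):
        # s[query] = update_specs_mapping
        self.update_many(query, update_specs_mapping)


class _QueryView:
    """Hypothetical view over the documents selected by a query."""

    def __init__(self, store, query):
        self.store, self.query = store, query

    def __iter__(self):
        return (d for d in self.store.docs if self.store._matches(d, self.query))

    def update(self, update_specs_mapping):
        return self.store.update_many(self.query, update_specs_mapping)


s = QueryUpdatableStore([
    {"name": "a", "kind": "x", "n": 1},
    {"name": "b", "kind": "y", "n": 2},
    {"name": "c", "kind": "x", "n": 3},
])
s[{"kind": "x"}] = {"n": 0}          # assignment form
s[{"kind": "y"}].update({"n": 20})   # view-then-update form
```

Note that the query can be a dict here because a custom __getitem__/__setitem__ never hashes its key, unlike a real dict.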

support for s[query] = update_specs_mapping

The last s[idx] = val form is reminiscent of numpy (or even the richer pandas) interfaces.

import numpy as np

a_list = [1, 2, 3, 4]
an_array = np.array(a_list)

# a_list[[1, 3]] = 10  # does not work: a plain list raises TypeError
# but:
an_array[[1, 3]] = 10  # does
# and so does a boolean mask:
idx = np.arange(len(an_array)) < 2
an_array[idx] = 100

# note that the right side (the value assigned) can be a single item,
# or an iterable of the same size as the index:
an_array[[1, 3]] = [1000, 3000]

pandas has even richer indexing.

context manager interface

with s:
    for x in s[query]:
        x.update(val)

It would be nice for s to behave normally outside a context, but to follow the Command Pattern (record operations, execute them later) inside one. Is this desirable from an architectural point of view?

As an intermediate option:

s[k] = v  # executes immediately
with AggregateWrites(s) as ss:
    ss[k] = v
    ...
# will aggregate and execute on exit
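A minimal sketch of what such a wrapper might look like, assuming the wrapped store is any MutableMapping (a plain dict here). The class body is a guess at the intent of AggregateWrites, not an existing implementation: it buffers set and delete operations and replays them one by one on a clean exit, whereas a real backend would presumably translate the buffer into a single bulk call.

```python
class AggregateWrites:
    """Hypothetical wrapper: buffer writes to a store, flush them on exit."""

    def __init__(self, store):
        self.store = store
        self._pending = []

    def __enter__(self):
        self._pending = []
        return self

    def __setitem__(self, k, v):
        # Record the write instead of performing it.
        self._pending.append(("set", k, v))

    def __delitem__(self, k):
        self._pending.append(("del", k, None))

    def __exit__(self, exc_type, exc, tb):
        # Assumed policy: flush only if the block finished without an exception.
        if exc_type is None:
            for op, k, v in self._pending:
                if op == "set":
                    self.store[k] = v
                else:
                    del self.store[k]
        self._pending = []
        return False  # never swallow exceptions


s = {"a": 1, "b": 2}
with AggregateWrites(s) as ss:
    ss["a"] = 10
    del ss["b"]
    ss["c"] = 3
    inside = dict(s)  # snapshot: nothing has been written yet
```

Keeping the wrapper separate from s means s itself never changes behavior, at the cost of a second name (ss) inside the block.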

Other proposals (not mutually exclusive with the above)

# explicit hold/execute calls
s[k] = v
s.hold()
s.append(v2)
s[k] = v3
del s[k2]
s.execute()

# or, with hold() used as a context manager
s[k] = v
with s.hold():
    s.append(v2)
    s[k] = v3
    del s[k2]
```
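One way both spellings could be supported by the same object is to have hold() switch the store into a recording mode and return the store itself, so that it also works in a with statement. The sketch below is an assumption-laden illustration: HoldableStore is a hypothetical name, the backend is a dict, commands are stored as closures, held writes are discarded if the with block raises, and append is left out for brevity.

```python
from collections.abc import MutableMapping


class HoldableStore(MutableMapping):
    """Hypothetical store: writes run immediately unless hold() was called."""

    def __init__(self, backend=None):
        self.backend = {} if backend is None else backend
        self._held = None  # None means "execute immediately"

    def hold(self):
        if self._held is None:
            self._held = []
        return self  # returning self lets `with s.hold():` work

    def execute(self):
        # Run the recorded commands in order and leave recording mode.
        commands, self._held = self._held or [], None
        for command in commands:
            command()

    def _do(self, command):
        if self._held is None:
            command()
        else:
            self._held.append(command)

    def __setitem__(self, k, v):
        self._do(lambda: self.backend.__setitem__(k, v))

    def __delitem__(self, k):
        self._do(lambda: self.backend.__delitem__(k))

    def __getitem__(self, k):
        return self.backend[k]

    def __iter__(self):
        return iter(self.backend)

    def __len__(self):
        return len(self.backend)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.execute()
        else:
            self._held = None  # assumed policy: drop held writes on error
        return False


s = HoldableStore({"k2": 0})
s["k"] = 1                 # executes immediately
s.hold()
s["k"] = 3
del s["k2"]
before = dict(s.backend)   # held writes are not visible yet
s.execute()
after = dict(s.backend)

with s.hold():
    s["k4"] = 4
    during = "k4" in s.backend
```

Reads go straight to the backend in this sketch, so they do not see held writes; whether they should is a separate design question.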

Notes

thorwhalen commented 3 years ago

See: https://github.com/i2mint/mongodol/blob/d4e65280c660e1f40dae2933cfdfadcb3c91ec3e/mongodol/tracking_methods.py