Open buriy opened 5 years ago
I was just about to make a separate issue for this, but I'll roll it into this one.
I have been thinking about this issue a bit and this is the solution I've come up with:
Method 1: We set an order for slicing, and then use regular slicing. N.B: We set this order as a variable of the named-tensor, we don't do any transposing of the tensor.
e.g. We can set an order by using t['name1','name2,'name3']
and then access using t[a, b, c]
which returns the equivalent to t[ { 'name1': a, 'name2': b, 'name3': c }]
.
This can be used in a number of ways:
t1 = t['name1','name2','name3'][a, b, c]
t1 = t['name1','name2','name3', a, b, c]
t1 = t['name1', a, 'name2', b, 'name3', c]
Plus we can set once and then treat our tensor as a normal tensor for a while:
t = t['name1','name2','name3']
t1 = t[a, b, c]
t2 = t[d, e, f]
Or maybe something like:
with t.order('name1','name2','name3'):
t1 = t[a, b, c]
t2 = t[d, e, f]
Method 2:
We abuse slicing notation and allow:
t1 = t['name1': x, 'name2': y:z, 'name3': w]
In this case we have three slices where the start
member of each slice is set to the name string. This frees up the stop
and step
parts of the slice for our use, which we can treat as start
and stop
for our actual slicing. This does remove the capability to use the step
component, although that currently isn't implemented anyway, and can be achieved through other means.
Example:
assert t['name1': 3, 'name2': 2:5, 'name3': -1] == t[{'name1': 3, 'name2': slice(2,5), 'name3': -1}]
Both these methods can be used together, although there's some question as to certain minor details of how this would work, e.g. what should the behaviour of t['name1', 'name2', 1, 'name3': 3, 2]
be.
If you want to have a play around, I have a basic implementation here: https://github.com/EndingCredits/namedtensor
@EndingCredits Using raw names still isn't good. You seemed to ignore all important points in my message. Someone will write t['neme1'] and it won't work, because it's t['name1']. t[t.name1] is much better, and allows operations like t[t.name1>5] or t.name[1:5]. And your t['name1': 3, 'name2': 2:5, 'name3': -1] even isn't a valid python code ( list indices must be integers or slices, not tuple ). There are two good sources to look for similar well-known solutions: Django API and SQLAlchemy API. Pandas used SQLAlchemy API (and did it lousily).
I was't really commenting on your suggestion, just rolling the discussion into one thread.
Also, looks fine to me
@EndingCredits it is much better when linter/compiler warns you about this, instead of at run-time.
Thanks for the suggestions, this is helpful and I hear the two different approaches.
I agree the dict construction approach is a bit heavyweight. particularly in loops. I need to look into whether that is actually a problem though from a memory standpoint.
I see why the x.batch approach is appealing in a an editor (if you somehow declare the names). However [x.batch == 1] seems like a strange way to do constant time indexing, unless you see that being lazy somehow.
@srush for that you might have x.batch[1] as well, and anyway it's no worse than x[{"batch": 1}] .
Memory-wise I imagine the overhead of constructing dicts is negligible compared to the actual tensor operations, unless somehow a tensor itself is being copied.
More importantly, the current way of doing thing is just very clunky, and a bit unintuitive. I feel any of the methods above would be a big improvement to usability.
It doesn't look well, no IDE hints, and constructs a new dict every time.
Would you consider pandas-like interface instead?
x[x.batch==1]
andx[x.batch>=1 && x.batch<4]
syntax looks much better and allows for IDE hints.x.batch[1]
andx.batch[1:4]
also can be used for this specific case -- that would be amazing!