namedtensor / notation

108 stars 5 forks source link

Slice notation #55

Open davidweichiang opened 3 years ago

davidweichiang commented 3 years ago

Do we want to be able to write A_{ax(i:j)} to take a slice of axis ax? And if so, should j be inclusive?

davidweichiang commented 3 years ago

And should it be written as :, … (\ldots), ⋯ (\cdots), or .. (no LaTeX symbol that I'm aware of)?

davidweichiang commented 3 years ago

And remember that we defined axes to be able to range over any set of indices, not just 1,...,n. Should it be possible for a slice to be any subset of the index set?

srush commented 3 years ago

I'm still not convinced on slicing, but if we have it then it should be inclusive if we are 1-indexed.

This indexing issue came up when working on the pytorch docs. Since they are 0-indexed, I had to be 0-indexed as well. This is an issue with standard matrices as well. I feel like the best answer is to never use indices when possible.

I like any of those styles.

davidweichiang commented 3 years ago

If we take a slice A_{ax(10,…,20)}, I assume we want the resulting slice's elements to be numbered 1,...,10, right?

If so, I think we probably shouldn't define unordered slicing -- e.g., if ax is indexed by word types, to be able to slice A{ax(dog,cat)} and the resulting slice has elements A{ax(dog)} and A_{ax(cat)}. If we wanted that, we'd have to define two different slicing operations, one which renumbers and one which doesn't.

So, the definition (if we want it) should probably be something like:

If A has an axis ax[I] and L is a list of elements of I, then A{ax(L)} = S where S{ax(i)} = A_{ax(L_i)}.

srush commented 3 years ago

Hmm, maybe we should just not have Nominal indexing? It brings up some nasty cases, and it seems not to have too much benefit. Defining a bijection is not too hard.

davidweichiang commented 3 years ago

Maybe... but is there anything wrong with the above definition? It allows slicing of any kind of axis, but the slice has to be numbered. I think that’s pretty reasonable.

On Jan 28, 2021, at 19:23, Sasha Rush notifications@github.com wrote:

 Hmm, maybe we should just not have Nominal indexing? It brings up some nasty cases, and it seems not to have too much benefit. Defining a bijection is not too hard.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

srush commented 3 years ago

Oh sorry I think I misunderstood your negations. I'm okay with that, but it's not really slicing right?

Isn't that sort of similar to advanced indexing? Or at least that's kind of how it works right?

We could also consider keeping the indexes of the slice. That's riskier and would require non-1 indexed dims.

davidweichiang commented 3 years ago

I think it's the same as slicing if L = i, ..., j. But yes, the more general case is similar to advanced indexing and could get weird.

I think we need renumbering in order to define unroll/pool.

I also don't think slicing is a must. The way that unroll/pool are defined now is not bad.

srush commented 3 years ago

I like this approach. I think the main reason contiguous slicing is important is for computational efficiency. If a slice is just a typed list that seems more general.

I wonder now though if it is better to go back to thinking of these as contractions with one-hot maps. I liked that idea, but the syntax was a tad complex.

davidweichiang commented 3 years ago

Yes, slicing can be thought of as contraction with a one-hot tensor, but you're right that the syntax was more complex.

Some possibilities:

image

The current notation (indices) is the most compact; I liked the slice notation because it more easily conjures up a mental picture of overlapping windows, but possibly only because I'm used to seeing i, ..., i+n-1 for n-grams.

The \delta notation for one-hot tensors could be useful for other things (e.g., defining advanced indexing and the identity matrix). But it clashes with the "names are wholes" principle.

srush commented 3 years ago

This is really interesting. To be more fair to the bottom two we should call it seq' on the top two as well right? I know it is okay to go to seq, but emerging best practice seems to be that function output should have different names, particularly if they change size?

My favorite is still the first. I'm not convinced about privileging slicing to imply a new axis, sorry. Cases that worry me: 1) dilated convolution (slice with skips), 2) transposed convolution (reversed slices). If you are doing indexing, I like it to be explicit.

The bottom one still really intrigues me. It feels like it best takes advantage of names, and it is really nice for explaining methods like EM where you want to illustrate the transition from indexing to one-hot to relaxed representations. Maybe there is a simpler syntax. We could reexplore @ctongfei 's idea to have an arrow subscript notation bijection building?

davidweichiang commented 3 years ago

To be more fair to the bottom two we should call it seq' on the top two as well right? I know it is okay to go to seq, but emerging best practice seems to be that function output should have different names, particularly if they change size?

I guess, but then unroll would need a third underscript to indicate the output axis...

Cases that worry me: 1) dilated convolution (slice with skips), 2) transposed convolution (reversed slices). If you are doing indexing, I like it to be explicit.

A slice with skips could be written i, i+2, ..., j, and a reverse slice could be j, j-1, ..., i. I think those are pretty standard.

All in all, though, I have to agree that after looking at these options together, the first one is the simplest.

davidweichiang commented 3 years ago

The indexing-with-bijection (#12) notation would look like

image
srush commented 3 years ago

I don't know if I like this but a benefit of that notation over indexing is that it looks nicer in an indexing tensor.
image

You could also imagine a bijective one, and a default case for identity matrices.

image

The slice ranges above are nice, but do you think it is obvious that they map in order to the a new name?

davidweichiang commented 3 years ago

What does the bold 1 mean?

The slice ranges above are nice, but do you think it is obvious that they map in order to the a new name?

To me it seems pretty self explanatory...not sure what others think.

srush commented 3 years ago

Oh I like the bold 1 for delta. But it could be delta. Was just trying to indicate that it was an indicator tensor.

davidweichiang commented 3 years ago

With the bold 1, what does the \rightarrow or \leftrightarrow mean? Wouldn't it be equivalent just to replace them with comma?

srush commented 3 years ago

This isn't fully baked. But I like the idea of these index maps. The idea was just to hint that this is a bijective (pool) or injective map (unroll). But it doesn't do anything.