Open davidweichiang opened 3 years ago
And should it be written as :, … (\ldots), ⋯ (\cdots), or .. (no LaTeX symbol that I'm aware of)?
And remember that we defined axes to be able to range over any set of indices, not just 1,...,n. Should it be possible for a slice to be any subset of the index set?
I'm still not convinced on slicing, but if we have it then it should be inclusive if we are 1-indexed.
This indexing issue came up when working on the pytorch docs. Since they are 0-indexed, I had to be 0-indexed as well. This is an issue with standard matrices as well. I feel like the best answer is to never use indices when possible.
I like any of those styles.
If we take a slice A_{ax(10,…,20)}, I assume we want the resulting slice's elements to be numbered 1,...,10, right?
If so, I think we probably shouldn't define unordered slicing -- e.g., if ax is indexed by word types, to be able to slice A{ax(dog,cat)} and the resulting slice has elements A{ax(dog)} and A_{ax(cat)}. If we wanted that, we'd have to define two different slicing operations, one which renumbers and one which doesn't.
So, the definition (if we want it) should probably be something like:
If A has an axis ax[I] and L is a list of elements of I, then A{ax(L)} = S where S{ax(i)} = A_{ax(L_i)}.
Hmm, maybe we should just not have Nominal indexing? It brings up some nasty cases, and it seems not to have too much benefit. Defining a bijection is not too hard.
Maybe... but is there anything wrong with the above definition? It allows slicing of any kind of axis, but the slice has to be numbered. I think that’s pretty reasonable.
On Jan 28, 2021, at 19:23, Sasha Rush notifications@github.com wrote:
Hmm, maybe we should just not have Nominal indexing? It brings up some nasty cases, and it seems not to have too much benefit. Defining a bijection is not too hard.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
Oh sorry I think I misunderstood your negations. I'm okay with that, but it's not really slicing right?
Isn't that sort of similar to advanced indexing? Or at least that's kind of how it works right?
We could also consider keeping the indexes of the slice. That's riskier and would require non-1 indexed dims.
I think it's the same as slicing if L = i, ..., j. But yes, the more general case is similar to advanced indexing and could get weird.
I think we need renumbering in order to define unroll/pool.
I also don't think slicing is a must. The way that unroll/pool are defined now is not bad.
I like this approach. I think the main reason contiguous slicing is important is for computational efficiency. If a slice is just a typed list that seems more general.
I wonder now though if it is better to go back to thinking of these as contractions with one-hot maps. I liked that idea, but the syntax was a tad complex.
Yes, slicing can be thought of as contraction with a one-hot tensor, but you're right that the syntax was more complex.
Some possibilities:
The current notation (indices) is the most compact; I liked the slice notation because it more easily conjures up a mental picture of overlapping windows, but possibly only because I'm used to seeing i, ..., i+n-1 for n-grams.
The \delta notation for one-hot tensors could be useful for other things (e.g., defining advanced indexing and the identity matrix). But it clashes with the "names are wholes" principle.
This is really interesting. To be more fair to the bottom two we should call it seq'
on the top two as well right? I know it is okay to go to seq, but emerging best practice seems to be that function output should have different names, particularly if they change size?
My favorite is still the first. I'm not convinced about privileging slicing to imply a new axis, sorry. Cases that worry me: 1) dilated convolution (slice with skips), 2) transposed convolution (reversed slices). If you are doing indexing, I like it to be explicit.
The bottom one still really intrigues me. It feels like it best takes advantage of names, and it is really nice for explaining methods like EM where you want to illustrate the transition from indexing to one-hot to relaxed representations. Maybe there is a simpler syntax. We could reexplore @ctongfei 's idea to have an arrow subscript notation bijection building?
To be more fair to the bottom two we should call it seq' on the top two as well right? I know it is okay to go to seq, but emerging best practice seems to be that function output should have different names, particularly if they change size?
I guess, but then unroll would need a third underscript to indicate the output axis...
Cases that worry me: 1) dilated convolution (slice with skips), 2) transposed convolution (reversed slices). If you are doing indexing, I like it to be explicit.
A slice with skips could be written i, i+2, ..., j, and a reverse slice could be j, j-1, ..., i. I think those are pretty standard.
All in all, though, I have to agree that after looking at these options together, the first one is the simplest.
The indexing-with-bijection (#12) notation would look like
I don't know if I like this but a benefit of that notation over indexing is that it looks nicer in an indexing tensor.
You could also imagine a bijective one, and a default case for identity matrices.
The slice ranges above are nice, but do you think it is obvious that they map in order to the a new name?
What does the bold 1 mean?
The slice ranges above are nice, but do you think it is obvious that they map in order to the a new name?
To me it seems pretty self explanatory...not sure what others think.
Oh I like the bold 1 for delta. But it could be delta. Was just trying to indicate that it was an indicator tensor.
With the bold 1, what does the \rightarrow or \leftrightarrow mean? Wouldn't it be equivalent just to replace them with comma?
This isn't fully baked. But I like the idea of these index maps. The idea was just to hint that this is a bijective (pool) or injective map (unroll). But it doesn't do anything.
Do we want to be able to write A_{ax(i:j)} to take a slice of axis ax? And if so, should j be inclusive?