Open DLehenbauer opened 4 years ago
Intervals expressed as start/count
I like this because you can have statically typed singletons (where count
=== 1). There may be some ergonomic issues we haven't come across yet, but so far this seems like a good convention.
'col' as an abbreviation for column
I'm happy to follow the other libs and not abbreviate. (Maybe we can find a linter to enforce this? I know that it is easy to let a few slip through otherwise!).
'numRows' as an abbreviation for 'number of rows'
I like rowCount
, at least for domains that are fundamentally 2d.
Sigh. It looks like matrix.get([row, col])
is a ~15-20% perf hit for no good reason (where matrix is backed by a 1D array + trivial math.)
Guess we should continue open-coding coordinates as (row, col, rowCount, colCount)
.
I've been thinking about moving '.removeMatrixConsumer()' and similar to the IReader so that a typical transducer doesn't need to hold references to both the producer and the reader (which are often the same object in practice.)
One oddity is that the consumer still needs to pass a reference to itself, so that the producer knows which consumer to remove (again, because producer/reader are typically the same object.)
this.reader.removeMatrixConsumer(/* consumer */ this);
It's tempting to make removeMatrixConsumer an unbound function and infer the consumer instance via 'this', but probably not a good idea.
That makes sense; I guess it's really like a closeReader
method?
It's tempting to make removeMatrixConsumer an unbound function and infer the consumer instance via 'this', but probably not a good idea.
I'm not quite sure what this means. Do you have an example?
I was thinking there might be a way to abuse the 'this' behavior with function () {}
such that calling reader.close*() would implicitly pass the caller's this
to producer.close()... realized it was probably a bad idea before I got around to figuring out if it could work.
We chose non-conflicting APIs so that the same object can implement IProducer, IVectorProducer, and IMatrixProducer.
We should do the same with IReader, IVectorReader, IMatrixReader, since it's common for a producer to implement reader and return itself on open().
Something like renaming read()
-> get()
, getItem()
, 'getCell()`.
for matrix we've said we would use rowCount / colCount, which implies the number of dimensions... for maps vs. vectors I guess that would be entryCount
and itemCount
?
Our current Matrix/VectorConsumer interfaces currently communicate a move as a remove/insert, which might be suboptimal for a caching transducer. (e.g., reordering causes the transducer unnecessarily evict cached results.)
Originally, I*Reader was just an API-trick to ensure that anyone consuming values from a producer has subscribed for updates.
However, as we explore multiple views, I'm beginning to see the value of I*Readers as cursors that can cache subsets of the permutation/mapping.
This makes me think adding I*Writer probably makes sense, as you might want multiple caches when writing to distant regions of a vector, matrix, etc.
The above raises the question of how a single consumer acquires multiple cursors. They could call open() multiple times and the IProducer is responsible for coalescing duplicate subscriptions.
Another idea is that I*Reader() has a fork()
or better named method that creates a new reader for the same producer.
The IVectorConsumer.itemsChanged()
signature forces the producer to alloc an array with the newly inserted items. We shouldn't do that, as it forces a producer to eagerly load/calculate values without knowing if the consumer needs them.
We could allow simple transducers to pass subscriptions through to the producer(s) they are consuming, like this:
public openMatrix(consumer: IMatrixConsumer) {
// subscribe on behalf of the consumer, but throw away the reader.
this.source.openMatrix(consumer);
return this; // Provide our own IMatrixReader
}
This works when changes are conservatively 1:1 with the source.
It would require the following changes:
(Continuing to use this thread to capture misc. api thoughts... eventually, I should summarize & justify various decisions in spec.md or similar.)
On second through, passing subscriptions through to upstream produces might not be a great idea. In addition to the aforementioned need for refcounting, it also means that APIs like cellsChanged(..., producer)
will receive the upstream producer.
This is problematic if a consumer has subscribed to multiple produces of the same type and is using the 'producer' value to distinguish.
When combining R0C0 with a 2D geometry library, passing y/height before x/width feels unnatural.
cellsChanged(/* rowStart: */ rect.y, /* colStart */ rect.x, /* rowCount: */ rect.height, /* colCount: */ rect.width);
It tempts me to try to avoid the terms "row"/"col", but R0C0 terminology sure feels natural for insertRows, insertCols, etc.
The notifications for row/col changes in a matrix are the same as those for a vector.
It makes me want to do something like this:
interface IMatrixProducer<T> {
rows: IVectorProducer<never>
cols: IVectorProducer<never>
...
}
Or related to the above, if we wanted to support arbitrary dimensions:
interface IMultidimensionalArrayProducer<T> {
dimension: ReadonlyArray<IVectorProducer<never>>
...
}
Note that a matrix would need to allocate separate reader objects for each dimension, but for 2+ dimensional arrays, we probably don't care.
You could also combine:
// Convenience for the common case of 2D arrays
interface IMatrixProducer<T> extends IMultidimensionArrayProducer<T> {
// Fixed at 2 dimensions
dimension: [IVectorProducer<never>, IVectorProducer<never>];
// Aliases to dimensions[0|1]
rows: IVectorProducer<never>
cols: IVectorProducer<never>
}
On second through, passing subscriptions through to upstream produces might not be a great idea. In addition to the aforementioned need for refcounting, it also means that APIs like cellsChanged(..., producer) will receive the upstream producer.
Yeah. I'm not sure I have enough of a sense of all the scenarios to know what the best API would be. The bit where I'm stuck is the single consumer and multiple producers, where the consumer needs to know each producer. I wonder if this is an anti-pattern? Maybe we have a single multiplexing producer, or some pub-sub mechanism.
It tempts me to try to avoid the terms "row"/"col", but R0C0 terminology sure feels natural for insertRows, insertCols, etc.
I feel this pain. I think at some point matrix-producer will grow to be more table-like, so rows and cols feel natural---or it gets wrapped and we can have the outer object mediate between the two.
The notifications for row/col changes in a matrix are the same as those for a vector....
The two producers for row and cols is nice, and I think it would be worth special casing the 2x case if we did that. One consideration is whether we want to decouple the notifications for rows and cols. Do we need to worry about concurrent row/col insertions?
This would be an interesting way to support row/col annotations:
// Define a pair of vectors to give the matrix it's shape:
const employeeIDs = new VectorProducer<number>(1, 2, 3, 4, ...);
const employeeInfo = new VectorProducer<string>("last", "first", "department", "title", ...);
// Construct a matrix, passing the permutation vectors
const employees = new MatrixProducer(/* rows: */ employeeIDs, /* cols: */ employeeInfo);
I was playing w/the above idea, and found myself wanting distinct change notifications for the following vector ops:
Today, we express 'replace' by inserting/removing the same number of items, but that would have the effect of clearing the matrix row/cols.
(As mentioned above, we might also want a distinct move op for efficiency.)
Agreed. I started adding a mapRange
function to the tree, which is essentially a replace. For correctness I think we need something like this. I think having a move op is also probably necessary, but can be implemented as two replacements. The main thing would be ensuring that externally it's seen as an atomic op.
Might also want to consider a sort op too.
Exposing 'matrixProducer' on IMatrixReader makes it a bit awkward to use IMatrixReader as a generic interface for reading 2D data that is not backed by an IMatrixProducer (i.e., there is no subscription / change notification.)
(I've been debating with myself if the awkwardness is because the above is an abuse of IMatrixReader.)
Researching various API nits to see if I can find a compelling majority. (Feel free to comment/add).
Conventions
Intervals expressed as start/count?
Currently I use start/count because it's unambiguous wrt. if the 'end' is inclusive/exclusive.
Points, intervals, etc. are open-coded as numbers (as opposed to arrays, objects, etc.)
This was done out of fear of hidden array/object allocations, but never verified.
Vector, Matrix, etc. are distinct types.
Related to the above, if we were to pass coordinates/sizes as arrays, we could generalize vectors, matrices, etc. to n-dimensional arrays.
Naming
'col' as an abbreviation for column?
'numRows' as an abbreviation for 'number of rows'?
[number, number]
.