microsoft / tiny-calc

Tiny expression evaluator
MIT License
39 stars 9 forks source link

API design & conventions #58

Open DLehenbauer opened 4 years ago

DLehenbauer commented 4 years ago

Researching various API nits to see if I can find a compelling majority. (Feel free to comment/add).

Conventions

Intervals expressed as start/count?

Currently I use start/count because it's unambiguous wrt. if the 'end' is inclusive/exclusive.

insertRows(startRow: number, numRows: number);

Points, intervals, etc. are open-coded as numbers (as opposed to arrays, objects, etc.)

This was done out of fear of hidden array/object allocations, but never verified.

read(rowStart: number, colStart: number, rowCount: number, colCount: number);

Vector, Matrix, etc. are distinct types.

Related to the above, if we were to pass coordinates/sizes as arrays, we could generalize vectors, matrices, etc. to n-dimensional arrays.

Naming

'col' as an abbreviation for column?

insertCol(col: number, numCols: number);

'numRows' as an abbreviation for 'number of rows'?

jack-williams commented 4 years ago

Intervals expressed as start/count

I like this because you can have statically typed singletons (where count === 1). There may be some ergonomic issues we haven't come across yet, but so far this seems like a good convention.

'col' as an abbreviation for column

I'm happy to follow the other libs and not abbreviate. (Maybe we can find a linter to enforce this? I know that it is easy to let a few slip through otherwise!).

'numRows' as an abbreviation for 'number of rows'

I like rowCount, at least for domains that are fundamentally 2d.

DLehenbauer commented 4 years ago

Sigh. It looks like matrix.get([row, col]) is a ~15-20% perf hit for no good reason (where matrix is backed by a 1D array + trivial math.)

Guess we should continue open-coding coordinates as (row, col, rowCount, colCount).

DLehenbauer commented 4 years ago

I've been thinking about moving '.removeMatrixConsumer()' and similar to the IReader so that a typical transducer doesn't need to hold references to both the producer and the reader (which are often the same object in practice.)

One oddity is that the consumer still needs to pass a reference to itself, so that the producer knows which consumer to remove (again, because producer/reader are typically the same object.)

this.reader.removeMatrixConsumer(/* consumer */ this);

It's tempting to make removeMatrixConsumer an unbound function and infer the consumer instance via 'this', but probably not a good idea.

jack-williams commented 4 years ago

That makes sense; I guess it's really like a closeReader method?

It's tempting to make removeMatrixConsumer an unbound function and infer the consumer instance via 'this', but probably not a good idea.

I'm not quite sure what this means. Do you have an example?

DLehenbauer commented 4 years ago

I was thinking there might be a way to abuse the 'this' behavior with function () {} such that calling reader.close*() would implicitly pass the caller's this to producer.close()... realized it was probably a bad idea before I got around to figuring out if it could work.

DLehenbauer commented 4 years ago

We chose non-conflicting APIs so that the same object can implement IProducer, IVectorProducer, and IMatrixProducer.

We should do the same with IReader, IVectorReader, IMatrixReader, since it's common for a producer to implement reader and return itself on open().

Something like renaming read() -> get(), getItem(), 'getCell()`.

for matrix we've said we would use rowCount / colCount, which implies the number of dimensions... for maps vs. vectors I guess that would be entryCount and itemCount?

DLehenbauer commented 4 years ago

Our current Matrix/VectorConsumer interfaces currently communicate a move as a remove/insert, which might be suboptimal for a caching transducer. (e.g., reordering causes the transducer unnecessarily evict cached results.)

DLehenbauer commented 4 years ago

Originally, I*Reader was just an API-trick to ensure that anyone consuming values from a producer has subscribed for updates.

However, as we explore multiple views, I'm beginning to see the value of I*Readers as cursors that can cache subsets of the permutation/mapping.

This makes me think adding I*Writer probably makes sense, as you might want multiple caches when writing to distant regions of a vector, matrix, etc.

DLehenbauer commented 4 years ago

The above raises the question of how a single consumer acquires multiple cursors. They could call open() multiple times and the IProducer is responsible for coalescing duplicate subscriptions.

Another idea is that I*Reader() has a fork() or better named method that creates a new reader for the same producer.

DLehenbauer commented 4 years ago

The IVectorConsumer.itemsChanged() signature forces the producer to alloc an array with the newly inserted items. We shouldn't do that, as it forces a producer to eagerly load/calculate values without knowing if the consumer needs them.

DLehenbauer commented 4 years ago

We could allow simple transducers to pass subscriptions through to the producer(s) they are consuming, like this:

public openMatrix(consumer: IMatrixConsumer) {
    // subscribe on behalf of the consumer, but throw away the reader.
    this.source.openMatrix(consumer);

    return this;  // Provide our own IMatrixReader
}

This works when changes are conservatively 1:1 with the source.

It would require the following changes:

  1. Consumers would need to be refcounted to balance unsubscription / avoid duplicate notifications.
  2. We would need to remove all values from the change notifications.
DLehenbauer commented 4 years ago

(Continuing to use this thread to capture misc. api thoughts... eventually, I should summarize & justify various decisions in spec.md or similar.)

On second through, passing subscriptions through to upstream produces might not be a great idea. In addition to the aforementioned need for refcounting, it also means that APIs like cellsChanged(..., producer) will receive the upstream producer.

This is problematic if a consumer has subscribed to multiple produces of the same type and is using the 'producer' value to distinguish.

DLehenbauer commented 4 years ago

When combining R0C0 with a 2D geometry library, passing y/height before x/width feels unnatural.

cellsChanged(/* rowStart: */ rect.y, /* colStart */ rect.x, /* rowCount: */ rect.height, /* colCount: */ rect.width);

It tempts me to try to avoid the terms "row"/"col", but R0C0 terminology sure feels natural for insertRows, insertCols, etc.

DLehenbauer commented 4 years ago

The notifications for row/col changes in a matrix are the same as those for a vector.

It makes me want to do something like this:

interface IMatrixProducer<T> {
    rows: IVectorProducer<never>
    cols: IVectorProducer<never>
    ...
}

Or related to the above, if we wanted to support arbitrary dimensions:

interface IMultidimensionalArrayProducer<T> {
    dimension: ReadonlyArray<IVectorProducer<never>>
    ...
}

Note that a matrix would need to allocate separate reader objects for each dimension, but for 2+ dimensional arrays, we probably don't care.

You could also combine:

// Convenience for the common case of 2D arrays
interface IMatrixProducer<T> extends IMultidimensionArrayProducer<T> {
    // Fixed at 2 dimensions
    dimension: [IVectorProducer<never>, IVectorProducer<never>];

    // Aliases to dimensions[0|1]
    rows: IVectorProducer<never>
    cols: IVectorProducer<never>
}
jack-williams commented 4 years ago

On second through, passing subscriptions through to upstream produces might not be a great idea. In addition to the aforementioned need for refcounting, it also means that APIs like cellsChanged(..., producer) will receive the upstream producer.

Yeah. I'm not sure I have enough of a sense of all the scenarios to know what the best API would be. The bit where I'm stuck is the single consumer and multiple producers, where the consumer needs to know each producer. I wonder if this is an anti-pattern? Maybe we have a single multiplexing producer, or some pub-sub mechanism.

It tempts me to try to avoid the terms "row"/"col", but R0C0 terminology sure feels natural for insertRows, insertCols, etc.

I feel this pain. I think at some point matrix-producer will grow to be more table-like, so rows and cols feel natural---or it gets wrapped and we can have the outer object mediate between the two.

The notifications for row/col changes in a matrix are the same as those for a vector....

The two producers for row and cols is nice, and I think it would be worth special casing the 2x case if we did that. One consideration is whether we want to decouple the notifications for rows and cols. Do we need to worry about concurrent row/col insertions?

DLehenbauer commented 4 years ago

This would be an interesting way to support row/col annotations:

// Define a pair of vectors to give the matrix it's shape:
const employeeIDs = new VectorProducer<number>(1, 2, 3, 4, ...);
const employeeInfo = new VectorProducer<string>("last", "first", "department", "title", ...);

// Construct a matrix, passing the permutation vectors
const employees = new MatrixProducer(/* rows: */ employeeIDs, /* cols: */ employeeInfo);
DLehenbauer commented 4 years ago

I was playing w/the above idea, and found myself wanting distinct change notifications for the following vector ops:

  1. Inserting/Removing items (which would insert/remove rows/cols in the matrix)
  2. Replacing the values of existing items (which has no effect on the matrix)

Today, we express 'replace' by inserting/removing the same number of items, but that would have the effect of clearing the matrix row/cols.

(As mentioned above, we might also want a distinct move op for efficiency.)

jack-williams commented 4 years ago

Agreed. I started adding a mapRange function to the tree, which is essentially a replace. For correctness I think we need something like this. I think having a move op is also probably necessary, but can be implemented as two replacements. The main thing would be ensuring that externally it's seen as an atomic op.

Might also want to consider a sort op too.

DLehenbauer commented 4 years ago

Exposing 'matrixProducer' on IMatrixReader makes it a bit awkward to use IMatrixReader as a generic interface for reading 2D data that is not backed by an IMatrixProducer (i.e., there is no subscription / change notification.)

(I've been debating with myself if the awkwardness is because the above is an abuse of IMatrixReader.)