SparseMatrix is designed for linear algebra through highly efficient built-in methods. It is immutable and non-iterable by design. Use SparseDataset for that kind of access.
OK. I can see why you might want immutable. That is reasonably easy to live with.
But I need the fast linear algebra (see the use of ata() for computing cooccurrence). But the linear algebra that SparseMatrix provides isn't sufficient. Why not allow limited slicing and iteration so that I can implement my own linear algebra?
Iteration will generate a lot of small objects. It will be 10-100X slower compared to our built-in functions.
But the functions I need aren't built-in.
Column sums. Column median. Top 100 from column. Elementwise operations on non-zeros. None are there.
The list is endless and can't reasonably be entirely built in. But iteration can be built in.
This helps users of the library because the error-prone details of exactly how to iterate can be separated from the unknown-in-advance details of the computation.
Furthermore, your assertion about performance is based on suspicion rather than data. I just tested this assertion on a SparseMatrix with half a million non-zeros. The difference between direct iteration and lambda iteration is minimal: 510 microseconds direct versus 525 microseconds using a lambda. Even using the streams interface, the time was 1.2 milliseconds, only about a 2.5:1 penalty. Not a great thing, but the flexibility could be worth it some days. The fact that computing something like column sums reduces to a one-liner is mighty fine when you just need to pound out some code.
// m is a SparseMatrix with about half a million non-zeros (see above).
// First: direct iteration over the CSC internals. colIndex delimits each
// column's run of stored values in x.
double t0 = System.nanoTime() / 1e9;
double[] sum0 = new double[2000];
for (int rep = 0; rep < 1000; rep++) {
    for (int column = 0; column < m.ncols(); column++) {
        for (int k = m.colIndex[column]; k < m.colIndex[column + 1]; k++) {
            sum0[column] += m.x[k];
        }
    }
}
double t1 = System.nanoTime() / 1e9;
double sum = 0;
for (double v : sum0) {
    sum += v;
}
System.out.printf("%.3f (%.2f)\n", (t1 - t0), sum);

// Second: lambda iteration via the proposed foreachNonzero method.
t0 = System.nanoTime() / 1e9;
double[] sum1 = new double[2000];
for (int rep = 0; rep < 1000; rep++) {
    m.foreachNonzero(
            (i, j, x) -> sum1[j] += x
    );
}
t1 = System.nanoTime() / 1e9;
sum = 0;
for (double v : sum1) {
    sum += v;
}
System.out.printf("%.3f (%.2f)\n", (t1 - t0), sum);

// Third: stream iteration via the proposed nonzeros() method.
t0 = System.nanoTime() / 1e9;
double[] sum2 = new double[2000];
for (int rep = 0; rep < 1000; rep++) {
    m.nonzeros()
            .forEach(entry -> sum2[entry.j] += entry.x);
}
t1 = System.nanoTime() / 1e9;
sum = 0;
for (double v : sum2) {
    sum += v;
}
System.out.printf("%.3f (%.2f)\n", (t1 - t0), sum);
Use SparseDataset, whose internal data structure is more suitable for iteration.
But I need the linear algebra functions like .ata().
Look, I am happy to write this code. But it would be nice to get a real critique.
Why are you so resistant to improving SparseMatrix?
Here are more careful benchmark results using JMH:
Benchmark Mode Cnt Score Error Units
IteratorSpeed.timeDirect avgt 5 429888.246 ± 3819.232 ns/op
IteratorSpeed.timeDirect:·gc.alloc.rate avgt 5 ≈ 10⁻⁴ MB/sec
IteratorSpeed.timeDirect:·gc.alloc.rate.norm avgt 5 0.088 ± 0.001 B/op
IteratorSpeed.timeDirect:·gc.count avgt 5 ≈ 0 counts
IteratorSpeed.timeDirect:·stack avgt NaN ---
IteratorSpeed.timeIterator avgt 5 430718.537 ± 7831.509 ns/op
IteratorSpeed.timeIterator:·gc.alloc.rate avgt 5 0.028 ± 0.001 MB/sec
IteratorSpeed.timeIterator:·gc.alloc.rate.norm avgt 5 16.089 ± 0.011 B/op
IteratorSpeed.timeIterator:·gc.count avgt 5 ≈ 0 counts
IteratorSpeed.timeIterator:·stack avgt NaN ---
IteratorSpeed.timeStream avgt 5 1032370.658 ± 55295.704 ns/op
IteratorSpeed.timeStream:·gc.alloc.rate avgt 5 0.077 ± 0.004 MB/sec
IteratorSpeed.timeStream:·gc.alloc.rate.norm avgt 5 104.210 ± 0.011 B/op
IteratorSpeed.timeStream:·gc.count avgt 5 ≈ 0 counts
IteratorSpeed.timeStream:·stack avgt NaN ---
This demonstrates that the JVM is largely optimizing away the per-element entry allocation in the streams case: roughly 104 bytes per call over half a million entries is far less than one boxed entry per element. The stream iterator is definitely slower, but the API advantages are worth it for some cases.
Also, the SparseDataset doesn't allow column-wise access either, so it isn't usable for the code I am trying to write. The SparseMatrix is very useful, but needs reasonable iteration.
SparseDataset.toMatrix returns SparseMatrix.
I would like to ask how the following tasks should be done to a SparseMatrix by an external program:
normalize columns of a SparseMatrix by the diagonal element, if present, or by the L_n norm of each column
normalize columns by an external vector
normalize rows of a SparseMatrix according to a specialized norm such as the max positive element of a row
retain only the top k elements of each column of a SparseMatrix
implement an optimization that depends on errors only defined on non-zero elements of a SparseMatrix. Alternating least squares is a good example of this.
As it stands, a program using Smile cannot implement any of these operations efficiently. All of them are important to implement efficiently in my work.
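For concreteness, here is what the second task in that list can look like once iteration exists. This is a sketch against my prototype; foreachNonzero, CooData, and toMatrix are proposed API, not anything currently in Smile:
// Normalize each column by an externally supplied scale vector, producing
// a new matrix rather than mutating the immutable original.
double[] scale = externalScales();               // assumed per-column factors
CooData out = new CooData(m.nrows(), m.ncols());
m.foreachNonzero((i, j, x) -> out.add(i, j, x / scale[j]));
SparseMatrix normalized = out.toMatrix();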
I would like some clarity about whether you will continue to block changes that would allow high-performance extensions to Smile in support of these needs. If you simply don't plan to accept any such contributions, please say so and I will stop trying to use Smile for this work.
Opened https://github.com/haifengl/smile/pull/447 with changes and tests.
I have a separate repo with JMH benchmarks that isn't public yet. These tests should be sufficient.
Why not implement the iterator and test the performance on large matrices by yourself?
Done. Long ago. That is how I posted the benchmarks that you didn't comment on.
If you have concerns that SparseDataset is row-wise and you want column-wise operations, you can construct a transposed version and do operations on it. Even including the transpose back and forth, it is still very fast.
But it doesn't do linear algebra.
Nobody blocks a proposal. I make design choices based on my knowledge of internal data structures and algorithms and years of experience.
And I have made a proposed change based on my years of experience and a very light change to a data structure. Your comments have been uniformly negative: they have ignored evidence, made arguments unsupported by data, and shifted over time.
Smile is a machine learning library and SparseMatrix is designed for it rather than being a general-purpose matrix library.
You know, when we started, you said you didn't want any changes in SparseMatrix because it was for linear algebra. I pointed out that I wanted to do linear algebra and gave some examples and now you say that it isn't linear algebra. This is getting silly.
You have not made any pull requests of tested, high-performance code. Why do you say that I "block high-performance improvements"?
Added a pull request.
Hoping to hear something more positive and serious from you.
Expected behaviour
I need to be able to iterate over the nonzero elements of a sparse matrix while retaining information about which element I am seeing. Ideally, I would also like enough information to do fast in-place updates of elements in the matrix. Also ideally, I would like a very flexible API so that I can do things like filter which elements I operate on and accumulate results in flexible ways.
Related to this is a need to accumulate data in (row, column, value) format preparatory to reformatting it as an actual sparse matrix.
Examples of where this is useful include:
scanning a column to find the 100 biggest elements (see the sketch after this list)
evaluating column-wise functions such as scaled sums
performing complex element-wise operations, possibly with reference to other data structures beyond just the element in hand, and with the result possibly having a different sparsity structure from the original data.
accumulating count data into a sparse matrix
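For the first example, here is a sketch using the proposed nonzeros() stream. The stream and the entry fields i, j, and x are from my prototype; the rest is ordinary java.util.stream plumbing (Comparator and Collectors from java.util):
int c = 17;  // column of interest
List<Double> top = m.nonzeros()
        .filter(e -> e.j == c)              // keep entries in the target column
        .map(e -> e.x)                      // just the values
        .sorted(Comparator.reverseOrder())  // largest first
        .limit(100)                         // the 100 biggest
        .collect(Collectors.toList());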
Actual behaviour
I propose (and have implemented prototypes of) one new class, CooData, and four new methods on SparseMatrix. The nonzeros methods return a stream so that I can use the full flexibility of the Stream idiom for operations such as collecting values into a Collection. The foreachNonzero methods can be faster because they involve less boxing into a MatrixEntry structure, but are considerably less flexible. I have found both important for work in natural language processing, particularly term cooccurrence analysis.
Code snippet
Here is some code that transforms a stream of documents, each represented as a String, into a sparse matrix containing the per-document word counts.
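A minimal sketch of that transformation under the proposed CooData API; the CooData constructor, add, and toMatrix, as well as the vocabulary lookup, are from my prototype, not existing Smile API:
// Accumulate (doc, word, count) triples in coordinate form, then convert
// to a compressed sparse matrix. Repeated adds to the same cell accumulate.
CooData counts = new CooData(documents.size(), vocabulary.size());
int doc = 0;
for (String text : documents) {
    for (String word : text.split("\\s+")) {
        int w = vocabulary.indexOf(word);   // assumed word -> index mapping
        if (w >= 0) {
            counts.add(doc, w, 1);
        }
    }
    doc++;
}
SparseMatrix m = counts.toMatrix();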
In the code above, multiple updates to the same document × word element are accumulated together. For the purposes of document-level cooccurrence, however, we just want a binary indicator of presence. The elements of this count matrix can be clipped to be either zero or one with this small snippet:
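(Sketch; this assumes the entries returned by nonzeros() support in-place write-back via an update method, which is part of my prototype rather than current Smile API.)
// Clip every stored count to 1; structural zeros are untouched.
m.nonzeros().forEach(e -> e.update(Math.min(e.x, 1.0)));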
And here is some sample code that does a complex scoring algorithm on term cooccurrence data using external counts:
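A sketch of the shape this takes under the proposed API. Here rowCounts, colCounts, total, and score() stand in for the external count data and the actual scoring function (for example, a log-likelihood ratio), and the entry update method is from my prototype:
// Replace each cooccurrence count with a score derived from external
// marginal counts; score() is a placeholder for the real scoring function.
m.nonzeros().forEach(e ->
        e.update(score(e.x, rowCounts[e.i], colCounts[e.j], total)));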
These structures and methods allow very simple code for implementing the complex tasks typical of applications that use SparseMatrix to represent language.
It is also important to note that these methods allow code to be easily and safely parallelized for higher performance with almost no code changes.
Information
This code only requires Java 8 or above, which is already a requirement for Smile due to its use of default methods in interfaces.
This code also has only a very weak dependence on the version of Smile.
I have built this code on OSX, but will be testing on Ubuntu as well. I expect no portability issues, since this is standard Java.