Open HowardLander opened 8 years ago
As a workaround, you can traverse the matrix manually and write it into the file.
Thanks for the response.
I suppose I could do that, but then wouldn't I lose the "sparsity" of it. Well, I guess I could encode a value (say -1) for the "empty cells", but that seems a bit ugly. How difficult would it be for these classes to be serializable? Do you think the default writeObject() and readObject() methods would work?
Howard
Why losing sparsity? You can convert sparse matrices toMatrixMarket
(which is super compact) and write it into a file. As far as I understand the problem is when you convert dense matrices.
Also feel free to send a PR with the solution - I'm happy to merge that in.
BTW, you're trying to allocate 2gb matrix on a 800mb heap.
Are you sure? Here's my command line:
/usr/java/jdk1.7.0_51/bin/java -Xmx8000m -cp .:./la4j-0.6.0.jar MatrixTestBug
Howard
On 9/28/16 4:22 PM, Vladimir Kostyukov wrote:
BTW, you're trying to allocate 2gb matrix on a 800mb heap.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/vkostyukov/la4j/issues/282#issuecomment-250288240, or mute the thread https://github.com/notifications/unsubscribe-auth/ABNGrbx1wN0jY0xprA0ckhAtPxXXsbvsks5qusyigaJpZM4KJPlZ.
Howard Lander mailto:howard@renci.org Senior Research Software Developer Renaissance Computing Institute (RENCI) http://www.renci.org The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651
Ahh I see.
Actually, this looks like an integer overflow. Your cardinality is already pretty close to Int.MaxValue (15000 * 15000) and when we're trying to allocatate the output array we multiply that by 8 (as per a single double value) and get the negative result.
One solution to this would be to use ByteBuffer
instead of byte[]
so we can concat them (and even possibly do off-heap).
Hi Vladimir
I'm was unaware of Matrix Market. It looks interesting. I'll definitely check it out. Thanks for mentioning it.
As far as the dense matrix goes, that was just my way of testing. We have some matrices that, even sparse, are bigger than max int bytes, so I was actually trying to break my code when I broke yours!
I was definitely considering trying to make these classes serializable myself, but I didn't know if there was some reason you hadn't done this. If you think it's a reasonable idea, I'd be happy to give it a try.
FYI: here's the code I was testing. I was wondering what would happen if the binary matrix was larger than max int.
try {
FileOutputStream fos = new FileOutputStream("test.tmp");
byte [] theMatrixBytes = theMatrix.toBinary();
System.out.println("matrix size: " + theMatrixBytes.length);
OutputStream theStream = new BufferedOutputStream(fos);
ObjectOutputStream oos = new ObjectOutputStream(theStream);
oos.writeInt(theMatrixBytes.length);
oos.write(theMatrixBytes);
oos.writeObject(nameSpace);
oos.writeObject(theList);
oos.close();
} catch (Exception e) {
e.printStackTrace();
}
Howard
On 9/28/16 4:21 PM, Vladimir Kostyukov wrote:
Why losing sparsity? You can convert sparse matrices |toMatrixMarket| (which is super compact) and write it into a file. As far as I understand the problem is when you convert dense matrices.
Also feel free to send a PR with the solution - I'm happy to merge that in.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/vkostyukov/la4j/issues/282#issuecomment-250287795, or mute the thread https://github.com/notifications/unsubscribe-auth/ABNGrXUJLmptYOJXbITZm_XyOQ51pRD7ks5qusw3gaJpZM4KJPlZ.
Howard Lander mailto:howard@renci.org Senior Research Software Developer Renaissance Computing Institute (RENCI) http://www.renci.org The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651
I think if you swap to a dense (from CRS) you should be fine in that test. It's cheaper to write 15000x15000 dense matrix than to write 15000x15000 sparse matrix (with .setAll
) since the later involves an overhead for maintaining sparcity.
See below.
On 9/28/16 4:30 PM, Vladimir Kostyukov wrote:
Ahh I see.
Actually, this looks like an integer overflow. Your cardinality is already pretty close to Int.MaxValue (15000 * 15000) and when we're trying to allocatate the output array https://github.com/vkostyukov/la4j/blob/master/src/main/java/org/la4j/matrix/sparse/CRSMatrix.java#L1021 we multiply that by 8 (as per a single double value) and get the negative result.
One solution to this would be to use |ByteBuffer| instead of |byte[]| so we can concat them (and even possibly do off-heap).
Would we have to change the toBinary() code, or is there something I can change in mine?
Howard
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/vkostyukov/la4j/issues/282#issuecomment-250290334, or mute the thread https://github.com/notifications/unsubscribe-auth/ABNGrYyogahgjNaKgMSO9pr7oMV0-B7iks5qus5_gaJpZM4KJPlZ.
Howard Lander mailto:howard@renci.org Senior Research Software Developer Renaissance Computing Institute (RENCI) http://www.renci.org The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651
Matrices/Vectors are used to be serializeable. I decided to not do that by the same reason no one else should do that. Java's serialization is completely broken (both performance-wise and security-wise). You should look at Jackson instead.
I'm afraid not, there is no easy fix to that.
I don't doubt what you are saying. What happens when I have a sparse matrix 1 million by 1 million? What I I really need a way to write large CRS matrices of unknown sparsity to file along with some other relevant information.
Are the security issues you mentioned in your other message the reason that the writeExternal and readExternal methods were removed? I'm not a security expert at all.
Howard
On 9/28/16 4:33 PM, Vladimir Kostyukov wrote:
I think if you swap to a dense (from CRS) you should be fine in that test. It's cheaper to write 15000x15000 dense matrix than to write 15000x15000 sparse matrix (with |.setAll|) since the later involves an overhead for maintaining sparcity.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/vkostyukov/la4j/issues/282#issuecomment-250290960, or mute the thread https://github.com/notifications/unsubscribe-auth/ABNGrdejjv6xL8uO6F0QlQ6DRbz1C-7Xks5qus8RgaJpZM4KJPlZ.
Howard Lander mailto:howard@renci.org Senior Research Software Developer Renaissance Computing Institute (RENCI) http://www.renci.org The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651
Potentially uninformed question: Is Jackson better than Gson? I've used Gson.
On 9/28/16 4:35 PM, Vladimir Kostyukov wrote:
Matrices/Vectors are used to be serializeable. I decided to not do that by the same reason no one else should do that. Java's serialization is completely broken (both performance-wise and security-wise). You should look at Jackson instead.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/vkostyukov/la4j/issues/282#issuecomment-250291515, or mute the thread https://github.com/notifications/unsubscribe-auth/ABNGrVPFBwmgZ2OQ-bMwGBq4QQRab49aks5qus-YgaJpZM4KJPlZ.
Howard Lander mailto:howard@renci.org Senior Research Software Developer Renaissance Computing Institute (RENCI) http://www.renci.org The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651
Thre is a limit on a max sparse matrix size in la4j. It's MxN=Int.MaxValue
. There is no way you can do 1m x 1m with la4j, just because there is a limit on array size on JVM (the maximum array you can allocate is Int.MaxValue, 2gb). If you need support for bigger matrices, I'd recommend either:
Looking at this comparison table (which I'm quite sceptical about): https://java-matrix.org/ you can try MTJ and Parallel Colt and see if their limits are true for sparse matrices (16gb and 64gb correspondingly).
Thanks for the info. I probably should have said by now that I really like la4j. It's easy to work with and has all of the features (except for the serialization issues we have been discussing) that I need. I don't know what motivates folks like you to provide these sorts of libraries but I'm sure glad you do!
The largest matrix that we have currently is about 18600 by 18600, but we do have plans for bigger. Square root of max int is about 46000, so I guess you are right that we will probably have to look at something else fairly soon. I had looked at colt before and there was something I didn't like about it, but that was several years ago.
I may just live with the limitation for now. I do have a workaround
Thanks
Howard
On 9/28/16 4:52 PM, Vladimir Kostyukov wrote:
Thre is a limit on a max sparse matrix size in la4j. It's |MxN=Int.MaxValue|. There is no way you can do 1m x 1m with la4j, just because there is a limit on array size on JVM (the maximum array you can allocate is Int.MaxValue, 2gb). If you need support for bigger matrices, I'd recommend either:
- Build you own thing with a sub-set of features
- Use something else (MTJ and Parallel Colt support CRS/CCS)
Looking at this comparison table (which I'm quite sceptical about): https://java-matrix.org/ you can try MTJ and Parallel Colt and see if their limits are true for sparse matrices (16gb and 64gb correspondingly).
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/vkostyukov/la4j/issues/282#issuecomment-250296047, or mute the thread https://github.com/notifications/unsubscribe-auth/ABNGrcoTK9EBYmc_rJlDTad8rdkWGz2Bks5qutOIgaJpZM4KJPlZ.
Howard Lander mailto:howard@renci.org Senior Research Software Developer Renaissance Computing Institute (RENCI) http://www.renci.org The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651
Yeah, you should be fine with 18600x18600 matrices. I admit there is a bug in .toBinary
(w.r.t to integer overflow) and if it's possible to allocate this matrix with la4j it should be possible to convert that into a binary form. Although, I don't have a good solution yet for this and you should feel free to suggest some.
In the meantime, I'd advice to shove your matrices into Jackson and see if it works (it should!). You don't need Java's standard serialization for that. Jackson (If you're ok with JSON) or Kryo (for binary format). These libraries will do the trick much better (and faster and safer).
The following code fails test class fails on a fairly small (but not sparse) matrix:
Here's the invocation:
/usr/java/jdk1.7.0_51/bin/java -Xmx8000m -cp .:./la4j-0.6.0.jar MatrixTestBug java.lang.IllegalArgumentException at java.nio.ByteBuffer.allocate(ByteBuffer.java:330) at org.la4j.matrix.sparse.CRSMatrix.toBinary(CRSMatrix.java:1007) at MatrixTestBug.main(MatrixTestBug.java:13)
and here's the code (MatrixTestBug.java)
import java.io.; import org.la4j.;
public class MatrixTestBug {
}
One possible work around (at least for me) for this would be to restore the writeExternal and readExternal methods to the matrix classes. I suspect I could also ignore this problem if the matrix classes were serializable, as the only reason I am doing this is to write the matrix (and some other data) to a file. Any ideas?
Thanks Howard