Thanks for reporting this issue! I hadn't really noticed that problem before,
but yes, you are right, this is a problem with C++ functions.
Actually, the same thing can happen with the *Buffer fields. I'm not sure how
to handle this efficiently. Any bright ideas?
Original comment by samuel.a...@gmail.com
on 26 May 2013 at 11:34
I would remove all the fields, because they can't be updated reliably. The
getXxxBuffer() methods should always return a fresh Buffer instance, with freshly
calculated capacity. We could encourage users to reuse the *Buffer instance. I
would deprecate the fullSize() and reset() methods.
By the way, the CvMat.size() calculation is also incorrect in the case of a subMat.
It should be:

    public int size() {
        int rows = rows();
        return cols()*elemSize()*channels() + (rows > 1 ? step()*(rows-1) : 0);
    }
For example, consider:

    CvMat mat = CvMat.create(10, 10, CV_8U);
    CvMat matSub = CvMat.createHeader(5, 5, CV_8U);
    cvGetSubRect(mat, matSub, cvRect(5, 5, 5, 5));

The size of mat's data is 100 bytes and its step is 10 (let's ignore alignment).
matSub's data points to mat.data + 55, that is, the same data offset by the upper
left corner of the rectangle. The current version of the size() method returns 50
for matSub, which goes beyond the length of mat.data, because 55 + 50 > 100. This
problem is minor, as it does not affect correctly implemented programs; it only
allows buffer overflow errors that are otherwise prevented.
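For illustration, a quick standalone check (hypothetical class name, values taken from the example above) that the corrected formula stays within the parent matrix:

    public class SubMatSizeCheck {
        public static void main(String[] args) {
            // A 5x5 CV_8U sub-rectangle of a 10x10 CV_8U matrix, step 10, no alignment.
            int rows = 5, cols = 5, elemSize = 1, channels = 1, step = 10;
            int size = cols*elemSize*channels + (rows > 1 ? step*(rows-1) : 0);
            int offset = 5*step + 5;                 // upper left corner at mat.data + 55
            System.out.println(size);                // 45, instead of 50 from the old formula
            System.out.println(offset + size);       // 100, i.e. exactly the end of mat.data
        }
    }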
Original comment by viliam.d...@gmail.com
on 28 May 2013 at 4:52
Creating a Buffer is pretty expensive, and isn't that great for the garbage
collector either, so creating one for each call to CvMat.put() isn't an option.
That's why I'm doing this caching in the first place. It wasn't a problem with
the simpler (IMHO better) C API... Any thoughts?
Thanks for the size() fix! I'll put that in :)
Original comment by samuel.a...@gmail.com
on 2 Jun 2013 at 1:04
I would deprecate all the get() and put() methods, as they are slow and might
become incorrect without reset().
When I started using JavaCV, I used the CvMat.get() and .put() methods, but soon
stopped, as they are much slower than using the Buffer directly. JNI calls are
expensive too: put(int i, double v) makes one call, and put(int i, int j, double v)
makes three more. Then there is the unnecessary conversion from double and back,
which is not free (maybe it is optimized out by the JIT; I did not test).
Compare these snippets. They all do one get, multiply, and put per pixel. The
first uses the (i, j) version, the second uses the (i) version, the third uses the
Buffer directly, and the fourth copies the buffer to a Java array and back:
public static void testIJ() {
    long start = System.nanoTime();
    CvMat mat = CvMat.create(1000, 1000, CV_8U);
    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 1000; j++) {
            byte val = (byte) mat.get(i, j);
            val = (byte) (2*val);
            mat.put(i, j, val);
        }
    System.out.println((System.nanoTime() - start) / 1000000 + " ms");
}

public static void testI() {
    long start = System.nanoTime();
    CvMat mat = CvMat.create(1000, 1000, CV_8U);
    int step = mat.step() / mat.elemSize();
    int channels = mat.nChannels();
    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 1000; j++) {
            byte val = (byte) mat.get(i*step + j*channels);
            val = (byte) (2*val);
            mat.put(i*step + j*channels, val);
        }
    System.out.println((System.nanoTime() - start) / 1000000 + " ms");
}

public static void testDirect() {
    long start = System.nanoTime();
    CvMat mat = CvMat.create(1000, 1000, CV_8U);
    int step = mat.step() / mat.elemSize();
    int channels = mat.nChannels();
    ByteBuffer buffer = mat.getByteBuffer();
    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 1000; j++) {
            byte val = buffer.get(i*step + j*channels);
            val = (byte) (2*val);
            buffer.put(i*step + j*channels, val);
        }
    System.out.println((System.nanoTime() - start) / 1000 + " us");
}

public static void testJavaArray() {
    long start = System.nanoTime();
    CvMat mat = CvMat.create(1000, 1000, CV_8U);
    int step = mat.step() / mat.elemSize();
    int channels = mat.nChannels();
    ByteBuffer buffer = mat.getByteBuffer();
    byte[] jbuffer = new byte[mat.size()];
    // I assume the buffer is continuous, as mat.isContinuous() always returns false
    buffer.get(jbuffer);
    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 1000; j++) {
            byte val = jbuffer[i*step + j*channels];
            val = (byte) (2*val);
            jbuffer[i*step + j*channels] = val;
        }
    buffer.position(0);
    buffer.put(jbuffer);
    System.out.println((System.nanoTime() - start) / 1000 + " us");
}
Resulting times (after 10 iterations):
TEST            HOTSPOT VM 1.6, 3GHZ ATHLON X2    1GHZ 1CORE ANDROID 2.3.5 (no JIT)
testIJ          280ms                             9800ms
testI           65ms                              4500ms
testDirect      3.5ms                             1450ms
testJavaArray   5.1ms                             60ms
The Android result, which I added for information, is surprising: probably due to
the lack of a JIT, even Buffer access is very slow. On the desktop VM, the
JavaArray version is a bit slower than going through the Buffer, due to the
unnecessary copying of the buffer data.
If you want to simplify pixel access for users, you could add an interface, say
"PixelAccessor", which does the expensive operations of creating the Buffer and
calculating the size upon initialization; users would be asked to reuse the
instance. I attach BytePixelAccessor as an example; a similar class should be
created for the other types. You could add a createBytePixelAccessor() method to
CvMat and IplImage (not getBytePixelAccessor(), so the name implies something is
created upon every invocation).
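Since the attached file is not reproduced here, the following is only a rough sketch of what such a BytePixelAccessor might look like; the class name, the cached fields, and the method names are assumptions, and the attached version may differ:

    import java.nio.ByteBuffer;
    import com.googlecode.javacv.cpp.opencv_core.CvMat;

    // Hypothetical convenience class: the expensive work (Buffer creation, stride
    // lookup) happens once in the constructor, and the instance is meant to be reused.
    public class BytePixelAccessor {
        private final ByteBuffer buffer;
        private final int step;      // row stride in bytes
        private final int channels;

        public BytePixelAccessor(CvMat mat) {
            this.buffer = mat.getByteBuffer();
            this.step = mat.step();
            this.channels = mat.nChannels();
        }

        public byte get(int i, int j) {
            return buffer.get(i*step + j*channels);
        }

        public void put(int i, int j, byte value) {
            buffer.put(i*step + j*channels, value);
        }
    }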
Viliam
Original comment by viliam.d...@gmail.com
on 3 Jun 2013 at 5:30
Attachments:
They are not slow with floating-point matrices (native function call isn't slow
BTW), but I agree they are not ideal, and shouldn't be used for integer
matrices...
Anyway, thanks for all the recommendations! I'm not sure how this is going to
turn out. The thing is, OpenCV itself switched to "cv::Mat", and I'm currently
working on something to help me wrap this kind of thing more easily (not just
for OpenCV), without having to create all the .java files by hand... When and
if I succeed, I would then start looking into better ways of accessing these
structures in Java, i.e., your accessor class.
But it might take a long time before I come up with anything :(
What do you think about all this?
Original comment by samuel.a...@gmail.com
on 8 Jun 2013 at 1:15
I don't understand your statement that floating-point matrices aren't slow. I
did a test with 32F matrices, and here are the results:
TEST                HOTSPOT VM 1.6, 3GHZ ATHLON X2    1GHZ 1CORE ANDROID 2.3.5 (no JIT)
testIJ_8u           231ms                             9715ms
testI_8u            59ms                              4467ms
testDirect_8u       3.4ms                             1460ms
testJavaArray_8u    13.0ms                            120ms
testIJ_32f          233ms                             10267ms
testI_32f           65ms                              5050ms
testDirect_32f      2.8ms                             2199ms
testJavaArray_32f   13.0ms                            1143ms
I also tested 64F matrices, and the times were roughly the same as for float.
testJavaArray_32f is much slower than testJavaArray_8u on Android, probably
because some memory limit was approached (I saw a grow-heap operation after
every test run during 32F, but not during 8U). Each test was run 10 times, and
the result was taken by averaging the last five runs.
There *is* JNI call overhead. If you call, say, five operations on a Mat, the
overhead is negligible, but if you call CvMat.get(i, j) 5 million times, which
itself makes 5 JNI calls, then it is not. I first noticed this issue when
processing the HoughLines result: I iterated the returned collection of lines
many times, and just by copying the lines to a Java array instead of calling
CvPoint.x() and .y(), it ran noticeably faster.
I can contribute the PixelAccessor enhancement as a patch, if you wish.
Original comment by viliam.d...@gmail.com
on 10 Jun 2013 at 5:03
FYI, here is the benchmark: http://pastebin.com/DtGBkzyk
Original comment by viliam.d...@gmail.com
on 10 Jun 2013 at 5:06
Well, slow is relative, not as slow as creating a Buffer object :) But of
course it's still not optimal. And yes Android/Dalvik does suck however one
looks at it.
BTW, your "accessor" solution reminds me of this article:
Java Programming for High-Performance Numerical Computing
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.1554
Actually, I feel this is something that should be included in JavaCPP. The way
I see it we'd have a collection of classes, e.g.: ByteAccessor, ShortAccessor,
etc. with constructors that accept a Buffer and the dimensions, and a
collection of appropriate get() and put() methods for 2D and 3D access.
Then, we could include factory methods in the corresponding Pointer classes, as
well as others like CvMat, CvPoint, etc.
What do you think? Is this something you'd like to contribute? Sounds really
useful in any case :)
And to cope with Android, instead of a Buffer, the accessors could also accept
a Pointer, slurping and dumping native memory on user request only.
Original comment by samuel.a...@gmail.com
on 13 Jun 2013 at 7:24
I'd like to contribute. Let's summarize what has to be done:
- deprecate the getXxxBuffer and fullSize methods on CvMat
- add XxxPixelAccessor classes
- add factory methods for PixelAccessors to CvMat and IplImage
- add a CopyingPixelAccessor for Android (and other platforms without a JIT)
We should consider the PixelAccessor classes to be convenience classes for
developers. Direct Buffer access is still the fastest, even if only slightly.
This is about hiding complexity from the developer and avoiding possible errors
when calculating indexes. There is no point in adding it to the Pointer class,
as it has no internal structure.
Regarding the CvPoint class, there are bulk put and get methods to and from Java
arrays which could be rewritten using buffers too:

    public final CvPoint put(int[] pts, int offset, int length) {
        asByteBuffer().asIntBuffer().put(pts, offset, length);
        return this;
    }

    public CvPoint get(int[] pts, int offset, int length) {
        asByteBuffer().asIntBuffer().get(pts, offset, length);
        return this;
    }
These are approximately 10 times faster than the current versions (tested on an
array of 10000 points on desktop Java). A PointAccessor could also be written
that uses Buffer access; it is about 30 times faster than using the x() and y()
methods. See the benchmark: http://pastebin.com/bUC6u5R7
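For illustration, a hypothetical usage sketch of the bulk methods above; the point count, the interleaved x/y layout, and the array-allocating CvPoint constructor are assumptions chosen for the example:

    // Copy 10000 points out to a Java array in one bulk call, adjust them there,
    // then copy them back in one bulk call, instead of 10000 calls to x()/y().
    CvPoint pts = new CvPoint(10000);   // assumes the array-allocating constructor
    int[] xy = new int[2*10000];        // x and y interleaved
    pts.get(xy, 0, xy.length);
    for (int k = 0; k < xy.length; k += 2) {
        xy[k] += 1;                     // shift every x coordinate by one pixel
    }
    pts.put(xy, 0, xy.length);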
Actually, such a class could be written for any C++ class, and, as you see, it
strongly depends on the internal structure of the class. These files should be
generated...
I read a JNI performance article, and one of its recommendations was to keep the
number of JNI calls low, because there is overhead. In my experience, such
algorithms should be written in C(++) and called through JNI. Java should be used
only to call complex operations (for example, operations on Mats) or for
prototyping, not to do pixel processing.
By the way, we are rewriting our entire algorithm in C++, as we do a lot of pixel
and point processing, and on Android it will be a lot faster. Further, we plan an
iOS version too, so the code can be reused.
Original comment by viliam.d...@gmail.com
on 14 Jun 2013 at 5:55
I understand there are a lot of little details to take care of, but first
things first: We need those "accessor" classes. Now, we could have a
BytePixelAccessor, a BytePointerAccessor, etc. but I feel that having one
ByteAccessor class usable on *any* structure would be more useful. Do you not
feel this would be better?
Original comment by samuel.a...@gmail.com
on 14 Jun 2013 at 1:22
Let me try to explain this a bit better. Assume we have some `CvMat mat` and
`CvPoint pts`, then I think it would be nice to be able to do something like
this:
IntAccessor matAccessor = mat.getIntAccessor(); // or create ...
IntAccessor ptsAccessor = pts.getIntAccessor();
matAccessor.put(1, 1, ptsAccessor.get(1, 1));
Of course we'd have FloatAccessor and whatnot, and maybe the names could be
different too, but what I want to emphasize is the reuse of the common class,
used by both CvMat and CvPoint in this case. Do you not feel this would be a
good thing?
Original comment by samuel.a...@gmail.com
on 16 Jun 2013 at 1:51
The accessors are about structure. CvMat.data does have structure, and CvPoint
does too, but Pointer does not. They let you write

    matAccessor.get(i, j)

instead of

    mat.getByteBuffer().get(i*mat.step() + j*mat.channels());

A PixelAccessor saves the JNI calls and the buffer creation (let's assume
getByteBuffer() does not return a cached instance). A PixelAccessor offers no more
functionality than the current get() methods on CvMat; only its lifetime is
separate from that of the CvMat.
Try to write an Accessor for Pointer. I think there would be no more functionality
than in java.nio.XxxBuffer, so I think it is needless.
Original comment by viliam.d...@gmail.com
on 17 Jun 2013 at 4:03
"Multidimensional arrays" in C/C++, the ones accessed like this `mat[i][j]`,
are actually allocated in one contiguous area of memory: Check it up! Java's
Buffer provides no way to access those. I think it would be very useful to
provide an interface for native multidimensional arrays. Why do you feel
differently?
Original comment by samuel.a...@gmail.com
on 18 Jun 2013 at 12:30
Mat is not a multidimensional array, and neither is Pointer or Mat.data. Actually,
I have already implemented the accessors, but have not done any tests. I will send
a patch then; in the meantime, try to implement an Accessor for Pointer yourself
and you'll see there's no point in it.
Original comment by viliam.d...@gmail.com
on 18 Jun 2013 at 1:06
Sure, please explain how you would access the pixel at (100, 100) of a
dc1394video_frame_t.image :
http://code.google.com/p/javacv/source/browse/src/main/java/com/googlecode/javacv/cpp/dc1394.java#834
Original comment by samuel.a...@gmail.com
on 19 Jun 2013 at 2:00
I don't know; I don't know its structure. If it's a simple multidimensional array,
then it's image.capacity(rows*cols).asByteBuffer().get(100*cols + 100). I would
cache the buffer for fast access to the other pixels.
Original comment by viliam.d...@gmail.com
on 19 Jun 2013 at 3:57
Here is the patch. Have a look at it and let me know.
Original comment by viliam.d...@gmail.com
on 19 Jun 2013 at 4:32
Attachments:
I deprecated the getXxxBuffer() methods in CvMat, but found out that I still need
them to create an Android bitmap:

    bitmap.copyPixelsFromBuffer(cvImage.getByteBuffer());

For backwards compatibility, I would still deprecate the getXxxBuffer() methods and
create new createXxxBuffer() methods that always return a new buffer. What do you
think?
Original comment by viliam.d...@gmail.com
on 19 Jun 2013 at 9:49
Let's go back to comment #16 first. Wouldn't it be nice instead to be able to
do something like this:
ByteAccessor a = new ByteAccessor(image, rows, cols);
byte b = a.get(100, 100);
?
Original comment by samuel.a...@gmail.com
on 19 Jun 2013 at 12:11
Sorry for not responding, I'm quite busy now...
Now I get it: CvMat's pixel accessor could be generalized to a multidimensional
array accessor. But it has to know not only each dimension's length, but the step
too.
Now, I would put these in the JavaCPP project, not in JavaCV. In JavaCV, there
could be factory methods in CvMat to simplify their creation. What do you think?
Original comment by viliam.d...@gmail.com
on 26 Jun 2013 at 8:02
Yes!!
We don't need to name it "step", even though for images obviously that's what
is going to be used for the "width". We don't need to know the actual width of
an image to access one pixel in memory. And to generalize more easily, we could
keep the width, height, step or whatever in one "int[] dimensions" array, and
it would even be possible to have get/put methods with varargs, but I'm not sure
how that would play with specialized 2D and 3D methods... ?
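For illustration, a rough sketch of how an "int[] dimensions" array plus varargs indexing could fit together; all names and the stride convention here are assumptions, not a finished design:

    import java.nio.ByteBuffer;

    // Hypothetical generalized accessor: sizes[d] is the length of dimension d and
    // strides[d] the distance (in buffer elements) between consecutive indices of
    // that dimension. For a 2D image, strides[0] would be the step and strides[1]
    // the number of channels. Bounds checking against sizes is omitted for brevity.
    public class ByteAccessor {
        private final ByteBuffer buffer;
        private final int[] sizes;
        private final int[] strides;

        public ByteAccessor(ByteBuffer buffer, int[] sizes, int[] strides) {
            this.buffer = buffer;
            this.sizes = sizes;
            this.strides = strides;
        }

        private int index(int... indices) {
            int offset = 0;
            for (int d = 0; d < indices.length; d++) {
                offset += indices[d] * strides[d];
            }
            return offset;
        }

        public byte get(int... indices) {
            return buffer.get(index(indices));
        }

        public void put(byte value, int... indices) {
            buffer.put(index(indices), value);
        }
    }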
Anyway, as names I think ByteAccessor, IntAccessor, etc. sound fine, but what
do you think?
When you have something, post it here:
http://code.google.com/p/javacpp/issues/
Thanks!
Original comment by samuel.a...@gmail.com
on 27 Jun 2013 at 12:47
Bump! It would be really nice to have those classes in JavaCPP. Please let me
know if you are still interested in working on this, thanks!
Original comment by samuel.a...@gmail.com
on 22 Sep 2013 at 12:46
Yes, I definitely plan to do it, but we postponed our CV project due to other
tasks. If nothing changes, I will get back to it in two or three weeks.
Original comment by viliam.d...@gmail.com
on 25 Sep 2013 at 11:08
Hello,
I started to write the API. After writing it, I realized that the code without
this API is so simple that I hesitate over whether a new API is really needed.
Example: let's say we write a function to calculate the sum of all pixels of a
1-channel image. Using the new pixel accessor API, it would be:
public long imgSum(CvMat mat) {
    ByteArrayAccessor matacc = mat.createBytePixelAccessor(false);
    // cache to avoid repetitive JNI call
    int cols = mat.cols();
    int rows = mat.rows();
    long sum = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            sum += matacc.get(i, j) & 0xff; // 0xff to convert to unsigned
    return sum;
}
By directly accessing the Buffer, the code would be:
public long imgSum(CvMat mat) {
    ByteBuffer buf = mat.createDataByteBuffer();
    int step = mat.step();
    int rows = mat.rows();
    int cols = mat.cols();
    long sum = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            sum += buf.get(i*step + j) & 0xff;
    return sum;
}
The only saving is that the formula i*step + j is hidden. Is this worth a new
API?
A smarter programmer could even write it a little faster, like this:
public long imgSum(CvMat mat) {
    ByteBuffer buf = mat.createDataByteBuffer();
    int step = mat.step(), rows = mat.rows(), cols = mat.cols();
    long sum = 0;
    for (int i = 0; i < rows; i++) {
        int pos = i*step;
        for (int j = 0; j < cols; j++, pos++)
            sum += buf.get(pos) & 0xff;
    }
    return sum;
}
This avoids the i*step multiplication at every pixel, a trick I learned by reading
OpenCV's code... This trick is not possible with the pixel accessor.
If I have time tomorrow, I'll benchmark all three versions.
Please have a look at the patches before I continue with the other primitive
types, and let me know what you think.
Viliam
Original comment by viliam.d...@gmail.com
on 12 Nov 2013 at 10:15
Attachments:
I just remembered that step is always in bytes; it should be divided by
elemSize(). It's been a long time since I worked with OpenCV. I will rewrite it
later.
BTW, our computer vision project is still postponed, maybe for a very long time.
I'm sad about that, because it was very interesting to me and it would surely
have made a good-selling mobile application. And in the end, we plan to use
neither JavaCPP nor JavaCV in the final product, due to performance. We do a lot
of pixel processing, and a comparable function written in C with OpenCV runs 10
times faster than our very similar one in Java. That's on Android; on the desktop
the difference is not so big (probably due to the JIT). But they were very good
for our initial prototypes, so I feel I owe a contribution...
Original comment by viliam.d...@gmail.com
on 12 Nov 2013 at 10:28
It's fine to use Buffer directly, but it does get complicated when accessing
many elements of a matrix. It's a lot easier if we can specify the indices as
we would in MATLAB for example.
Your API looks fine, thanks! Still, I would shorten the names to ByteAccessor,
etc., instead of ByteArray..., BytePixel..., etc., because it's shorter and it
works for direct NIO Buffer objects as well, not just arrays or pixels, so it
removes potential confusion. And com.googlecode.javacpp.accessors (should this
be singular or plural?) is a good package name, I think. Putting it in a
separate package also indicates that they may be used independently from the
classes in the main package, so good idea, thanks!
BTW, we can still use JavaCPP to call custom C/C++ functions. It's much nicer
than using raw JNI with the NDK, while imposing almost no additional overhead.
Original comment by samuel.a...@gmail.com
on 17 Nov 2013 at 2:08
Okay, so I'll finish the rest. I prefer "accessors" (plural). I also prefer
ByteArrayAccessor over ByteAccessor, because it improves readability; IDEs are
there to complete names. Anyway, I'll type "BAA" in Eclipse and probably get a
better match than with "BA". I even thought of ByteNdArrayAccessor, but did not
like it. But I'll do whatever you say, just let me know.
I'd like to prepare an article where I summarize the performance characteristics
of each approach, maybe on the wiki, and then refer to it from the javadocs.
There I could cover the individual ways of accessing pixels along with their
benchmarks.
Original comment by viliam.d...@gmail.com
on 18 Nov 2013 at 6:03
Regarding the usage of JavaCPP in our project, we only have two JNI methods. I
had never used JNI before, so this was a way to learn it and to better understand
JavaCPP, because, due to the lack of documentation and my lack of JNI experience,
it seemed complicated to me...
Original comment by viliam.d...@gmail.com
on 18 Nov 2013 at 6:06
I'm not sure it's a good idea to name something "ByteArrayAccessor" when it
doesn't access a byte array (it's technically accessing a buffer, in the
non-copied case at least)... What do you think?
And something I've just noticed in your code: `ByteBuffer.put(byte[])` is very,
very slow on Android; that's why I added `BytePointer.put(byte[])`:
https://code.google.com/p/javacpp/issues/detail?id=11
So, let's think a bit more about that, and your `copyBack()` method, before
going further.
I've added you as a contributor, so please feel free to add a wiki page! Maybe
you'd prefer to add one to the JavaCPP site though?
If you have only two JNI methods, maybe JavaCPP isn't so useful, yes.
Original comment by samuel.a...@gmail.com
on 24 Nov 2013 at 2:01
Regarding issue 11, it seems to be caused by an improper buffer implementation in
Android. That's why I even created the otherwise odd Copying version. Maybe if I
used the Pointer.get(int) method, that would be faster and make the Copying
version useless. I have to benchmark it.
Regarding the name, what better name do you have? We started with
BytePixelAccessor for CvMat, but then generalized it to all arrays, hence
ByteArrayAccessor. Alternatives that come to my mind are:
ByteNdArrayAccesor
ByteNdElementAccessor
ByteNdArrayElementAccessor
ByteAccessor
ByteUtils
BytePointerUtil
BytePointerAccessor
ByteMdArrayAccessor
ByteMultiDimArrayAccessor
ByteElementAccessor
ElementAccessorByte
ArrayElemAccessorByte
ByteArrayElemAccessor
I think the only self-describing one is the longest. Any other would have to be
explained in the javadoc.
Original comment by viliam.d...@gmail.com
on 25 Nov 2013 at 3:17
JNI calls are not as efficient as accessing array elements, so the copying
accessor makes sense. But we still need JNI to copy the whole array, something
like BytePointer.get(byte[])/put(byte[]), and we might as well use that.
So, the names could be ByteBufferAccessor and BytePointerAccessor. I think
that's pretty self-describing? Maybe some other name than "Accessor" could
clarify the meaning. Although I wouldn't want to lengthen the name further, so
I'm not sure what...
Original comment by samuel.a...@gmail.com
on 26 Nov 2013 at 6:25
Maybe we could name them "Indexer" since it's a concept already used in C# for
multidimensional arrays, but not in Java?
http://www.javacamp.org/javavscsharp/indexer.html
http://msdn.microsoft.com/en-US/library/2yd9wwz4%28v=vs.80%29.aspx
That single word carries both the meaning of Array and Accessor, so we could
shorten the names, i.e.: ByteIndexer...
Original comment by samuel.a...@gmail.com
on 30 Nov 2013 at 4:57
Okay, so here are complete patches with the Indexer API. Apply both patches to
JavaCPP (the first adds "accessors", the second deletes them and adds "indexers";
I was too lazy to do a clean clone...).
I liked the "Indexer" name most, so I went with that. I also wrote a JUnit-like
test, but it is not included in the patches, as you don't have JUnit set up in
your project.
I also did a performance test, with interesting results.
Test configurations:
1. JDK 1.6.0_41 64-bit, Athlon X2 64, 2-core 3GHz, run with the -server switch
2. JDK 1.7.0_17 64-bit, Athlon X2 64, 2-core 3GHz, run with the -server switch
3. Android 2.3.5, Qualcomm MSM8255, 1-core 1GHz
                    (1)       (2)          (3)
sumBuffer          3507      3432    2 835 870
sumPointer        93962     93992    1 683 551
sumBufferCopy      5176      5275      132 824
sumPointerCopy     3824      4112      125 451
sumIndexer         3342     17884    3 329 522
sumIndexerCopy     4234     20491      742 822
cvSum              1885      1866       16 809
addBuffer          7688      7375    6 068 920
addPointer       207170    238679    3 636 462
addBufferCopy      9399      9423      137 023
addPointerCopy     6683      6488      135 833
addIndexer         7277     20948    7 160 247
addIndexerCopy     7628      8221    1 317 675
cvAddS             1986      1974       33 465
Test sources are in the attached Test.java. It tests two operations that are
intensive in pixel access: "sum" adds up all pixel values (it only reads), and
"add" modifies pixels by incrementing them by one. As you see, there are huge
differences.
The "never do" version is accessing values through the pointer element by element
(sumPointer and addPointer). On Android, the "never do" is accessing native
pixels directly at all; copy them instead. I did not reproduce the Android
slowness with ByteBuffer.get(int[]) (compare sumBufferCopy and sumPointerCopy).
Indexer access is a bit slower than working directly with the buffer or with a
copied array on Android, but its code is the simplest. This might be due to
unnecessary method calls, and because I cannot use a premultiplied value the way
I do when accessing the buffer directly. Strangely, it is a lot slower on
JDK 1.7...
Not surprisingly, the C version is always the fastest (1.7-3.5 times on HotSpot
and 4-7 times on Android), and it is not dependent on the JIT/GC mood :)
I don't have a device with a newer Android handy; can you run the test on one? I
expect this problem will disappear (or maybe already has disappeared) one day.
My final recommendation is to always prefer C. Use JavaCV only if:
- you don't need to access pixels, i.e. you only call OpenCV's bulk operations
(such as a sequence of various image transformations)
- you need portability (but you will need the native libraries anyway)
- you are prototyping (i.e. fewer segfaults in Java)
- your algorithm is fast enough
If you decide to use pixel access, use these methods:
On Android:
1. access copied data manually
2. use a copying indexer for less code at slower speed (6-10 times)
On the JDK:
1. access the buffer manually
2. use a direct indexer for less code at slower speed (it may not be slower, but
may be up to 5 times slower); run on a good JRE :)
3. the more times you access the same pixel, the more you should prefer copying
the array
In general:
always create a copy using Pointer.get(), not Buffer.get().
---
The above could be the basis for the wiki page I recommend writing. We could link
to it from the sources.
Please review the patches, and, if you can, try to review the benchmark too, to
check whether I have done something wrong.
Viliam
Original comment by viliam.d...@gmail.com
on 11 Dec 2013 at 9:51
Attachments:
I happened to get a Nexus 10 with Android 4.4 handy (ARM Cortex-A15, 2-core,
1.7 GHz); here are the results:
sumBuffer            965 584
sumPointer         5 105 642
sumBufferCopy         34 331
sumPointerCopy        34 406
sumIndexer         1 151 433
sumIndexerCopy       165 345
cvSum                  3 335
addBuffer          2 128 167
addPointer        11 186 691
addBufferCopy         35 940
addPointerCopy        35 791
addIndexer         2 574 975
addIndexerCopy       410 906
cvAddS                 8 068
Still no change: buffer element access is slow, buffer bulk access is as fast as
the pointer's, and C is 10 times faster (in this case OpenCV's cvSum or cvAddS
might even have used multiple cores; I did not check the code). Still, copying is
a lot faster.
Original comment by viliam.d...@gmail.com
on 11 Dec 2013 at 10:47
Great, thanks! It's going to take me a while to review that. It's actually
important functionality, I think, so I want to make sure we get the API right. I
have a couple of comments for now though:
1. Please do provide patches that apply cleanly to the HEAD of the master branch
(no need right now though)
2. Are you OK with licensing that under the GPLv2 with Classpath exception? Or
would you prefer something else?
3. Maven supports unit testing if we place files in the right directories, so
feel free to put stuff there
4. You seem to provide a couple of unrelated fixes for opencv_core.java. Are
all the changes that are in there indeed fixes?
5. Have you tried to write data back to native memory? Direct NIO buffers on
Android should be pretty slow in that case.
Thanks for everything!
Original comment by samuel.a...@gmail.com
on 15 Dec 2013 at 7:49
1. The patches should apply to the HEAD. There are two patches only because I did
the work in two commits.
2. Yes.
3. I'll try, hopefully next week.
4. There is only one unrelated fix, for CvMat.isContinuous(). It always returned
false, because it used type() instead of raw_type(), and type() has some bits
cleared, including the continuous bit.
5. The "sum" benchmark does not write back; it only reads pixels. The "add"
benchmark does write back (it modifies pixels).
Original comment by viliam.d...@gmail.com
on 15 Dec 2013 at 11:16
5. Does this mean both Buffer.put(array) and get(array) work efficiently on
Android? I wonder why Adam found otherwise:
https://code.google.com/p/javacpp/issues/detail?id=11
Original comment by samuel.a...@gmail.com
on 15 Dec 2013 at 1:04
I reproduced issue 11. I actually wonder why it does not show up in my test case.
Maybe because I've only tested ByteBuffer, while in issue 11 they have the problem
with floats or ints; I will have to give it a try...
Anyway, I use the Pointer's bulk methods for copying in the Copying indexers.
Original comment by viliam.d...@gmail.com
on 15 Dec 2013 at 1:26
Ok, so let's take a step back. "NIO" stands for "New I/O", and the intention
was that we should be able to do everything and do it efficiently using only
*Buffer objects, and nothing else, but on Android this isn't the case, yet. So
we're left to hack things up, for now. And your original proposal was optimal
in that sense.
What would you think of the following, then? We have only one Indexer per type
(ByteIndexer, ShortIndexer, etc.), and they all accept a Buffer object of the
corresponding type. But we also provide a flag to indicate whether we want to
use it in "copying" mode. (BTW, it's possible to create a Pointer from a
Buffer. That's what I'm doing in FFmpegFrameRecorder.java for example.) And
eventually when Android gets better, this flag should become obsolete, but at
least we won't be stuck with a hackish API.
How does that sound?
Original comment by samuel.a...@gmail.com
on 16 Dec 2013 at 3:56
Sounds good. But the normal use case for the copying version would be the
following:

    CopyingByteIndexer ix = (CopyingByteIndexer) ByteIndexer.createXxx(..., true);
    // ... do the work ...
    ix.copyBack();

If we wanted to silently ignore the copying flag, we could not return a
CopyingByteIndexer instance anymore. So I recommend making the Copying- and
Direct- subclasses package-visible (i.e. default visibility) and putting
copyBack() in the abstract class, where it would be a no-op for the Direct-
version. It will be clearly stated in the documentation when we need to call
copyBack().
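For illustration, a minimal sketch of the class layout being described (hypothetical names and signatures; the actual patch may differ):

    import java.nio.ByteBuffer;
    import com.googlecode.javacpp.BytePointer;

    // The public abstract class exposes copyBack() so callers never need a cast;
    // the direct variant makes it a no-op, the copying variant writes the cached
    // array back to native memory.
    public abstract class ByteIndexer {
        public abstract byte get(int i, int j);
        public abstract void put(int i, int j, byte value);
        public void copyBack() { }   // no-op unless overridden by the copying version
    }

    class DirectByteIndexer extends ByteIndexer {
        private final ByteBuffer buffer;
        private final int step;

        DirectByteIndexer(ByteBuffer buffer, int step) {
            this.buffer = buffer;
            this.step = step;
        }
        public byte get(int i, int j)             { return buffer.get(i*step + j); }
        public void put(int i, int j, byte value) { buffer.put(i*step + j, value); }
    }

    class CopyingByteIndexer extends ByteIndexer {
        private final BytePointer pointer;   // JavaCPP pointer to the native data
        private final byte[] cache;
        private final int step;

        CopyingByteIndexer(BytePointer pointer, int step, int size) {
            this.pointer = pointer;
            this.step = step;
            this.cache = new byte[size];
            pointer.get(cache);              // slurp the native memory once
        }
        public byte get(int i, int j)             { return cache[i*step + j]; }
        public void put(int i, int j, byte value) { cache[i*step + j] = value; }
        public void copyBack()                    { pointer.put(cache); }  // dump back on request
    }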
I did tests with IntPointer/IntBuffers, here are the results:
                   (1)       (2)        (3)
sumBuffer         1673      1781    1055676
sumPointer       23049     22984     452716
sumBufferCopy     4172      4168    1081054
sumPointerCopy    3026      3026      40960
sumIndexer        1612      5785    1180523
sumIndexerCopy    4137      6935     302331
cvSum             1399      1364       9222
addBuffer         2325      2289    2119384
addPointer       55452     48051     865838
addBufferCopy     7648      7529    1084564
addPointerCopy    4990      4744      45794
addIndexer        2230      6889    2490893
addIndexerCopy    7215      7943     448577
cvAddS            1778      1870      11267
Now the slowness of the bulk buffer get is reproduced (the issue 11 problem). And
the pointer copy takes 2/3 of the time of the buffer copy even on the JDK, so it's
always faster.
Interestingly, with bytes, addPointerCopy was faster than addBuffer; with ints,
it's the opposite. Now it's very tough to give a general rule about what's faster.
I can think of these:
- on Android, always work on a copy; on the JDK, do your own test
- work manually with a buffer or a copy for better speed
- use an indexer for shorter but slower code
Tomorrow, I'll rewrite the code to always take a Buffer parameter. And what do you
think about hiding the copying and direct subclasses? Anyway, I don't think we
will be able to remove the copying version any sooner than in 15-20 years :)
Viliam
Original comment by viliam.d...@gmail.com
on 16 Dec 2013 at 5:54
Well, I don't know about the subclasses either. In the case of a direct NIO
buffer, we can copy its content to a non-direct NIO buffer. We would basically be
using the two-class pattern at the Buffer level; no need to duplicate it a level
higher if it's as efficient that way. I'm pretty sure that would be as
efficient as an array, even on Android, but we should test this out.
BTW, when comparing with optimized functions like cvSum(), be sure to optimize
the code in Java as well, by unrolling loops, etc:
http://stackoverflow.com/questions/10784951/do-any-jvms-jit-compilers-generate-code-that-uses-vectorized-floating-point-ins/10809123#10809123
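For reference, a hypothetical 4x-unrolled variant of the "sum" loop from the earlier snippets, using the existing getByteBuffer() and assuming a continuous 1-channel CV_8U matrix whose number of columns is a multiple of 4:

    // Manual 4x unrolling in the spirit of what cvSum does for 1-channel images.
    public static long imgSumUnrolled(CvMat mat) {
        ByteBuffer buf = mat.getByteBuffer();
        int step = mat.step(), rows = mat.rows(), cols = mat.cols();
        long sum = 0;
        for (int i = 0; i < rows; i++) {
            int pos = i*step;
            for (int j = 0; j < cols; j += 4, pos += 4) {
                sum += (buf.get(pos)     & 0xff)
                     + (buf.get(pos + 1) & 0xff)
                     + (buf.get(pos + 2) & 0xff)
                     + (buf.get(pos + 3) & 0xff);
            }
        }
        return sum;
    }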
Original comment by samuel.a...@gmail.com
on 16 Dec 2013 at 6:12
Not sure about your 15~20 years estimate either
http://www.extremetech.com/computing/170677-android-art-google-finally-moves-to-replace-dalvik-to-boost-performance-and-battery-life
:)
Original comment by samuel.a...@gmail.com
on 16 Dec 2013 at 6:19
Here are the results for Android with the unrolled loop, compared to C:
(1) Android 2.2, 1-core 1GHz, with unrolled loops
(2) same as above, without unrolling
                    (1)        (2)
sumBuffer        981770    1055676
sumPointerCopy    31024      40960
cvSum              9002          -
As you see, it's a bit better, but not a lot. Maybe Android's JIT does automatic
unrolling, but it's still not comparable to C. So there is still no reason to use
Java on Android if you need performance. I checked the source of cvSum: they
explicitly unroll 4 times in the case of 1-channel images; in the other cases,
they just do a simple loop (and the compiler might unroll it automatically).
I also tried the unrolled version on HotSpot; the difference is negligible:
(1) with unrolled loops, JDK 1.6.0_41 64-bit, Athlon X2 64, 2-core 3GHz, run
with the -server switch
(2) same as above, without unrolling
                 (1)     (2)
sumBuffer       1696    1673
sumPointerCopy  2950    3026
cvSum           1453       -
Note that I measure the times as an average of 5 runs, after 15 "warm-up" runs,
with the -server JVM switch (-server lowers the threshold at which the JIT kicks
in). Try doing this with your example at http://stackoverflow.com/a/10809123/952135;
probably OpenJDK's JIT does not unroll automatically.
Copying still helps a lot, even on Android 4.4 (see comment 35). I will test ART
and let you know; maybe then we can drop the copying version in 10 years, when
non-ART devices are obsolete...
Tomorrow, I'll rewrite the indexers according to your previous notes.
Original comment by viliam.d...@gmail.com
on 18 Dec 2013 at 7:44
Well in any case, I think we should compare with the same loop in C++ instead
of calling cvSum(), just to be fair.
To be clear, I see the constructor of ByteIndexer doing something like this:
ByteIndexer(ByteBuffer buffer, boolean cached) {
    if (cached) {
        this.buffer = this.cache = ByteBuffer.allocate(buffer.capacity());
        this.pointer = new BytePointer(buffer);
    } else {
        this.buffer = buffer;
    }
}

With a couple of helper methods:

void fillCache() {
    pointer.get(cache.array());
}

void flushCache() {
    pointer.put(cache.array());
}
Seems like that would be the best thing to do...
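For illustration, a hypothetical call sequence for that cached mode; the exact constructor arguments (dimensions, step) are left out of the sketch since they are still being discussed:

    // Pull the native data into the heap-backed cache once, work on the copy,
    // then push the result back explicitly.
    ByteIndexer idx = new ByteIndexer(mat.getByteBuffer(), /* cached = */ true);
    idx.fillCache();                          // slurp native memory into the cache
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            idx.put(i, j, (byte) (2 * idx.get(i, j)));
        }
    }
    idx.flushCache();                         // dump the cache back to native memory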
Original comment by samuel.a...@gmail.com
on 18 Dec 2013 at 11:52
I tried to implement the suggestions from comment 46, but backed out due to
slower performance:
1. Using a heap buffer instead of an array is slower:
(1) Copying version with buffer
(2) Copying version with array
                   (1)     (2)
sumIndexerCopy    6348    4137
addIndexerCopy   10934    7215
I tested only on HotSpot 1.6, but I have no doubt it will be slower on Android.
2. Using a buffer to initialize the indexers is slower in the case of many small
matrices. It does an unnecessary conversion from pointer to buffer and back to
pointer. We can do about 1000 such conversions per ms on my 3GHz Athlon X2; on
Android, it will definitely take longer.
CvMat (and probably all JavaCPP-mapped classes) returns its arrays as Pointers.
I don't think the API from the latest patch is "hackish": it is part of JavaCPP,
it is supposed to simplify iterating over multidimensional arrays, and it is
created from a Pointer, which is what JavaCPP commonly uses.
3. I renamed copyBack() to flushCache(), added a fillCache() method, moved both
to the base class, and renamed the "copying" flag to "cached". I made the
subclasses package-visible.
4. I added a JUnit test.
Original comment by viliam.d...@gmail.com
on 19 Dec 2013 at 8:17
Attachments:
Could you provide the code that gave you these results? (BTW, no need to update
all your classes until we decide on the final API...)
Running the attached TestIndexer on my gear here (java version "1.7.0_45"
OpenJDK fedora-2.4.3.0.fc19-x86_64 u45-b15) outputs this:
-249231677 171
-249231677 209
-249231677 219
-249231677 223
-249231677 233
-249231677 223
-249231677 167
-249231677 167
Doesn't look like there's a clear winner: Both are fast. What do you get?
Original comment by samuel.a...@gmail.com
on 20 Dec 2013 at 2:38
Attachments:
Try running more iterations, and add the -server switch to the JVM (I don't know
if OpenJDK supports it). The aim is for the JIT to kick in. I don't have my code;
I rewrote the indexer and ran the test from comment 34, and after seeing that it
was slower, I reverted it.
Original comment by viliam.d...@gmail.com
on 20 Dec 2013 at 5:48
Original issue reported on code.google.com by
viliam.d...@gmail.com
on 13 May 2013 at 11:25