scijava / scijava-common

A plugin framework and application container with built-in extensibility mechanism :electric_plug:
BSD 2-Clause "Simplified" License
91 stars 52 forks source link

Extremely low speed of ReadBufferDataHandle #467

Closed Daniel-Alievsky closed 1 year ago

Daniel-Alievsky commented 1 year ago

I tried to use ReadBufferDataHandle to speed up parsing TIFF, instead of simple FileHandle in my analog of TiffParser. And I was very surprised that the speed did not increase.

Then I've created the following very simple test, which reads first 250000 of 4-byte integer values from some test file:

public class ReadArraySpeed {
    public static void main(String[] args) throws IOException, FormatException {
        if (args.length < 1) {
            System.out.println("Usage:");
            System.out.println("    " + ReadArraySpeed.class.getName() + " any_file");
            return;
        }

        final Location location = new FileLocation(new File(args[0]));
        for (int test = 1; test <= 5; test++) { // - helps to warm JVM
            System.out.printf("Test #%d...%n", test);
            try (Context context = new Context();
                DataHandle<Location> in = context.getService(DataHandleService.class).create(location);
                 DataHandle<Location> inBuffer = context.getService(DataHandleService.class).readBuffer(location)) {
                final int elementSize = 4;
                final int size = (int) (Math.min(in.length(), 1_000_000) / elementSize);
                int[] a = new int[size];
                int[] b = new int[size];
                long t1 = System.nanoTime();
                in.seek(0);
                for (int k = 0; k < a.length; k++) {
                    a[k] = in.readInt();
                }
                long t2 = System.nanoTime();
                inBuffer.seek(0);
                for (int k = 0; k < b.length; k++) {
                    b[k] = inBuffer.readInt();
                }
                long t3 = System.nanoTime();
                System.out.printf(Locale.US, "Reading %d 32-bit integers, %s: %.3f ms, %.3f MB/s%n",
                        size, in.getClass(),
                        (t2 - t1) * 1e-6, (size * elementSize) / 1048576.0 / ((t2 - t1) * 1e-9));
                System.out.printf(Locale.US, "Reading %d 32-bit integers, %s: %.3f ms, %.3f MB/s%n",
                        size, inBuffer.getClass(),
                        (t3 - t2) * 1e-6, (size * elementSize) / 1048576.0 / ((t3 - t2) * 1e-9));
            }
        }
    }
}

On my computer, results are following: Test #1... Reading 250000 32-bit integers, class org.scijava.io.handle.FileHandle: 317.620 ms, 3.003 MB/s Reading 250000 32-bit integers, class org.scijava.io.handle.ReadBufferDataHandle: 2609.762 ms, 0.365 MB/s Test #2... Reading 250000 32-bit integers, class org.scijava.io.handle.FileHandle: 336.619 ms, 2.833 MB/s Reading 250000 32-bit integers, class org.scijava.io.handle.ReadBufferDataHandle: 2769.079 ms, 0.344 MB/s Test #3... Reading 250000 32-bit integers, class org.scijava.io.handle.FileHandle: 302.576 ms, 3.152 MB/s Reading 250000 32-bit integers, class org.scijava.io.handle.ReadBufferDataHandle: 2784.298 ms, 0.343 MB/s Test #4... Reading 250000 32-bit integers, class org.scijava.io.handle.FileHandle: 307.343 ms, 3.103 MB/s Reading 250000 32-bit integers, class org.scijava.io.handle.ReadBufferDataHandle: 2815.823 ms, 0.339 MB/s Test #5... Reading 250000 32-bit integers, class org.scijava.io.handle.FileHandle: 317.017 ms, 3.008 MB/s Reading 250000 32-bit integers, class org.scijava.io.handle.ReadBufferDataHandle: 3378.953 ms, 0.282 MB/s

You see that ReadBufferDataHandle works... not only not faster, but much slower than the simple DataHandle. Debugger shows that it performs a lot of calls of methods of low-level RandomAccessFile class, like exists(), length(). Maybe it is the reason.

Could you cleanup ReadBufferDataHandle to make it really quick while the simple sequential reading?

For comparison, if you will replace "readInt()" call with "readByte()", you will see the expected result: ReadBufferDataHandle works much faster. This method is implemented in much more simple way and is really efficient. But reading single byte is a rare case, usually we need to read some other types or short buffers.

ctrueden commented 1 year ago

Thanks for the report, @Daniel-Alievsky. I pushed a fix.

Daniel-Alievsky commented 1 year ago

Thank you! Are you going to release it in a new maven version?

ctrueden commented 1 year ago

I released scijava-common 2.95.1. Should appear on Maven Central soon.

Daniel-Alievsky commented 1 year ago

I've switched to this version in my project by explicit declaration:

        <dependency>
            <groupId>org.scijava</groupId>
            <artifactId>scijava-common</artifactId>
            <version>2.95.1</version>
        </dependency>

Hope this will become built-in dependence of the new version of main scifio project.

New speed results on my computer are much better:

Test #5... Reading 250000 32-bit integers, class org.scijava.io.handle.FileHandle: 299.633 ms, 3.183 MB/s Reading 250000 32-bit integers, class org.scijava.io.handle.ReadBufferDataHandle: 3.883 ms, 245.596 MB/s Reading 1000000 bytes, class org.scijava.io.handle.FileHandle: 1116.541 ms, 0.854 MB/s Reading 1000000 bytes, class org.scijava.io.handle.ReadBufferDataHandle: 7.650 ms, 124.655 MB/s

In the real situation, while reading IFDs from well-known OpenSlide example CMU-1.svs, simple replacing "this.in = in" in the constructor with this.in = in instanceof ReadBufferDataHandle ? in : new ReadBufferDataHandle<>(in); reduces the time of reading 6 IFDs from > 1000 ms until several milliseconds.

Thank you very much, it is a very good improvement!

Is it possible to ask you also to make improvements https://github.com/scifio/scifio/issues/508 for this package? Maybe, you will find better way to create FileHandle/BytesHandle without contexts, not using unsafe casts, if you will rework generics architecture of this package ?

And, of course, fixing bug with support of JPEG-2000 is still very actual ( https://github.com/scifio/scifio/issues/495 )...

ctrueden commented 1 year ago

@Daniel-Alievsky You're welcome—I'm happy to hear of your good experience with the fix. :smile:

About the SCIFIO fixes: I'm sorry, but I have zero time at the moment to work on the SCIFIO codebase. You have filed a lot of issues, which I appreciate for keeping track of problems. But realistically, no one on my team is likely to have time to look into them for 12 months or more. I just want to be clear about that. I am planning to return to SCIFIO eventually, but in the short term there are too many other urgent deadlines.

Daniel-Alievsky commented 1 year ago

I understand situation, and I hope most of my issues will be resolved automatically as a part of my work over new TiffReader/TiffWriter in https://github.com/scichains/scichains-maps When I'll finish, you will be able just to import my new code into SCIFIO project - it will not require essential time. Moreover, I'll be glad to do this myself, if you will explain how can I do this :)

The only a few issues cannot be resolved as a part of my work over TiffReader/TiffWriter. One of mostly serious is https://github.com/scifio/scifio/issues/495 In Java 9, your way of registration of JPEG 2000 became deprecated and doesn't work. Of course, support of JPEG 2000 is an important part of API, it would be very well if you will find a little time to resolve this.