xerial / larray

Large off-heap arrays and mmap files for Scala and Java
Apache License 2.0

30x slowdown compared to JDK 7 mmap #45

Closed Andrei-Pozolotin closed 8 years ago

Andrei-Pozolotin commented 10 years ago

Hi.

The following microbenchmark, which reads a 700+ MB file, shows a 30x slowdown for LArray compared to JDK 7 mmap:

Mapped Read 1: size=780063457
millis=11578
Mapped Read 2: size=780063457
millis=347

code

package com.carrotgarden.test;

import java.io.*;
import java.nio.*;
import java.nio.channels.*;

import xerial.larray.LArray;
import xerial.larray.LIterator;
import xerial.larray.MappedLByteArray;
import xerial.larray.japi.LArrayJ;
import xerial.larray.mmap.MMapMode;

public class Mapped02 {

    static String SOURCE = "/home/user1/Downloads/___/BCA32C173CC2C4B545F75D1046B7206F.txt";
    static String TARGET = "/home/user1/Downloads/___/BCA32C173CC2C4B545F75D1046B7206F.txt.sz";

    private abstract static class Tester {

        private String name;

        public Tester(String name) {
            this.name = name;
        }

        public long runTest() {
            System.out.print(name + ": ");
            try {
                long timeStart = System.currentTimeMillis();
                test();
                long timeFinish = System.currentTimeMillis();
                return (timeFinish - timeStart);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        public abstract void test() throws IOException;
    }

    private static Tester[] TEST_LIST = { //
    new Tester("Mapped Read 1") {
        public void test() throws IOException {

            MappedLByteArray array = LArrayJ.mmap(new File(TARGET),
                    MMapMode.READ_ONLY);

            array.init();

            long size = array.size();
            System.out.println("size=" + size);

            for (long index = 0; index < size - 4; index += 4) {
                array.getInt(index);
            }

            array.close();

        }
    }, new Tester("Mapped Read 2") {
        public void test() throws IOException {
            FileChannel fc = new FileInputStream(new File(TARGET)).getChannel();
            MappedByteBuffer buffer = fc.map(FileChannel.MapMode.READ_ONLY, 0,
                    fc.size());

            long size = fc.size();
            System.out.println("size=" + size);

            for (int index = 0; index < size - 4; index += 4) {
                buffer.getInt(index);
            }

            fc.close();
        }
    } };

    public static void main(String[] args) {
        for (int i = 0; i < TEST_LIST.length; i++)
            System.out.println("millis=" + TEST_LIST[i].runTest());
    }
}

using

        <dependency>
            <groupId>org.xerial.snappy</groupId>
            <artifactId>snappy-java</artifactId>
            <version>1.1.1-M1</version>
        </dependency>
        <dependency>
            <groupId>org.xerial.larray</groupId>
            <artifactId>larray</artifactId>
            <version>0.2.1</version>
        </dependency>

on

Linux wks002 3.2.0-61-generic-pae #93-Ubuntu SMP Fri May 2 21:46:08 UTC 2014 i686 i686 i386 GNU/Linux

java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) Server VM (build 24.55-b03, mixed mode)

xerial commented 10 years ago

Hi @Andrei-Pozolotin

LArray uses long-based indexes for getXXX(long index), so it has some overhead compared to an API that uses int indexes. I am aware of this problem and plan to provide an int-based API (#42).

I am not sure what you are comparing between the two tests. One test calls init() to create a slice without the last element of the mmapped region, while the other does not. Your benchmark also includes the costs of JVM warm-up, opening the mmap, synchronization, etc.
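
A minimal sketch of how the timing could be separated from those costs, shown only for the JDK side: the file is mapped once outside the timed region, a few warm-up passes run first, and the values read are accumulated into a checksum so the JIT cannot drop the loop. The class name MappedReadBench, the method names, the warm-up count and the 16 MB temporary file are all hypothetical choices for illustration, not part of the original benchmark or of LArray.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedReadBench {

    // Reads every aligned 4-byte int and returns the sum, so the loop
    // has an observable result and cannot be removed as dead code.
    public static long sumInts(MappedByteBuffer buffer) {
        long sum = 0;
        int limit = buffer.limit() - 3; // last start index where a full int fits
        for (int index = 0; index < limit; index += 4) {
            sum += buffer.getInt(index);
        }
        return sum;
    }

    // Maps the file once, runs warm-up passes, then times a single pass.
    // Mapping and warm-up are deliberately outside the timed region.
    public static long timedReadMillis(Path file, int warmupRuns) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            // Assumes the file is smaller than 2 GB (MappedByteBuffer limit).
            MappedByteBuffer buffer = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            long sink = 0;
            for (int i = 0; i < warmupRuns; i++) {
                sink += sumInts(buffer);
            }
            long start = System.nanoTime();
            sink += sumInts(buffer);
            long elapsedNanos = System.nanoTime() - start;
            System.out.println("checksum=" + sink);
            return elapsedNanos / 1_000_000;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test input: a 16 MB zero-filled temporary file.
        Path file = Files.createTempFile("mapped-read", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[16 * 1024 * 1024]);
        System.out.println("millis=" + timedReadMillis(file, 5));
    }
}
```

The same structure (open once, warm up, time one pass, consume the result) would need to be applied to the LArray test before the two numbers can be compared; a harness such as JMH handles these concerns more rigorously.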