marschall / memoryfilesystem

An in memory implementation of a JSR-203 file system

Unable to use memoryfilesystem with Lucene for testing #113

Closed MichaelKunze closed 5 years ago

MichaelKunze commented 5 years ago

Trying to test a very simple IndexWriter.addDocument call (adding a Lucene Document) throws the following exception:

Exception in thread "main" java.lang.UnsupportedOperationException: memory file system does not support mmapped IO
    at com.github.marschall.memoryfilesystem.BlockChannel.map(BlockChannel.java:166)
    at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:267)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:242)
    at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:100)
    at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:100)
    at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:100)
    at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
    at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.write(Lucene50CompoundFormat.java:89)
    at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4997)
    at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:576)
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:515)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:554)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:719)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3602)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3577)
    at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1035)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1078)
    at Demo.main(Demo.java:20)

Code that produces the exception:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

import com.github.marschall.memoryfilesystem.MemoryFileSystemBuilder;

public class Demo {

    public static void main(String[] args) throws Exception {

        final var fileSystem = MemoryFileSystemBuilder.newEmpty().build();
        final var indexPath = fileSystem.getPath("/index");

        try (var directory = FSDirectory.open(indexPath);
             var indexWriter = new IndexWriter(directory, new IndexWriterConfig(new StandardAnalyzer()))
        ) {
            indexWriter.addDocument(new Document());
        }

    }

}

Uses:

Any ideas how to fix this? Thanks!

marschall commented 5 years ago

It should work with the following code:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSLockFactory;
import org.apache.lucene.store.SimpleFSDirectory;

import com.github.marschall.memoryfilesystem.MemoryFileSystemBuilder;

public class Demo {

    public static void main(String[] args) throws Exception {

        final var fileSystem = MemoryFileSystemBuilder.newEmpty().build();
        final var indexPath = fileSystem.getPath("/index");

        try (var directory = new SimpleFSDirectory(indexPath, FSLockFactory.getDefault());
             var indexWriter = new IndexWriter(directory, new IndexWriterConfig(new StandardAnalyzer()))
        ) {
            indexWriter.addDocument(new Document());
        }

    }

}

I'll see whether I can get this fixed in Lucene upstream.
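For context on the first stack trace: the failure comes from FileChannel.map, which this file system rejects with UnsupportedOperationException, so the workaround above picks a Directory implementation that does not memory-map. The general idea of tolerating a file system without mmap support can be sketched with the JDK alone. This is only an illustration, not what Lucene or memoryfilesystem do; the class name MmapFallback and the method readAll are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene or memoryfilesystem.
final class MmapFallback {

    /**
     * Returns the contents of a file as a ByteBuffer. Tries to memory-map
     * the file first and falls back to a plain read if the file system
     * refuses mmapped IO with UnsupportedOperationException.
     * Assumes the file is smaller than 2 GiB.
     */
    static ByteBuffer readAll(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            try {
                return channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            } catch (UnsupportedOperationException e) {
                // No mmap support: copy the bytes into a heap buffer instead.
                ByteBuffer buffer = ByteBuffer.allocate(Math.toIntExact(size));
                while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                    // keep reading until the buffer is full or EOF is reached
                }
                buffer.flip();
                return buffer;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On the default file system the mapping normally succeeds, so the fallback branch would only be exercised on a file system that rejects map, as the stack trace above shows.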

MichaelKunze commented 5 years ago

Thanks for the quick reply. I see what you did there. It almost works. IntelliJ has assertions enabled for tests by default. Therefore I run into the following (with -ea enabled):

Exception in thread "main" java.lang.AssertionError: On Linux and MacOSX fsyncing a directory should not throw IOException, we just don't want to rely on that in production (undocumented). Got: java.nio.file.FileSystemException: /index: file is a directory
    at org.apache.lucene.util.IOUtils.fsync(IOUtils.java:464)
    at org.apache.lucene.store.FSDirectory.syncMetaData(FSDirectory.java:310)
    at org.apache.lucene.store.LockValidatingDirectoryWrapper.syncMetaData(LockValidatingDirectoryWrapper.java:62)
    at org.apache.lucene.index.SegmentInfos.prepareCommit(SegmentInfos.java:771)
    at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4773)
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3288)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3456)
    at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1037)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1078)
    at Demo.main(Demo.java:21)

This happens at MemoryFileSystem:372. Is it supposed to do that? Without "-ea" it's fine.

marschall commented 5 years ago

It's tricky. The current behavior of OpenJDK on Linux and macOS (I don't know about other Unixes) is that, in order to fsync() a directory, you need to open a FileChannel on it for reading, not read anything, and then call force().

Unfortunately, this is completely unspecified behavior. In addition, while you can open a FileChannel on a directory for reading, actually reading from it will fail.
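A minimal sketch of that directory-fsync pattern using only the JDK follows. Because the behavior is unspecified, the sketch treats any IOException as "not supported here" instead of failing. The class name DirectoryFsync and the method tryFsyncDirectory are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper illustrating the pattern described above.
final class DirectoryFsync {

    /**
     * Tries to fsync a directory by opening a read-only FileChannel on it
     * and calling force(true) without reading anything.
     * Returns true on success and false if the platform or file system
     * refuses, since this behavior is not specified by the JDK.
     */
    static boolean tryFsyncDirectory(Path dir) {
        try (FileChannel channel = FileChannel.open(dir, StandardOpenOption.READ)) {
            channel.force(true);
            return true;
        } catch (IOException e) {
            // For example a FileSystemException such as "file is a directory".
            return false;
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println("fsync of " + dir + " succeeded: " + tryFsyncDirectory(dir));
    }
}
```

Whether the call succeeds depends on the operating system and the file system provider, which is why a provider that refuses to open a channel on a directory surfaces as the assertion failure quoted above.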

See the following threads and issues:

http://mail.openjdk.java.net/pipermail/nio-dev/2015-January/002979.html (goes on until May)

https://bugs.openjdk.java.net/browse/JDK-8066915

https://bugs.openjdk.java.net/browse/JDK-8080629

https://issues.apache.org/jira/browse/LUCENE-6169