marschall / memoryfilesystem

An in memory implementation of a JSR-203 file system
282 stars 36 forks source link

Support unregistered filesystems #95

Closed retronym closed 6 years ago

retronym commented 6 years ago

Requiring the user to name the filesystem enables the provider to maintain the map of active filesystems, and in turn implement FileSystemProvider.getPath and similar methods that convert from URIs to paths.

The downside is that the user is responsible for uniquely naming the filesystem, and for remembering to close it in order to free memory.

ZipFileSystem implements FileSystem newFileSystem(Path path, Map<String, ?> env) for use cases where it isn't feasible or desirable for the clients to coordinate the names and lifecycles of a shared file system.

Could memoryfilesystem implement this method too, with the caveat that some the URI based APIs would be unsupported?

As a comparison, jimfs only seems to support unnamed file systems.

retronym commented 6 years ago

A (hacky) workaround is to reflect on MemoryFileSystemProvider.fileSystems and remove the entry just after the filesystem is created. I have tested that this removes the only reference to the filesystem from the provider by testing that a loop that creates a provider and writes files doesn't exhaust the heap.

marschall commented 6 years ago

This has indeed already come up. Right now I'm leaning towards generating a unique name rather than not supporting some URI methods.

retronym commented 6 years ago

Let me share my use case.

I'm in the process of refactoring the IO implementation of the Scala compiler to use NIO pervasively, rather than the existing IO abstraction.

Some parts of our implementation use an in-memory filesystem as either an output target for generated classfiles, or as an element of the compiler classpath, or as the element of the Classloader. This supports programmatic compilation without needing to touch disk.

Not all API's the expose these features have a close methods, so I don't have a spot to close the filesystem and avoid a memory leak, short of using finalizers or the like. I'd prefer not create the global reference to the filesystem in the first place, as we don't need the URI methods to work in this part of our code.

Beyond this, I like the prospect of putting all our IO through NIO in a way that would let us run our test suites in-memory with the ability to switch OS-specific behavior, as per the advertised use case of this library. Ideally, I'd like to find a single library to serve both use cases.

marschall commented 6 years ago

I don't think I have yet fully understood your problem.

Somehow you have to define the life time of a file system, at least conceptually. When will a file system no longer be accessed and can be discarded and when do you expect to get a new, clean instance? Can the lifetime somehow be determined at a higher level (eg. a compilation task) which manages the creation and closing of file systems and then pass FileSystem or Path instances to lower level so they don't have to bother with lifetimes?

FileSystems#newFileSystem(Path path, Map<String, ?> env) is specified to throw a FileSystemAlreadyExistsExcetpion so changing this to return an existing instance instead of creating one would violate the API.

It may be easier to look at different use cases individually.

For tests many approaches are possible. The more you share the more likely you're going to have conflicts with some tests influencing others. This is even more the case then you run tests in parallel.

As for programmatic compilation I don't know how often this can happen at runtime. If this can potentially happen at any time I may make sense to keep the instance of the file system around for the lifetime of the Scala runtime and delete files when they're no longer needed.

Ideally, I'd like to find a single library to serve both use cases.

I have to say that memoryfilesystem makes many tradeoffs for testing. For runtime in memory instances these tradeoffs would likely have to made for better performance and lower footprint.

retronym commented 6 years ago

If the filesystem is nothing more than an object graph, the only cleanup required is to release the reference to particular instance of FileSystem (or any Path created from it). The file contents are then eligible for GC.

We have an existing API which is essentially:

Callable compileCode(s: String)

Which compiles the code snippet to an virtual FS, creates a classloader based on this, and lets the user run the code. We would need to change this API to return an intermediate object to allow close. That is probably a better design, but we're bound by the constraints of backwards compatiblity.

I can't see where the docs of that overload of newFileSystem talks about throwing the FSAEE, but I agree it would be a bit non-standard, as you would be basically ignoring the given path, unlike ZipFSP, which accepts the ZIP as the path.

My use case would be served by just having a custom method on MemoryFileSystemProvider that by-passes the entry into the map, but I appreciate that exposing this might confuse other users and you might want to keep this library focused.

marschall commented 6 years ago

If the filesystem is nothing more than an object graph, the only cleanup required is to release the reference to particular instance of FileSystem (or any Path created from it). The file contents are then eligible for GC.

True, except for the registration in the FileSystemProvider to support PathURIPath.

My use case would be served by just having a custom method on MemoryFileSystemProvider that by-passes the entry into the map

So that it would become eligible for GC once all the references are gone and you don't need the URI related methods. Generating a unique name wouldn't work for you because then registration would still happen. Got it. I haven't decided whether I want to support this.

Callable compileCode(s: String)

I assume the contract allows for repeated execution, otherwise you could just clean up after the first execution. Once you have the compiled byte code I assume you create a ClassLoader on the file system. Strictly speaking in that case the file system does not become eligible for GC until class unloading happens. Have you considered instead copying the bytecode into a Map<String, byte[]> creating a ClassLoader around that and closing the file system?

Regarding FSAEE it seems this was changed from 8 to 9 https://docs.oracle.com/javase/8/docs/api/java/nio/file/FileSystems.html#newFileSystem-java.net.URI-java.util.Map- 😒 I may have to revisit this.

retronym commented 6 years ago

Have you considered instead copying the bytecode into a Map<String, byte[]> creating a ClassLoader around that and closing the file system?

That would certainly work in this case. We have some other cases that aren't quite as clean, my hope is to just code against Path everywhere and have the virtual FS transparently handle our in-memory use cases.

marschall commented 6 years ago

I'm sorry to inform you that I have decided to not support this feature. I hate turning users away, especially large ones. The change required feels too much like a kludge that would only work in your special case. A filesystem is a Closeable resource and the onus is on the creator to #close() it. I'm aware this sounds like "no you, fix your damn code". It's not meant that way.

retronym commented 6 years ago

Thanks for weighing up the pros and cons. I appreciate that it is important to say "no" sometimes to adhere to your design goals.