kaitai-io / kaitai_struct_java_runtime

Kaitai Struct: runtime for Java
MIT License
42 stars 18 forks source link

Started separation of API and implementation for KaitaiStream #13

Closed GreyCat closed 7 years ago

GreyCat commented 7 years ago

By a reasonable request by @r4ravi2008, I've started separation of KaitaiStream into clear API definition + implementation. Right now, there's only one implementation: ByteBufferKaitaiStream (obviously backed by ByteBuffer), but:

There are some issues, of course.

fromFile helper method

This would need some changes in the compiler and/or in tests as well. Right now, compiler adds a fromFile static helper method to almost every class, and this obviously should invoke a concrete class:

    public static DosMz fromFile(String fileName) throws IOException {
        return new DosMz(new KaitaiStream(fileName));
    }

Simplest way to solve that would be to replace that with new ByteBufferKaitaiStream, but then we come to...

Tests

Current tests extensively use that fromFile method. If we want to be able to test different implementations, then we need somehow to substitute that. Probably the simplest way for that would be something like CLI switch --java-from-file-class=ByteBufferKaitaiStream or --java-from-file-class=HadoopKaitaiStream, that will influence generation. Of course, HadoopKaitaiStream will have to implement a string-argument constructor, that will somehow find relevant test files in that Hadoop FS (and somebody has to put them there beforehand, of course).

Substreams

Doing substreams in Java is now implemented as reading byte array and constructing a new stream for that byte array:

        this._raw_foo = this._io.readBytes(200);
        KaitaiStream _io__raw_foo = new KaitaiStream(_raw_foo);
        this.foo = new Foo(_io__raw_foo, this, _root);

Obviously, that should require a concrete class as well. Again, simplest way would be to just replace it with new ByteBufferKaitaiStream, but, in theory, a right way would be to implement something like readSubstream method, that actually hides that functionality into the runtime, and it can be implemented differently for every concrete stream. For example, there even could be something like LimitedKaitaiStream, which wraps existing KaitaiStream, mapping appropriate byte positions and checking size limits (no sure, though, that it would be most efficient way to solve this).


I'm asking everyone concerned to review if that's a viable idea or not.

Cc @tschoening @LogicAndTrick

ams-tschoening commented 7 years ago

Sounds good to me, especially readSubstream sounds like a good idea and should be introduced at this point.

GreyCat commented 7 years ago

I've ported older RandomAccessFile-based solution as RandomAccessFileKaitaiStream, it should be now in this pull request as well.

Also, I've implemented complimentary support for this in the compiler: see this branch.

Now one can run tests with either BB- and RAF-based runtimes, with only adding --java-from-file-class=io.kaitai.struct.RandomAccessFileKaitaiStream to select the latter. Surprisingly, it even works — except for minor problem with exceptions: BB-based exception throws BufferUnderflowException (which tests currently expect), and RAF-based throws EOFException.

Probably we should add KS-specific wrappers for these exceptions (without checked exceptions, as it seems to be more or less universal agreement to ditch them now).

r4ravi2008 commented 7 years ago

@GreyCat Thanks for getting this PR up so fast~

GreyCat commented 7 years ago

@r4ravi2008 So, what's your opinion? Would it work for your implementation?

r4ravi2008 commented 7 years ago

@GreyCat abstract public int pos();should this return long instead?

LogicAndTrick commented 7 years ago

For fromFile stuff: I'm not super familiar with Java APIs in this space, is there a common class that all the stream implementations will inherit from? I know in .NET there's the Stream type which is universally used, something like fromStream(Stream str) would fit well in that scenario. Then the consumer could choose which type of stream they want to pass in.

GreyCat commented 7 years ago

I'm not super familiar with Java APIs in this space, is there a common class that all the stream implementations will inherit from?

Unfortunately, stdlibs IO stuff in Java is, I might say, extremely weird. It's very overengineered, with tons of abstractions that do not map to real-life tasks all that well. There are several timeframes when stdlib curators pursues different goals and ideologies. There are that "checked exceptions" vs "unchecked exceptions" discourse, "stream concept" vs "reader/writer concept" discourse, "old IO" vs "new IO", sync vs async, etc, etc. If there was one solution that fits all, we would have used it since the very beginning.

I believe the overall consensus on this PR is that it's ok to merge, so I'm going to do it now.