Open 5k3105 opened 7 years ago
Thanks for the suggestion! Let's go with our checklist. I presume you're acquainted with KS, so going to section 2. What would be Go equivalents of the following:
What are the basic Go coding standards / universally accepted practices that we'll follow (also as per checklist, section 2.4)?
just taking a stab at this --
KS type
:
https://golang.org/ref/spec#Numeric_types
uint8 the set of all unsigned 8-bit integers (0 to 255)
uint16 the set of all unsigned 16-bit integers (0 to 65535)
uint32 the set of all unsigned 32-bit integers (0 to 4294967295)
uint64 the set of all unsigned 64-bit integers (0 to 18446744073709551615)
int8 the set of all signed 8-bit integers (-128 to 127)
int16 the set of all signed 16-bit integers (-32768 to 32767)
int32 the set of all signed 32-bit integers (-2147483648 to 2147483647)
int64 the set of all signed 64-bit integers (-9223372036854775808 to 9223372036854775807)
float32 the set of all IEEE-754 32-bit floating-point numbers
float64 the set of all IEEE-754 64-bit floating-point numbers
complex64 the set of all complex numbers with float32 real and imaginary parts
complex128 the set of all complex numbers with float64 real and imaginary parts
byte alias for uint8
rune alias for int32
also take a look at https://github.com/virtao/GoTypeBytes/blob/master/typeBytes.go for conversion from and to these types directly from bytes.
KS 'stream': https://golang.org/pkg/bytes/ https://golang.org/pkg/io/ https://golang.org/pkg/os/
I believe these are the packages required but I have not experimented with this yet :)
a collection of fields parsed sequentially (seq)
A slice is a dynamically sized array of type. https://golang.org/ref/spec#Slice_types
fields := []int{1,2,3}
a number of fields parsed randomly or calculated (instances)
Not sure what you want here. There is a key value hash map in Go. So if key is 'field' then:
fields := make(map[string]interface{})
this would create a kv map with a string as the key and interface
('any' type that can be cast to correct type later). https://golang.org/ref/spec#Interface_types
illustration of slice and map: https://play.golang.org/p/K1gaQ1zhHg
definition of nested types (types)
is this structs? https://golang.org/ref/spec#Struct_types
lists of integer constants (enums)
http://stackoverflow.com/questions/14426366/what-is-an-idiomatic-way-of-representing-enums-in-go#14426447
https://github.com/virtao/GoTypeBytes/blob/master/typeBytes.go That link also has these enums defined:
const (
INT_SIZE int = int(unsafe.Sizeof(0))
UINT_SIZE int = int(unsafe.Sizeof(uint(0)))
FLOAT32_SIZE int = 4
FLOAT64_SIZE int = 8
DEFAULT_ENDIAN int = 0
BIG_ENDIAN int = 1
LITTLE_ENDIAN int = 2
)
KS type:
This list is also useful, but the primary question was how to present KS concept of "type". In the majority of target languages, it corresponds to a class. seq
attributes usually correspond to private class members + getter / setter methods or properties. instances
are more or less the same, but they are lazy (and thus have some sort of "not-yet-parsed" state by default) and they usually require more than one line of code in a getter, given that they usually have some sort of position, sizes, etc.
Reading a seq
is usually implemented in a _read
method, writing of seq
is implemented in _write
. In normal, read-only parser mode, class constructor gets KaitaiStream and automatically calls _read
to do sequence parsing from that given stream.
How all that maps to Go concepts and practices?
a number of fields parsed randomly or calculated (instances)
Not sure what you want here.
Um, please don't take offense, but may be you'd want to get slightly more familiar with KS concepts, such as seq, instances, enums, nested types, etc?
KS type seems to follow: https://golang.org/ref/spec#Types
There are usually no getter setter methods. They are treated as a variable but can also be given methods:
https://gobyexample.com/methods
gives an example of this but with a more complex type, a struct. Yet it is still a 'type' you can see in it's declaration.
Give me a bit longer, I am new to KS and I need to read more. Thanks.
They are treated as a variable but can also be given methods:
Yeah, that should be useful as well. We could use that for "parse instances" (which define some out-of-sequence data located at some arbitrary offset in the stream) and "value instances" (which are basically values, calculated using other available attributes).
Give me a bit longer, I am new to KS and I need to read more. Thanks.
No rush, take your time :) Also, you might be interested in announcing this work on support of Go in KS in some Go communities. I've heard that they are very friendly and open to new things — so, chances are, we'll get more people interested in this support contributing ideas. N heads is better than one, anyway :)
I'll post something up on the reddit :) Thanks!
I saw this on Reddit and am suddenly interested in Kaitai.
I am familiar with Go, but not with Kaitai or scala. I have read through the documentation about adding support for a new language and believe it should be pretty straightforward to get started with a prototype for Go.
Here are my thoughts on how the representation should be in Go (also a basic introduction to some Go concepts):
seq
should match to a "Go slice" of a particular type. Go syntax for a slice of uint32's is []uint32
. Slices can be extended to at runtime with the append()
builtin. For example: arr = append(arr, 5)
// MyType doc string
type MyName struct {
Field0 uint32 // Field Doc string
...
}
type ParentType_NestedType struct {...}
Marshal()
and Unmarshal()
. Those methods should return the Go error
type and operate on byte slices: []byte
.bytes.Reader
type. I believe the primary thing that would need to be added is reading of bits within a byte, but that should be straightforward.nil
interface{}
, but it still might be preferred to have a series of pointers (depends on the use case).value
type
type IpProtocol uint8
const ( Icmp IpProtocol = 1 Tcp IpProtocol = 6 Udp IpProtocol = 17 )
- Go is namespaced by package but otherwise the names for Kaitai types, enumerations, and enumeration values could all collide. If that is allowed in Kaitai (likely), and collisions are a concern then the generated names might have to be Prefixed_ or something. Protobuf handles that by prefixing enum values with the enum type name: `IpProtocol_Tcp IpProtocol = 6`.
I tried prototyping the interface with an `Unmarshal()` method, but I think that should return the number of bytes consumed from the stream. Here is how the standard library does in with the asn1 encoding but that is also a bit awkward: https://golang.org/pkg/encoding/asn1/#Unmarshal
The API for the protobuf encoding and decoding might be interesting (it's also a little bit confusing when you first look at it because of the way the decoding references autogenerated metadata fields): https://godoc.org/github.com/golang/protobuf/proto
Here is an example of how I might generate code:
https://play.golang.org/p/fNbIPlOcpQ
The top level struct (`ExampleFile`) is still a little bit awkward and could use some more thinking.
What are the next steps? Explicitly following the "new language" document? @GreyCat Are you available to write the scala code generation if a reasonable template can be provided? I can work more on the Streaming API.
Here is what the streaming API might look like in Go: https://play.golang.org/p/O1vKDZroZ1
The io.ReadSeeker could be either a an os.File or a bytes.Reader
@tarm Thanks for your interest!
names should be CamelCase
Judging by the examples, probably you mean that type names, sequence attribute names and enums-as-consts should be UpperCamelCase. What about instances?
The protobuf compiler handles this by prefixing the nested type with the parent type name and I think that's what the Kaitai compiler should do as well.
Ok, not a problem.
The typical way to handle binary data for a particular type is to add methods to that type and they should be called Marshal() and Unmarshal(). Those methods should return the Go error type and operate on byte slices: []byte.
I believe that goes well with seq
, but what about instances
? Is there any way to provide lazy parsing support of arbitrary data in current and/or other streams?
Here is an example of how I might generate code: https://play.golang.org/p/fNbIPlOcpQ
Thanks, it makes much more sense now! I believe that stuff like that:
nn, err := e.TrackTitle.Unmarshal(b[n:])
n += nn
if err != nil {
return n, err
}
can be incapsulated into some sort of KaitaiStream-like library?
What are the next steps? Explicitly following the "new language" document?
We need:
Then I add new target -t go
to the compiler, and we try to get it to generate exactly the HelloWorld.go
code that you've provided based on hello_world.ksy
. As soon as we'll succeed there, we'll start CI testing. Then just keep on porting more test specs to Go and fixing the compiler gradually more and more to make it work.
@GreyCat Are you available to write the scala code generation if a reasonable template can be provided? I can work more on the Streaming API.
Sure, I'm totally up to it ;)
Here is what the streaming API might look like in Go: https://play.golang.org/p/O1vKDZroZ1
Cool! I'd like to start a Go runtime repository, but we need to choose name that Go will be referred to everywhere in KS. I guess there are two alternatives - shall we use go
or golang
- which one is better?
The io.ReadSeeker could be either a an os.File or a bytes.Reader
By the way, what about reading performance when using os.File
- would it buffer read calls, so you don't end up with 1000s of one-byte read syscalls?
Instances could also be represented as methods on the top level type. (I'm not familiar with how common they are and what their uses are).
For naming, I think go
is preferred.
Is the generated code typically standalone (like the Unmarshal example I gave) or does it typically use metadata/reflection and runtime assistance from the Stream object? Obviously either is possible, but one is probably easier from the existing code generation structure. If reflection is typical, then it might be possible to make something pretty clean looking by putting all the metadata into field attributes that Go calls struct tags
. That looks like this https://play.golang.org/p/mtGr8J_szY
os.File
is not buffered. It is normally to wrap the io.File
in a bufio.Reader
though if you want to buffer. Unfortunately bufio.Reader does not have a Seek() method, so there may be a little bit more fiddling to do.
OK let me work more on hello world.
BTW, could you point me to what example generated code looks like? Hello world python for example. I see the unit test in the spec directory, but haven't seen generated code.
I've created https://github.com/kaitai-io/kaitai_struct_go_runtime — feel free to start runtime project there. Would you be ok with MIT license, as most our other runtimes use?
Is the generated code typically standalone (like the Unmarshal example I gave) or does it typically use metadata/reflection and runtime assistance from the Stream object?
If I understand properly, generally, there is no metadata/reflection usage inside generated code to maximize performance / efficiency. There is no "parsing state" (except for the state that Stream object provides), we try to minimize various conditions / checks.
If reflection is typical, then it might be possible to make something pretty clean looking by putting all the metadata into field attributes that Go calls
struct tags
.
That's an interesting feature, but that looks more like "an interpreter of provided metadata, encoded in tags", not "compiler". Generally, "compiler" approach is more efficient.
BTW, could you point me to what example generated code looks like? Hello world python for example. I see the unit test in the spec directory, but haven't seen generated code.
You can use ksc
to compile it yourself, or even can use http://kaitai.io/repl/ to check out generation results. For Python, for example, it generates:
# This is a generated file! Please edit source .ksy file and use kaitai-struct-compiler to rebuild
import array
import struct
import zlib
from enum import Enum
from pkg_resources import parse_version
from kaitaistruct import __version__ as ks_version, KaitaiStruct, KaitaiStream, BytesIO
if parse_version(ks_version) < parse_version('0.7'):
raise Exception("Incompatible Kaitai Struct Python API: 0.7 or later is required, but you have %s" % (ks_version))
class HelloWorld(KaitaiStruct):
def __init__(self, _io, _parent=None, _root=None):
self._io = _io
self._parent = _parent
self._root = _root if _root else self
self.one = self._io.read_u1()
@tarm The following issues contain answers on some questions:
Thanks for the pointers! I've made a little progress and would like to share and get feedback:
Runtime stream library: https://github.com/tarm/kaitai_struct_go_runtime/blob/master/kaitai/stream.go
Hello world manual compile and a unit test for it: https://github.com/tarm/kaitai_struct_tests/tree/master/spec/go It looks like the other unit tests are derived from the ruby tests? I am planning to continue with a couple more manual conversions and unit tests, but let me know if those should be autogenerated.
I do not know what the "Kaitai Struct" base type/interface would need to be and would like to explore some examples to get a feeling.
I think next steps for me would be to manually convert a few more .ksy files that have more features. Any suggested progression from Hello World? Perhaps something with a parent reference and something with instances?
Are there plans to have an interface to write files from structures? Naively it seems like all the operations could just be done in reverse.
Ok, first I'm gonna answer questions on runtime API:
func (k *Stream) EOF() bool { // Not sure about this one. In Go, an io.EOF is returned as // an error from a Read() when the EOF is reached. EOF // handling can then be done like this: ...
Generally, there are two approaches to EOF handling in this world:
Given that we inevitably deal with languages that have both schemes, we had to choose a single common pattern here, and we've chosen the first one. Languages like C++ and Python, that rely on the second pattern, will thus have slightly more complex implementations - like this one in C++ or this one in Python. Probably we'll have to implement this with Go as well.
// Go's string type can contain any bytes. The Go
range
operator // assumes that the encoding is UTF-8 and some standard Go libraries // also would like UTF-8. For now we'll leave any advanced // conversions up to the user.
What's the difference between a string
a []byte
then? If there's some standard to use UTF-8, probably we'll just need to convert everything to UTF-8. It's not like we have many things that depend on exact string behavior right now, but generally it's expected that length
method called on a string will return a number of characters, not bytes, reverse
will reverse characters correctly, etc.
// FIXME what does this method do? func (k *Stream) ReadBitsArray(n uint) error {
Generally, it is supposed to read a bitset / bit array data type with given number of bits. See #112 for more details, it's right now in planning stage. Probably we'll use BitSet for Java, BitArray for .NET, std::bitset for C++, etc.
func (k Stream) ReadStrEOS(encoding string) (string, error) { func (k Stream) ReadStrByteLimit(limit int, encoding string) (string, error) {
I see that you've implemented older v0.6 API here — probably because I've referred you to some outdated documentation, and I'm truly sorry for that. Modern 0.7 API does not have distinct string reading methods, instead it uses byte array reading methods (read_bytes
, read_bytes_full
, read_bytes_term
) + convert byte array to string method (bytes_to_str
). Please take a look at any existing runtimes to see what's going on with byte arrays and strings there in meantime, and I'll try to bring all API specs up to date soon.
// FIXME what is group_size ? func ProcessRotateLeft(data []byte, amount int, group_size int) {
Generally, it is supposed to split given byte array into groups of group_size
bytes, and rotate each of these groups left by a given amount
of bits, wrapping on an edge of the group. Thus, for example, rotating 12 34 ab cd
with amount of 4 and group_size of 2, should bring us 23 41 bc da
:
12 34 |ab cd
00010010 00110100|10101011 11001101
/ /
/ /
/ /
00100011 01000001|10111100 11011010
23 41 |bc da
Using default group_size
of 1 will rotate everything within a single byte, thus yielding:
12 34 |ab cd
00010010 00110100|10101011 11001101
/ /
/ /
/ /
00100001 01000011|10111010 11011100
21 43 |ba dc
-- strz what is the intention of all the arguments?
Generally, strz
is deprecated in favor of read_bytes_term
. Here's more or less typical implementation for languages that lack native "read until terminator" functionality. Arguments match .ksy keys closely, that is:
-- do the processing functions operate on the buffers in-place? I assumed so.
Normally, they operate at copies and return them, but it might be customized for particular target language.
In the abstract, I don't fully understand a seekable stream API. If it's seekable, then you must be able to rewind which means it's not a stream unless you've buffered it. If you're going to have to buffer it, then you might as well buffer the whole stream before processing. Go has good interfaces for streaming and good interfaces for seeking on buffered data, but not both at the same time.
It's clearly a terminology question ;) I see that Go's understand of "stream" term implies that it's something that is not seekable and you may be able only to rewind it with some extra pain (doing that buffering). In other languages (and more or less in KS), "stream" is used as a term that just describes an abstraction over generic reading / writing API that deals with files and in-memory byte arrays. Thus it's perfectly fine to have a "seekable stream": for example, Ruby's IO or Python's io work in that paradigm.
Is Seek() only needed for instances with positions?
Yes, it's more or less so.
Can you point to a .ksy that needs a Seek() method?
Sure, here's probably the minimal one: https://github.com/kaitai-io/kaitai_struct_tests/blob/master/formats/instance_std.ksy
For example, in Python it generates:
@property
def header(self):
if hasattr(self, '_m_header'):
return self._m_header if hasattr(self, '_m_header') else None
_pos = self._io.pos()
self._io.seek(2)
self._m_header = (self._io.read_bytes(5)).decode(u"ASCII")
self._io.seek(_pos)
return self._m_header if hasattr(self, '_m_header') else None
I'm tempted to defer the Seek() implementation for a little while until I can explore how the generated code for those should look. Maybe there could be 2 APIs, streaming and seeking, and only .ksy's the need seeking would consume the seeking interface?
That's up to you to decide as Go expert ;) The general requirement is that parsing many real-life file formats (and even some network packets, like DNS) requires seeking. If you feel like providing several implementations (one that to be used when seeking is required and one that can be used for simpler types with no seeking) — sure, why not :)
The only language that might have benefited from such approach so far was probably Java (it has unseekable stuff like FileInputStream
). I've discussed that with like a dozen of Java experts, and we've more or less came to a conclusion that it's generally unpractical to support it.
Happy to read along :) :+1:
@tarm I've started GoCompiler, and went over your rendition of hello_world.go.
I have a few questions:
Unmarshall
needs mandatory stream argument s *kaitai.Stream
. Normally, default stream is expected to be stored in a class, so it can be reused by instances. Instance attributes are normally supposed to be called exactly the same as sequence attributes, i.e. whether you have:seq:
- id: foo1
type: u1
instances:
foo2:
pos: 1234
type: u1
You can access both foo1
and foo2
the same way. In Java, for example, it's r.foo1()
and r.foo2()
, in Python, it's r.foo1
and r.foo2
, etc.
So, shall we add some field (traditionally we call it _io
, but Go would probably have slightly different idea on identifier naming) into a type, and store s *kaitai.Stream
argument into that field in Unmarshall
?
Normally, we allow to supply root
and parent
arguments in a class constructor. Given that Unmarshall
is effectively a constructor, may be we should add root
and parent
arguments there as well, and store them in a type?
Is this anything you would want some help with? I stumbled across this project on a whim after having implemented decoders for a bunch of legacy game file formats in Go. I would be very interested in avoiding future tedium by defining future data spec in Kaitai, and letting a code generated codex do the rest of my work.
Is it possible to author a per-language emitter as a plugin from IDL in a native language à la Protocol Buffers?
@matttproud Please follow the discussion in this issue. I've asked some questions and I'm still waiting for Go expert opinions on them. It's not like I can move this whole thing alone. Unfortunately, it looks like that we've lost original proposers and experts @5k3105 and @tarm :(
Is it possible to author a per-language emitter as a plugin from IDL in a native language à la Protocol Buffers?
If I understood you correctly, this is already like that. All we need to write to support a certain target language is ThatLanguageNameCompiler
class (which is mostly a template that gets filled with values to generate KS output) and ThatLanguageNameTranslator
class (which is also more or less the template, but used for expression language).
Usually Unmarshal should be something like:
func Unmarshal(data []byte, ks *KaitaiStruct) error {
…
}
The streaming api could be something like:
byteStream := []byte{0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff}
reader := kaitai.NewReader(byteStream)
dec := kaitai.NewDecoder(reader)
kaitaiRootStruct := FooBar{}
err := dec.Decode(&kaitaiRootStruct)
if err != nil {
log.Fatal(err)
}
Similar in json
That way the decoder handles the stream and the relations (parent, root) etc. This Go way pretty much contradicts the Kaitai way of storing the stream and relations in every object.
To just add relations (IO, parent, root) to @tarm 's hello world see https://github.com/cugu/kaitai_struct_go/blob/master/hello_world.go and https://github.com/cugu/kaitai_struct_go/blob/master/struct.go
@cugu My main question is whether Unmarshall
is strictly required, or it is just a convention, and how heavily it is enforced / frowned upon if we'll do it slightly differently? As far as I understand, it's really just a static method that fills in a pre-allocated structure.
The problem is that KS generally is not that simple, and you don't have a luxury of parsing anything in one function and just going in with approach "I'll call this one can and after that I'm guaranteed to have all the fields of structure X filles in".
There is lazy parsing, and it is especially important for "parse instances": it is used to parse multi-terabyte file systems without loading everything into memory at once. Normally, you just generate something like that in languages like Java:
public IndexEntry index() {
if (this.index != null)
return this.index;
long _pos = this._io.pos();
this._io.seek(indexOffset());
this.index = new IndexEntry(this._io, this, _root);
this._io.seek(_pos);
return this.index;
}
Note that it makes heavy use of this._io
, i.e. object remembers a reference to a stream from which data should be acquired.
I see that you've proposed to add Metadata
member and type to store that information. It will probably be ok with IO (although probably the correct naming is _Metadata
in this case, so it won't clash with any existing member named Metadata
). How would it handle parent relationships, as usually parent data type is not just "Struct", but inferred in ksy compile time to be something more specific, i.e., for example, how would you compile nav_parent.ksy.
@GreyCat Thanks for all the background and suggestions. It's very helpful. Sorry for being MIA, $DAYJOB has kept me busy. I have been thinking about it though!
Sorry, I have not yet updated the stream api with your comments, but the comments and changes to the stream API seem straightforward.
Go does not have python's __attr__
so the way to make seq
consistent with instance
is to either have getter accessor methods everywhere or have all fields be flat.
As @cugu mentioned, there is definitely a preference in Go to have data types be flat and add functionality using methods. That is different than the standard KS way. Obviously that does not work for terabyte filesystem parsing.
However, if you have to parse a terabyte filesystem, you could still use the auto generated code but only call the "inner" unmarshal functions and manually write code to jump or iterate through as needed for your business logic.
As a Go user, my preference would be to have the Unmarshal() method and not embed the KS struct into each object.
I've been thinking more about the parent and root references as well. (This suggestions presumes unmarshal). You could pass in the needed leaf fields into Unmarshal() as extra arguments. That means more bookkeeping for the code generator because all the intermediate Unmarshal functions would also need those arguments, but would be easy to understand and satisfy the type checker. I just pushed an example for nav_parent: https://github.com/tarm/kaitai_struct_tests/blob/master/spec/go/nav_parent.go
Obviously that is also different from the standard KS way as well. Go is statically typed enough to make the parent references a little painful otherwise. Doing it the more typically KS way would be easy with root, but trickier with parent references if a type could have parents of different types. How is that handled in the generated code for other statically typed languages? Is the .ksy format statically typed enough for that to not be a problem?
Finally, in nav_parent, I had to add an int()
conversion to go from a uint32
field to a length in ReadBytes(). Should ReadBytes take uint64 as an argument? Does the KS compiler add the appropriate conversions for references like that? Does the KS compiler add overflow checking before conversions?
Go does not have python's
__attr__
so the way to make seq consistent with instance is to either have getter accessor methods everywhere or have all fields be flat.
Ok, I start to understand that Go is a pretty low-level language, as I go ;) It's not terribly hard to do these distinctions automatically for generated code (i.e. foo.bar.baz
being translated to Foo.Bar.Baz
or Foo.Bar().Baz
or Foo().Bar.Baz()
or whatever's needed), but then again, it's not super user-friendly, I guess. Given that getter methods add no overhead, I guess we could probably generate them just in case (or at least start with that approach and tune it afterwards with CLI switches).
As a Go user, my preference would be to have the Unmarshal() method and not embed the KS struct into each object.
It is possible, but it would basically require end-user to keep references to root, parent and io somewhere, and pass it along to every instance attribute call, i.e. Foo.Bar(io, parent, root).Baz(io, parent, root)
, etc. Also, technically it's possible to implement both styles of code generation and use some CLI switch to choose between them. If are not 100% opposed to it, I'd start with more "traditional" object-oriented approach, i.e. storing these references in generated structs.
How is that handled in the generated code for other statically typed languages? Is the .ksy format statically typed enough for that to not be a problem?
Parent type is inferred in pre-compile-time in ksc, and is known since then. For example, in Java, nav_parent.ksy is compiled to:
public class NavParent extends KaitaiStruct {
public NavParent(KaitaiStream _io, KaitaiStruct _parent, NavParent _root) { /* ... */ }
public static class HeaderObj extends KaitaiStruct {
public HeaderObj(KaitaiStream _io, NavParent _parent, NavParent _root) { /* ... */ }
}
public static class IndexObj extends KaitaiStruct {
public IndexObj(KaitaiStream _io, NavParent _parent, NavParent _root) { /* ... */ }
}
public static class Entry extends KaitaiStruct {
public Entry(KaitaiStream _io, IndexObj _parent, NavParent _root) { /* ... */ }
}
}
Note that both _parent
and _root
types vary in all these classes, and in most cases it is not KaitaiStruct, but some more specific class that inherits from KaitaiStruct.
Finally, in nav_parent, I had to add an
int()
conversion to go from auint32
field to a length in ReadBytes(). Should ReadBytes take uint64 as an argument? Does the KS compiler add the appropriate conversions for references like that? Does the KS compiler add overflow checking before conversions?
By design decision, KS expression language assumes that there's only a single one-size-fits-all integer type in a language (which is not that far from being true in many cases) and doesn't track complex integer type implicit casts, which are very different in various languages.
Implementation of stuff like ReadBytes()
thus can be one of:
readBytes(int n)
+ readBytes(long n)
, etc)int()
)There's no overflow checking everywhere, and this is almost mostly by design. Current approachin vast majority of existing situations and we don't really aim to implement 100% compliant exact emulator of some particular language in all other languages. We just say that it's "undefined behavior" (which is pretty much the standard excuse in C/C++ standard, so we're not alone here either).
It is possible, but it would basically require end-user to keep references to root, parent and io somewhere, and pass it along to every instance attribute call, i.e.
Foo.Bar(io, parent, root).Baz(io, parent, root)
, etc.
Off topic but if I recall correctly, this is the major choke point I had with Rust support in #22 (possibly for a slightly different reason than with Go). Storing the parent/root references was very hard because of Rust's "you can only have one mutable reference, or multiple immutable references, but never both at once" philosophy. If we could go with this end-user-controlled approach, I believe that I could continue working on Rust support. There'd still be some pretty hard limitations when dealing with instances but it'd be better than nothing at all. Do you think we could make this method an alternate option for languages where the "traditional" KS way doesn't fit into the target language very well?
@LogicAndTrick Probably it's possible, but note that this could quickly get quite messy. Even if you have only a single stream, and straightforward chain of parenting, in real life with foo.bar.baz
you'll end up with something like
this.foo(io, this, this).bar(
io, this.foo(io, this, this), this
).baz(
io, this.foo(io, this, this).bar(
io, this.foo(io, this, this), this
), this
)
Pardon my Java-esque pseudo-code, hope it's understandable. If you're creating substreams, or have something less trivial for parenting chain, it would quickly get much uglier.
Hmm, yeah. It would kinda only work for very simple cases. The compiler would probably get much more complex if it had to manually track parent/root references everywhere...
Compiler probably won't be much more complex (it already does that tracking), but generated code probably would be, and, if I'm not mistaken, recalculating all that stuff every time is not exactly performance-savvy as well.
Hm, it probably won't work anyway, because you can't really do stuff like _parent._parent
when each object doesn't have a reference to its own parent... You can't really expect the caller (or compiler?) to pass in every parent of an object that might be 10 levels deep in the hierarchy :/
My example of the nav_parent:
Some notes:
Thanks for everyone's input. Looks like I've managed to hack up a very early compiler that compiles hello_world.ksy and run-go
script in the tests repo that runs the tests:
ksc
is supposed to be invoked with --go-package test_formats
— note that compiler creates src/$GO_PACKAGE_NAME/$TOP_LEVEL_TYPE.go
file, i.e. path with the directory structure.
I'll proceed with getting nav_parent
to work as well.
Ok, I've done some experiments with Go, trying to get either nav_parent
or floating_points
to work, and here's what I've came up with so far.
This is actually pretty bad, as everything we've done previously so far fit "everything is expression" paradigm. In Go, given the lack of exceptions, basically every step of expression calculation needs to be checked and extra if-return
statements added. For example, something very crude like foo.bar(123).baz(456)
, becomes big in Go:
if tmp1, err := foo.bar(123); err != nil {
return err
}
if tmp2, err := tmp1.baz(456); err != nil {
return err
}
// and now we must proceed with tmp2
This is not only about expressions language, this also affects regular parsing statements. While is all other languages we clearly separate "parsing as expression" and "assignment" steps, i.e. for simple assignments we do:
this.foo = $X;
where $X can be as simple as:
_io.readU1()
or complex, such as
new String(this._io.readBytes(16), Charset.forName("UTF-8"))
And we can reuse exactly that expression for other situations like adding an element to array:
this.foo.add($X);
It turns out that all the parsing statements would be much more complex in Go. For example, in Java:
// Parsing + assignment of a single field
this.foo = this._io.readU1();
// Parsing + assignment of an array element
this.foo.add(this._io.readU1());
// Parsing of a string
this.foo = new String(this._io.readBytes(16), Charset.forName("cp437"));
// Parsing of a string array element
this.foo.add(new String(this._io.readBytes(16), Charset.forName("cp437")));
In Go, it turns out :
// Parsing + assignment of a single field
if this.foo, err = this._io.ReadU1(); err != nil {
return err
}
// Parsing + assignment of an array element
if this.foo[i], err = this._io.ReadU1(); err != nil {
return err
}
// Parsing of a string
tmp1, err := this._io.ReadBytes(16)
if err != nil {
return err
}
this.foo, err := some_way_to_convert_byte_array_to_utf8_string(buf, "cp437")
if err != nil {
return err
}
// Parsing of a string array element
tmp1, err := this._io.ReadBytes(16)
if err != nil {
return err
}
this.foo[i], err := some_way_to_convert_byte_array_to_utf8_string(buf, "cp437")
if err != nil {
return err
}
As far as I see, user structure can be declared either as foo UserType
or foo *UserType
. In C/C++, this roughly equivalents to allocation on stack and on heap, in Go things seem to be somewhat more tricky. In KS-generated code in C++, we use heap-allocated structures exclusively, due to the fact that:
if
being false, for example), we don't want it to waste memory;As far as I understand, doing stuff like
type MyStruct struct {
foo Substruct
}
effectively preallocates structure foo
inside MyStruct
, which is bad due to reasons stated above (or, at the very least, it does not fit all the use cases). "Reading" of such substructure is then a matter of calling Read
method on it (note, however, that this is yet another variation of parsing syntax that doesn't fit "everything is just expression" paradigm):
if err = this.foo._Read(this._io, this._parent, this._root); err != nil {
return err
}
However, if we do:
type MyStruct struct {
foo *Substruct
}
then we need to construct foo
as well, this is yet another separate statement, and yet another syntax:
this.foo = new(MyStruct)
if err = this.foo._Read(this._io, this._parent, this._root); err != nil {
return err
}
I don't quite get it yet how construction of arrays work in Go as well. @cugu provided this in his example:
type IndexObj struct {
entries []Entry
}
// ...
o.entries = make([]Entry, header.QtyEntries())
I guess it preallocates not only space for pointers, but also allocates space for elements itselves as well. What about repeat-until
or other repetitions with no known limit beforehand (so they need to grow)? Shouldn't we use []*Entry
(is that even possible?), and allocate space for structures using new()
manually?
More thoughts: as instances for non-trivial calculations that might throw an exception this means that methods that implement instance must return 2 values: actual result and err. This again makes it pretty hard to unify with sequence attributes getters, which are actually known to return no error. The choices are:
(result, err)
. This would make sequence attributes access a real pain, i.e.:func (this *FloatingPoints) SingleValueBe() (r float32, err error) {
return this.singleValueBe, nil
}
type FloatingPoints struct {
SingleValueBe float32
}
This leads to distinction between accessing seq and instance attributes, i.e. foo.Bar
vs foo.Bar()
+ do error checks.
Ideas, comments?
@GreyCat Can we just exit
in case of error? Something like this:
package main
import (
"errors"
"fmt"
"os"
)
func (foo *Foo) Bar(v int64) *Foo {
fmt.Printf("Bar() called with %d\n", v)
// Do something if error, exit with error code
/*
err := errors.New("Something was wrong")
if nil != err {
fmt.Fprintf(os.Stderr, "error: %v\n", err)
os.Exit(1)
}
*/
return foo
}
func (foo *Foo) Baz(v int64) {
fmt.Printf("Baz() called with %d\n", v)
}
type Foo struct {
}
func main() {
foo := Foo{}
foo.Bar(123).Baz(456)
}
I believe it's considered a bad practice in Go. And, even if we'd opted to exit / panic on every error, this won't help a lot anyway: we have to process lots of existing library calls that do return error as second return value anyway, so it doesn't really matter if we do if
check + do os.Exit
or we do if
check + do return
. It is still very different from all other languages, and looks pretty close to proper C.
IMHO os.Exit in a 3rd-party library is a very bad idea generally.
Meanwhile: sorry I did not read through the whole thread, but would that be a good solution if KaitaiStream use panic instead of errors and only the outermost class would recover it?
It's a bit hacky, but it's something similar to exceptions in other languages and could make the code much simpler (although I am not sure I understand fully how it works).
Apologies for butting in to the thread. I have been following it for a long time. Seems like you guys are doing some great progress. Just wanted to comment a bit on this -
In Go, given the lack of exceptions, basically every step of expression calculation needs to be checked and extra if-return statements added
and
It turns out that all the parsing statements would be much more complex in Go.
Well, not necessarily. Go author Rob Pike has wrote an article which handles this case quite elegantly. Not sure, if you have read it or not, but I think his article fits wonderfully to what we are trying to achieve here.
Here are some links which I found useful related to panic
and error handling:
Actually, let's talk about expressions in Go in general. For example, consider we're compiling a value instance like that:
meta:
id: example
seq:
- id: buf
type: u4le
repeat: eos
instances:
foo:
value: (buf[1].to_s + "123").to_i
This ksy normally reads everything it can into array of 4-byte unsigned integers named buf
. foo
instance is calculated as following:
buf[1]
takes second element of buf
array (note that array can have less than 2 elements — possibly an exception would be thrown)buf[1].to_s
converts integer into a decimal string representationbuf[1].to_s + "123"
adds string "123" to it(buf[1].to_s + "123").to_i
converts resulting string back into integer, interpreting it as decimal number; again, several errors can be happen at this step:
int
, long
, or whatever is the limitFor reference, this expression is compiled as following in Java:
Long.parseLong(Long.toString(buf().get((int) 1), 10) + "123", 10)
in Python:
int(str(self.buf[1]) + u"123")
in Ruby:
(buf[1].to_s(10) + "123").to_i
or in C++:
std::stoi(kaitai::kstream::to_string(buf()->at(1)) + std::string("123"))
As far as I understand, proper by-statement Go rendition of this expression would be:
tmp1 := buf[1]
// no check needed, because accessing non-existent slice members is panic
tmp2 := strconv.Itoa(tmp1)
tmp3 := tmp2 + "123"
tmp4, err := strconv.Atoi(tmp3)
if (err != nil) {
return 0, err
// note that we need to understand resulting type to generate
// valid "no answer" first return argument; just returning "nil, err"
// is not valid :(
}
foo = tmp4
return foo, nil
Of course, when no checks are needed (i.e. when step is know to be non-failing or it uses panic to fail), these parts can be merged into something resembling at expression, i.e.:
tmp4, err := strconv.Atoi(strconv.Itoa(buf[1]) + "123")
if (err != nil) {
return 0, err
}
foo = tmp4
return foo, nil
Probably even tmp4
and foo
can be merged, but that's another optimization. If we'll be doing this, we'll start from "full" statement-by-statement form, I guess, and try to reduce it later.
All this stuff really reminds me of C, which is technically good, as it brings us closer to C implementation as well...
Meanwhile: sorry I did not read through the whole thread, but would that be a good solution if KaitaiStream use panic instead of errors and only the outermost class would recover it?
It will only help in one area, i.e. it will make instance attributes vs seq attributes getters' signatures more alike, but it won't help at all with all the existing library functions that work with returning error codes instead of using panic. We'll still have to break expressions as I've illustrated step-by-step and insert these if-return
(or if-panic
) on every occasion.
@agnivade
Well, not necessarily. Go author Rob Pike has wrote an article which handles this case quite elegantly.
As far as I understand from reading the article, it basically promotes "store error state somewhere in object and check in the very end of process" semantics. I don't really get how it would help with existing stdlib functions, which are already implemented in "return value-error pair" paradigm.
@GreyCat - Existing stdlib functions will remain the same. Its only useful when you call those functions repeatedly. Then you can wrap the error handling and state in a struct and call your functions. Maybe its not applicable here, but something to keep in mind, just in case.
If we'll be doing this, we'll start from "full" statement-by-statement form, I guess, and try to reduce it later.
:+1:
Ok, 3 days and several dozens of kilobytes code later, I've done a major revamping of translator system and GoCompiler. Now hello_world and floating_points compile and test successfully. I guess everyone's interest in Go target faded away (as nobody still answered most of my questions), but I'll try to highliught some of the newly introduced things anyway ;)
h.Foo
— i.e. raw direct field accessfunc (this *FloatingPoints) DoubleValuePlusFloat() (v float64, err error) {
if (this._f_doubleValuePlusFloat) {
return this.doubleValuePlusFloat, nil
}
this.doubleValuePlusFloat = float64((this.DoubleValue + 0.05))
this._f_doubleValuePlusFloat = true
return this.doubleValuePlusFloat, nil
}
This is trivial value calculating instance, without any early bailouts and error checks. For compatibility, all instances follow the same pattern and return (value, err)
, even those who have no way to return error even in theory.
singleValuePlusFloat, err := h.SingleValuePlusFloat()
if err != nil {
t.Fatal(err)
}
assert.InDelta(t, 1.0, singleValuePlusFloat, delta)
I'll try to take on some of other tests as well now.
Done str_encodings_test as well, instances support is on the way.
This one requires BytesToStr(...)
runtime library function, so I've added it:
func BytesToStr(in []byte, decoder *encoding.Decoder) (out string, err error) {
i := bytes.NewReader(in)
o := transform.NewReader(i, decoder)
d, e := ioutil.ReadAll(o)
if e != nil {
return "", e
}
return string(d), nil
}
@tarm Could you please confirm that you're ok to publish your runtime under MIT license, so I can put it into "proper" kaitai-io location, add READMEs and LICENSE, and continue working on that in public?
Ok, last one for today: I've completed position_abs_test as well.
It required ReadBytesTerm(...)
runtime library function, a very crude one is like that:
func (k *Stream) ReadBytesTerm(term byte, includeTerm, constumeTerm, eosError bool) ([]byte, error) {
r := bufio.NewReader(k)
return r.ReadSlice(term)
}
Right now it fails with:
=== RUN TestPositionAbs
--- FAIL: TestPositionAbs (0.00s)
Error Trace: position_abs_test.go:32
Error: Not equal:
expected: "foo"
received: "foo\x00"
which is totally understandable.
This one actually is a pretty complex one, as it demonstrates parse instances fully working and bytes-to-string encoding process.
Ok, now we've got 4 tests, so I'd propose to stop further development of ksc GoCompiler for now, review everything by those who are interested in Go target and solve some infrastructure / system issues, i.e.:
I think he means wrapping the stdlib functions to handle errors like this. How many functions from go libs will we be using? Say there are 20, these would all be wrapped to handle errors in an 'error log' way. Then you would call only the wrapped functions insted of stdlib directly.
Error handling is a sore point with many in the go community.
Sorry I have not been around because I have been overtime @ work for the past 2 months. Still would like to see this completed - but I think those who've responded here are probably more experienced with go than me :)
I believe the appropriate lower level libraries to use will be: https://golang.org/pkg/bytes/ https://golang.org/pkg/os/
general outline of a go program. Taken from a sprite editor I was making for it's 'low levelness'. github.com/virtao/GoTypeBytes has a lot of functions on bytes that seem applicable for this library.