google / flatbuffers

FlatBuffers: Memory Efficient Serialization Library
https://flatbuffers.dev/
Apache License 2.0
23.11k stars 3.23k forks source link

Kotlin partial support #354

Closed Lakedaemon closed 5 years ago

Lakedaemon commented 8 years ago

I have started to implement some support for the Kotlin language.

My preliminary hacky work is there : https://github.com/Lakedaemon/flatbuffers/tree/kotlin

It builds and creates classes with accessors and mutations (val/var or functions). But there are a few things missing (enums..., create functions...) At this point, it isn't ready at all

I wouldn't mind having a bit of help/advice as : I'm mostly a kotlin developer and haven't done C++ work for 5 years I'm a bit lost about the test structure... And some things aren't working like I would expect them too (like the switch statement to get the destination ... cast/masks) I would love to have some insights/feedback as to what I did right and what I did wrong...

Best regards, Olivier

Lakedaemon commented 8 years ago

Ok, I managed to make the switch statements work I am compiling the Monster & cie flatbuffers : a few issues remain...

I basicaly hacked the go code, as go might be ressembling kotlin the most. For more complicated stuff, I'll look at the Java code...

Now starting to work on enums (and next on unions)...

Lakedaemon commented 8 years ago

Mmmh, this is going to be problematic : union Any { Monster, TestSimpleTableWithEnum } // TODO: add more elements

"Any" is a language type in kotlin (it's what corresponds to java's Object)

let's try to use a full name (with package)

Lakedaemon commented 8 years ago

ok... managed to setup a script to compile a test with kotlinc and execute it with the java vm. Now, let's write tests in kotlin (converting those from java) and debug...

Also, got to finish enums/unions...

mmh, another problem with Stat : it defines a val field which is a reserved name in kotlin

Lakedaemon commented 8 years ago

I have reached a new milestone. The test and the classes produced by flatc compile. A lot of tests pass (first fail is in testExtendedBuffer()). I now have to make all tests pass : let the fun begin ! :)

ghost commented 8 years ago

Hi there! Cool you're working on this.

First, it be good to collect your commits into a PR, which makes it easy for others to see what code is new, and to comment on it.

As for C++, I can answer specific questions, but for this to be successfully merged into master, we need language port authors to have a certain level of "confidence" over their understanding of the FlatBuffers internals and implementation to be able to write a correct generator. There are a lot of details to implementing FlatBuffers correctly.

Your generator should probably be part of idl_gen_generic.cpp, since I am assuming Kotlin is somewhat similar to Java an C#.

Testing is something you can make language specific, i.e. there is a script (generate_code.sh) that would now additionally generate Kotlin code, and should be tested as thoroughly as possible. Look at other languages for inspiration.

Lakedaemon commented 8 years ago

Hello there. I'm planning to debug my code, clean it a lot (reusing stuff in idl_gen_generic.cpp now that I have more understanding of what flatc is and does) when it'll be passing the tests, using the google standards, and submit a squashed pr when it's ready for review.

Also, for confidence. I don't have a complete understanding (yet) of the flatbuffers format (I need to have a very long and deep look at how vtables are layed out and about padding) but, I understand what it does and I can look at/reuse the java code so... time will tell... ^^

I'll look at generate_code.sh and see what I can make of it, thanks for the tip (at the moment I'm using a script and a test class, inspired by the java code).

Thanks for the kind answer (and for the flatc code :)

Lakedaemon commented 8 years ago

Now, all tests (converted from JavaTest.java) pass for Kotlin, which just means that we should be able to do the same things in kotlin than in java with flatbuffers and that it kinda works (yet it still isn't ready for production use).

Now, I know for sure that there are a few places where I have messed things up and the tests haven't caught me red-handed which means that there are opportunities to improve the (java/kotlin/other language ?) test coverage. ^^

For example : In some string accessors, I'm returning "" instead of null where the vtable offset is 0 which is wrong -> default values should be tested in case of missing fields (or vector values)

Next, I'm going to work on :

Concerning flatbuffer support, Kotlin behaves mostly like C++, java, C# and Go but has a few peculiar points with : 1) Enums (a bit like C#) 2) Types appear after parameters and functions (a bit like Go) 3) val and var properties instead of functions for field accessors in struct/tables ( a bit like C#) 4) Null safety (type String? is nullable when type String isn't)
5) Companion objects (instead of static methods) 6) Extension functions (a bit like Go)

This makes me think that the kotlin code can reuse quite a bit of what is defined in idl_general... and that it needs a specialized engine in idl_kotlin as most stuff in idl_general has hardcoded types before parameters like this :

code += " public " + struct_def.name; code += " __init(int _i, ByteBuffer _bb) "; code += "{ bb_pos = _i; bb = _bb; return this; }\n\n";

I guess it could be done with stuff like if (lang.language != GeneratorOptions::kKotlin) { .. }

But they are going to be all over the place and with the increasing numbers of supported languages, idl_gen_general is going to become a big mammoth.

So, for the moment, I think I'm going to stay with KotlinCodeGenerator extends/specializes the GeneralBaseCodeGenerator

By the way, is there an grammar for the flatbuffer schema ? (update : yes, there is one in the doc folder)

Lakedaemon commented 8 years ago

Ok, I have tried reusing stuff in idl_gen_general.cpp like the GenTypeForUser methods, which looked like a very good candidate for reuse in generator for target languages BUT as it is declared static, it is not visible outside of idl_gen_general.cpp which means it can't be reused...

As all methods in that file are static, we will have to reimplement them for all target languages except if we somehow integreate the generators in idl_gen_general.cpp, but it will be awkward for Kotlin as : For uChar/uShort..., Kotlin doesn't cast, it widens bits with value.toByte() For enums, Kotlin converts them with a function YourEnum.from(value)

But as we only have the SourceCast method that prepends the value, it won't do for source

For destination, DestinationMask (after the value) and DestinationCast (before the value) would do.... but their name would be misleading for kotlin...

Pondering what to do... (update : removing static in idl_gen_general.cpp fixes the issue and makes the code reusable)

Lakedaemon commented 8 years ago

After struggling today to align my code with idl_gen_general, I'm finally totally confused by the types in flatc...

I was expecting to mainly have : A type for the user facing api (Int for USHORT) A type for the wire/storage (Short for USHORT) (as well as a type for the create methods (offset for pointer types)... but that's beside the point)

BUT GenTypeBasic is neither as Java has : boolean for BOOL -> user facing | not a wire/storage short for USHORT -> not user facing | wire/storage

Are there some inconsistent definitions in flatc or am I misunderstanding something ? Could you enlighten me ?

I could use more documentation (say an explanation of what they are exactly for, and the subtle difference between them) for GenTypeGet, GenTypeBasic, GenTypeNameDest, GenTypeForUser, DestinationType...

Lakedaemon commented 8 years ago

Thinking it through, given theorical types (USHORT), I was expecting to have Type representation in the language (Int) <- conversion -> Type used for storage (Short) <-- read/write--> storage (byteBuffer.putShort/getShort)

Now, in flatc, the "inconsistent?" behaviour above is mitigated/amplified by the fact that : In java, addBoolean has been added to FlatBufferBuilder (so actually Boolean is a wire type for storage) And in the getters and setters, a boolean can be read/written from bytebuffers (through a byte), so the conversion actually happens in the getters/setters (GenGetter, GenSetter)---> wrongly overstepping responsabilities

Also, for each language, you can tamper with things in idl_gen_general by adding switches like if (lang.language == GeneratorOptions::kCSharp) { do something different in there }

conversion between types differ also from language to language. the C/java way is to make it like (int) value and 0xFFFF When the Kotlin way is to write it like value.toInt().and(0xFFFF) (<--- this isn't a cast, it widens the bits of the type)

Instead of having casts (before) and masks (after) to hardcode it like cast + value + mask, it would be nicer to have convertToSource(value) and convertFromSource(value) method Especially since source only has the cast (and not the mask), so there is no way to do Enum.from( value ) with sourceCast + value

Also, what works 0!= value to create booleans doesn't for Kotlin, we need 0.toByte() != value so DefaultValue needs some clanguage customization too

Also, (this is a matter of taste, not a real issue) bb and bb_pos are used throughout flatc, I would rather use _byteBuffer and _position in Table/Struct. It would be nice if there was an languageOption to customize that (to better respect the language customs)

To wrap it up, I would love the code to become more simple and general to avoid code duplication/unnecessary headaches (especially if more complex features are to be implemented on top of the flatbuffer format (like maybee caching, mutating in memory and writing on copy, binarySearch on vectors of Struct/tableswit multi keys...)

On a bright side, the code is brilliant and there are very smart ideas in there (using flatbuffers for reflection representation, awesome :) ).

Lakedaemon commented 8 years ago

I studied those functions in the code and started (documenting them and) reusing them. On the latest commit, the code builds and tests pass (yet, it still isn't ready for production). I have started to write some documentation as well as a guide for others to add support for other languages to flatbuffer.

I still have a lot of clean up to do before it is ready for a Pull Request. Also, before that happens, I'll change the kotlin api to make it more in the spirit of Kotlin (concise & fun to use) :
There are some unnecessary long method names like Monster.createMonster(....) or rootAsBytebuffer(...) that have duplicate info in them or type info that any decent IDE (like intellij idea) will give you along with code completion... (but well, I bet C++ doesn't have that and needs the verbosity, to allow devs to understand what they use)

Lakedaemon commented 8 years ago

I cleaned my code. Next, I'm going to revert the changes I did to idl_gen_general.cpp Most methods in there can't be shared, even the FunctionStart function ! (I had to duplicate them...like the go people) and I don't want to increase the complexity of this file by adding Kotlin specific stuff...

ghost commented 8 years ago

Differences can be resolved in idl_gen_general.cpp, for example, to resolve the type order in declaration, simply make a GenDecl function that takes a type and a variable name, and outputs them in the correct order for all languages, and call that from whereever needed. Double check by running the Java/C# generators that their code hasn't changed.

It's important that we keep idl_gen_general.cpp with as few if-thens as possible (if-then's isolated in functions). It's also very desirable to have languages like Kotlin in there rather than in their own .cpp, as the more different generators we have, the more of a maintenance nightmare it becomes. In fact, I'd love to merge the Go generator back into the general one at some point. Opinions, @rw ?

As for wire types vs others, the types declared in idl.h for Java are the wire types, which is byte in case of bool, etc. Even if the function is called addBoolean, that is just for convenience/clarity, it still will store a byte in the buffer. What Kotlin does in terms of user-facing types is up to it, I presume it has the same lack of unsigned types that Java has?

Which brings me to another point, if Kotlin can call Java code, does it make sense to have it call into the Java runtime types (FlatBufferBuilder etc.?) It be great to reuse that code, greatly reducing the chances of errors in Kotlin.

If you feel there's something inconsistent happening in Java, you can clean that up, as long as the functionality remains the same. Though if it breaks the API, that be better left for a followup commit.

Different syntax for casts: again, make a function that generates a cast, and does something different in Kotlin. Reduce the difference in languages to one single function.

Lakedaemon commented 8 years ago

Answering your points in a different order :

1) Kotlin as Java doesn't have unsigned types.

2) Making Kotlin call the Java code would forsake most of it's benefits over Java. It would be a bit like forcing C++ dev to only write C.

As an illustration, I'm building a monster with the latest version of the Kotlin api like this : with (builder) {with (Monster) { val mon = start() .addPos(with(Vec3) { createVec3(1.0f, 2.0f, 3.0f, 3.0, Color.Green, 5.toShort(), 6.toByte()) }) .addHp(80.toShort()) .addName(str) .addInventory(inv) .addTestType(Example.Any.Monster) .addTest(mon2) .addTest4(test4) .addTestarrayofstring(testArrayOfString) .addTestbool(false) .addTesthashu32Fnv1(Integer.MAX_VALUE + 1L) .end() finishBuffer(mon) } }

and I'm working to improve this api further : I'll remove the "add" prefix, the end() and start() method calls... and hopefully also the "with" parts (this is the difficult bit)

3) If I have to write a GenDecl function that work for ALL languages, and test all code generators. Somehow, I'm already in maintainance hell.

I should only have to worry about the Kotlin code generator. The fact that I have to modify the C++/Java/Csharp code generators to make some room for the Kotlin Code generator is problematic

Of course, for casts and declaration order, it's easy and doable (But what about nullability, fluent apis, companion objects, Extension functions, properties... there are a lot of differences) :

transformSource(value) { if (isKotlin) return beforeSource + value + afterSource else return othersCast + value;
}

But each time a language is added to flatc, it will happen again and again... And later on, someone is bound to change the Kotlin code to make room for another language...

This also explains some inconsistencies in the idl_gen_general.cpp code and the if/then hacks.

IMO, ideally idl_gen_general.cpp should just contain a completely general and abstract class, providing truly general stuff like comments, fields, utility functions...

And each language should have it's own file (separation of concern) implementing a subclass and overriding the methods they need

This is also just what has been done by the Go/python/javascript and Kotlin guys... ^^

Lakedaemon commented 8 years ago

Sorry about the pull request : this is my first time. I did sign the CLA but I forgot to squash the commits into a single commit. I'll try to fix that today when I find time.

Also, I did a git rebase -i (but it had some trouble so I had to retry a few times and do some git edit-todo thingies, so I'm not sure it worked)

I don't know if I can cancel a pull request. I'll see what I can do to do a better job...

Lakedaemon commented 8 years ago

Just implemented a (proof of concept) toString() code generation. It should suck performance-wise as it does a lot of allocation. But it should be quite nice for debugging/test purposes.

Got to think how to make that efficient given that you may have tables of vector of tables of vector of tables... It'll probably involve recursivity, variations of te _offset/_indirect... methods that take a (parent) byteBuffer and an offset (relative to the parent)...

Lakedaemon commented 8 years ago

I have started eating my own dog found which gives me some more coverage for the code generation and I found out that I was wrongly using the code to generate "createStruct" to generate Tables...

So I cancelled my pull request and I'm working to fix this : Mainly by implementing the same methods that for java with the added benefits that, as Kotlin has lambdas, we can probably create inlined structs also...

Kotlin has default values for metod parameters too, and can pass parameters by names in functions. So, I'll revamp the constructor api, to : take required parameters first (-> I got the impression that this requirement is only check at a flatbuffer's creation and doesn't impact code generation for getting values (which is probably wrong)) without default value,so you have to explicitly give them

take optionnal parameters next (with defaults)...

take a lambda last, to write the struct fields inline

I'll try to implement that and see how it goes...

Last but not least... the tests don't fail if you don't implement these create methods ---> improve the tests

ghost commented 8 years ago

You can make fixes to the same PR, rather than pulling it when you find a problem. This will allow people to start reviewing your code.

to 2) above: I didn't suggest making Java the Kotlin API. I suggested just using the Java internals. For example, FlatBufferBuilder.endObject is a complex method that is not part of the user-facing API, so could be called by Kotlin code rather than reimplemented.

Lakedaemon commented 8 years ago

Thanks for the tips. I didn't know. Maybee I can reopen the PR.

I am actually reusing a lot of the Java internal code (the Table/Struct/FlatBufferBuilder classes in com.google.flatbuffers), that I converted it to Kotlin to get Null safety on top of the nice helpers functions. This gave me a very big headstart (I'm really thankful).

I have only slightly altered the api (not the internals) to make it more convenient, more safe or more aligned with what users expect.

For example, thanks to inlined lambdas, we can "hide" the calls of startObject() and endObject(), while making sure they are both called at the right places, with no impact on performance.

Lakedaemon commented 8 years ago

as a follow up to the issue about Table/struct create constructor, thanks to inlinable lambda functions, I think that it might be possible to have constructors in the same spirit than the C++ ones :

On the bad side, users that wouldn't use the standard/built-in way to build a struct would shoot themselves in the foot :/ (at this time, it doesn't look possible to subclass a function in Kotlin and to inline it :/)

Lakedaemon commented 8 years ago

Just pushed a bug fix for some create methods for vector of scalars (with vararg) : the destination types were wrong (it's really hard to understand what types you get when you call the methods that originates from idl_general that use like 3 or 4 composed method calls... I think that I'll simplify and document those, next time, because it is crazy stuff that makes fixing the most simple bugs a gamble)

Also, strangely, in the latest Kotlin 1.0.0 RC, I got some error because of the one assert in FlatbufferBuilder (looks like assert might have been disabled by the kotlin team...which is kinda surprising)... Fixed that too.

As I don't have much time to work on flatbuffers right now (though I'm using it more and more), I'll just watch from the shadow to see how/if the Csharp/Java split happens and if there is some code that can be shared to simplify the kotlin generator... If there isn't, I'll write something.

But, as long as Kotlin 1.0.0 isn't released, there is no hurry...

ghost commented 8 years ago

Thanks for those improvements.

That seems like a good approach. It'll be interesting to see if we can get other Kotlin users using your implementation.

Lakedaemon commented 8 years ago

Kotlin 1.0.0 might not be that far away (maybee 1 month, 2 months away). When it is released, I'll make sure my code works with it and contact the nice jetbrains people to ask if one of them could review this pr (mostly the code produced by the generator).

Also, as I feared, there might be a defensive array copy when the spread operator * is used to pass an array to a vararg function (http://blog.jetbrains.com/kotlin/2015/10/kotlin-1-0-beta-candidate-is-out/). Which means that there is a performance hit with the create functions I wrote with vararg. I'll fix that and revert to just using arrays. The vararg methods weren't that nice anyway when building flatbuffers with big complex arrays that have to be computed

4ntoine commented 6 years ago

Since "kotlin" branch can't be found anymore (https://github.com/Lakedaemon/flatbuffers/tree/kotlin) and the last commit is on June2, 2016 i wonder if the project is still alive?

aardappel commented 6 years ago

There has not been progress for a while, @Lakedaemon are you still planning to finish this? Or anyone else want to pick this up?

snowe2010 commented 5 years ago

Is this dead?

aardappel commented 5 years ago

Until someone restarts work on it, yes.

aardappel commented 5 years ago

New effort to add Kotlin support to FlatBuffers: https://github.com/google/flatbuffers/pull/5409