Implementing Bolt v2 (and logs)

dominique-vassard commented 6 years ago

Hello, First I'd like to thank you for your work on Boltex, it's a great tool, and along Bolt sips, to be able to use Neo4j with elixir is really enjoyable.

Recently, someone adds an issue on your project regarding temporal types which are implemented in Neo4j 3.4 and there isn't answers from you yet.

As an exercise, and because I need Neo4j for a future project, I wanted to make Boltex evolve and implement temporal / spatial types, which are in fact the V2 of the Neo4j bolt protocol. Because I'm new to binary manipulation, and not so much experienced with elixir, I didn't really understand your code and things work. Then I decided to read about bolt protocol, read Neo4j's driver code for Js and python and your code, and code my own driver. I ended up with something working, you can see it here: https://github.com/dominique-vassard/bolt-neo4j.

Now that I have a fully working driver and that I understand how things work, I'd like to collaborate to boltex and make it evolve. But before, I have to understand some of choice and discuss some of choices I've made.

Messages as first decode stage

As you can see here: https://github.com/dominique-vassard/bolt-neo4j/blob/master/lib/bolt_neo4j/packstream/message.ex I've implemented messages as struct. This what done in order to have encoding / decoding completely decoupled from communication. Then if you look in bolt.ex, there's nothing regarding signature, marker except for the handshake (which should be treated as the other message though) and some <<0x00, 0x00>> I think it is a good idea to have things structured like: Communication -> Message encoding / decoding -> Data encoding / decoding. It will be easier to work on specific. What do you think?

Bolt structure

I saw that none of the specific structures (node, relationship, path) are implemented in boltex. they are implemented in bolt_sips. Is there a specific reason for this? Because bolt_sips knows about some specific signatures which are bolt-related. Won't bolt_sips should be only aware about client-server communication and use only this part of boltex?

Logs

There's no logs at all in your driver, and I found it hard to follow the data flow when I need it. In my project, I log every message between server and client, and that was easier to spot my errors and understand the communication flow. Logs are similar to those implemented in Neo4j's Python and JS driver and looks like follow:

11:37:31.650 [debug] C: HANDSHAKE ~ "<<0x60, 0x60, 0xB0, 0x17>> [2, 1, 0, 0]"

11:37:31.652 [debug] S: HANDSHAKE ~ <<0, 0, 0, 1>>

11:37:31.652 [debug] C: INIT ~ %BoltNeo4j.Packstream.Message.Init{auth_token: %{credentials: "test", principal: "neo4j", scheme: "basic"}, client_name: "BoltNeo4j/0.1"}

11:37:31.654 [debug] S: RAW_DATA ~ <<0xB1, 0x70, 0xA1, 0x86, 0x73, 0x65, 0x72, 0x76, 0x65, 0x72, 0x8B, 0x4E, 0x65, 0x6F, 0x34, 0x6A, 0x2F, 0x33, 0x2E, 0x33, 0x2E, 0x36>>

11:37:31.655 [debug] S: SUCCESS ~ %{"server" => "Neo4j/3.3.6"}

11:37:31.655 [debug] C: RUN ~ %BoltNeo4j.Packstream.Message.Run{parameters: %{}, statement: "RETURN 1 AS num"}

11:37:31.655 [debug] C: RUN ~ <<0x0, 0x13, 0xB2, 0x10, 0x8F, 0x52, 0x45, 0x54, 0x55, 0x52, 0x4E, 0x20, 0x31, 0x20, 0x41, 0x53, 0x20, 0x6E, 0x75, 0x6D, 0xA0, 0x0, 0x0>>

11:37:31.658 [debug] S: RAW_DATA ~ <<0xB1, 0x70, 0xA2, 0xD0, 0x16, 0x72, 0x65, 0x73, 0x75, 0x6C, 0x74, 0x5F, 0x61, 0x76, 0x61, 0x69, 0x6C, 0x61, 0x62, 0x6C, 0x65, 0x5F, 0x61, 0x66, 0x74, 0x65, 0x72, 0x2, 0x86, 0x66, 0x69, 0x65, 0x6C, 0x64, 0x73, 0x91, 0x83, 0x6E, 0x75, 0x6D>>

11:37:31.658 [debug] S: SUCCESS ~ %{"fields" => ["num"], "result_available_after" => 2}

11:37:31.658 [debug] C: PULL_ALL ~ []

11:37:31.660 [debug] S: RAW_DATA ~ <<0xB1, 0x71, 0x91, 0x1>>

11:37:31.660 [debug] S: RECORD ~ [1]

11:37:31.660 [debug] S: RAW_DATA ~ <<0xB1, 0x70, 0xA2, 0xD0, 0x15, 0x72, 0x65, 0x73, 0x75, 0x6C, 0x74, 0x5F, 0x63, 0x6F, 0x6E, 0x73, 0x75, 0x6D, 0x65, 0x64, 0x5F, 0x61, 0x66, 0x74, 0x65, 0x72, 0x0, 0x84, 0x74, 0x79, 0x70, 0x65, 0x81, 0x72>>

11:37:31.660 [debug] S: SUCCESS ~ %{"result_consumed_after" => 0, "type" => "r"}

I think it would be great to have those in Boltex too. What do you think? Of course, there are only for debug purpose and something should be done for the auth struct, which should'nt appear in logs.

Bolt v2

I think this part should be discussed with @florinpatrascu too. As you may know, temporal and spatial types are supported in this new Bolt version. It seems that Bolt V2 specs are not completely frozen and that some parts could be updated. But in all case, it may be interesting to provide the V2 feature to Neo4j & elixir users. With assumptions discussed above (struct in bolt_neo4j, logs, etc.), I've implement it and it works fine. We have 2 options here:

don't implement struct in boltex: then it's just a matter of asking for [2, 1, 0, 0] during hanshake to favour Bolt V2 when it's available and everything will work and new temporal / spatial types shoudl be implemented in bolt sips.
implement struct in boltex: then there is more work for the boltex team (check the protocol version, decoding to new types) and almost none for bolt_sips, but then it will be wise to implement node, relationship and path struct in boltex too, in order to be coherent. You can take a look at the decoder_v2.ex to see what's going here: first proposal and its retruns are in comments, and the implementation is with new types.

Again, I want this to be clear, even if I have a fully working Boltex ersatz, my goal is not to replace it but to collaborate with you on it and help you maintaining it. An I hope it will be the case.

Thank you, Dominique

mschae commented 6 years ago

Hey @dominique-vassard,

first off thank you very much for the hard work and feedback you provided. It's great to have more people work on bolt for Elixir.

I'd very much like to integrate the work that you've done with Boltex to avoid fragmentation. Let me address some suggestions:

Messages as first decode stage

I like that idea a lot. Not much more to say. :)

Bolt structure

I had all the Bolt structures implemented at some point but decided together with @florinpatrascu to keep this driver low level as having nodes, paths, etc. in here would've incurred serious refactoring in BoltSips. Happy to revisit.

Logs

Totally agree that we should add logs, should be the simplest of all tasks here.

Bolt v2

I'm sorry that I haven't replied to #21 yet, I agree that we should implement the new data types. When I was briefly looking at the issue I didn't yet find the new types in the Bolt documentation, probably because v2 isn't stable yet. That shouldn't stop us from implementing v2 already tho.

We should only support v2 and release a new major version of Boltex. We can use v1 for the stable branch of Bolt v1 support and v2 for newly introduced Bolt v2 support.

How should we go forward? Do you want to take a stab at porting your code to Boltex or should I start with adding logging?

Please note that I'm currently not using Neo4J in production, so apologies if I can't dedicate as much time as I wanted to.

Thanks, Michael

mschae commented 6 years ago

BTW here's the branch where I started to implement Bolt structs in this driver: https://github.com/mschae/boltex/tree/mschae/struct-datatypes

dominique-vassard commented 6 years ago

Hi, glad to see that my collaboration is interesting. I think too that the more we are to work on this, the better it is, s Neo4j move fast, very fast.

Regarding Bolt v2 documentation, there isn't any. I got in touch with Nigel from Neo4j and he said he will eventually work on the documentation. All implementation I've made are pure reverse engineering of the Python and JS driver maintained by the No4j team. There's not that much new features, then it wasn't too hard, once I understand how Bolt works. An the good news is it's just addition, no updates of previous data encoding/decoding, just new structures in fact.

V2 & Structures

I guess it would be wise to implement V2 as V1, then no structures in boltex, and maybe in the future, after discussing this issue with @florinpatrascu, get them into boltex. The only drawback of having structures in boltex is about dependency, as there is some Time structure and then operation to do that are not yet available in elixir, even in 1.7 (namely the timezones...)

V2 support only

I agree with that. In fact, if someone uses V2 features over a non-compliant database version, database will send back an error (as it already does), so no big deal

How do we work?

I won't be able to work a lot on it before early september because of vacation (without computer), but you still can implement the logging system and I will merge my work and yours for mutual benefits :) At this point, I may have some questions in order to have the better code possible.

Neo4j in production

No problem, I don't actually use Neo4j in production but I hope it will be the case next year if I manage to get my project live. But I've used it in production during 2 years, and for what it worth, I've got the Neo4j certification. Then don't worry, I won't let boltex drown!

Thanks

dominique-vassard commented 6 years ago

Hi, don't bother about logs, I'll implement them in the message encoding / decoding as everything that needs to be logged is in there. Additionally, I've almost finished the message / encoding implementation and separate this part from the communication layer. I hope I can complete it by the end of next week. Otherwise it will be for september. In all case, I'll send you a link in order to look at what's been done.

I'm talking about the first version of boltex working with Bolt protocol v2 but without the graph / spatial / temporal structs. We need to get in touch with florinpatrascu to know wether we implement this on boltex side or not.

dominique-vassard commented 6 years ago

Ok, you've got a PR. But the build failed due to this error: Failed to upload the report to 'https://coveralls.io' (reason: timeout) And another fails to even install erlang! If you retry those builds, no doubt they will succeed.

dominique-vassard commented 5 years ago

That's been a bumpy road sometimes but v2 is finally here!
@mschae Could you publish the package (I'm not authorized to do it) so I can get in touch with florin for the Bolt v2 implementation in bolt_sips, please?

florinpatrascu commented 5 years ago

Hi @mschae and @dominique-vassard

You guys both did an awesome job taking Boltex this far, and the last refactoring round is awesome, thanks a lot for making this happen.

I read through the the threads so far and thought I'd add my 0.02 CAD in here, trying to address @dominique-vassard's POV re. bolt_sips.

Yesterday, I published a new version of bolt_sips, still using the old (published) boltex but refactored bolt_sips according to the newest DBConnection design. This is meant to pave the road for future development eventually and hopefully safer this time, since DBConnection's design appears to be finally settled. Of course there are a couple of breaking changes, such as the support for non-closure based transactions; these were removed relatively late from DBConnnection, triggering some late changes in bolt_sips too.

Back to the topic. @dominique-vassard is questioning the decision about keeping the neo4j structures in bolt_sips. As @mschae mentioned, the decision was taken as a desire to keep boltex agile, lean and low-level, so it can react faster to the always-moving neo4j specs .. oh wait .. what specs?? j/k :) However, if you guys believe these structures should go into boltex instead, then I am willing to refactor bolt_sips and fallback on boltex. _Note to self: in that scenario I am not sure about the value a different driver would have?! boltsips would become too light to justify forcing the users grabbing two dependencies instead of one?!

I am not using neo4j as I used to, but I am still passionate about it, hence the time I can put in this development is mostly using my free time whenever I can find it; weekends, nights - as you very well know from your own experience, as OS developers.

If you believe Boltex should be supercharged, then maybe we should put all the effort into creating only one driver, but this will not be an easy task, especially given the need for offering the bolt routing protocol too; requiring a fair amount of work, to be done properly.

Also, in the scenario where there is only one driver, how would you guys see the license model? Honestly, my entire career as OS dev, I releases everything under Apache or MIT, but after some bitter experience with some users of bolt_sips, I am contemplating switching to a different license model i.e. GPLv3 maybe? What are your thoughts on this?

Anyway, I am open for any proposals that will result in a better Elixir driver for Neo4j.

dominique-vassard commented 5 years ago

Hi @florinpatrascu

For now, I don't think the bolt protocol foundation will change (messages, basic types) but just add new types via structure. If so, boltex won't need any changes. That's the good news and I hope I don't make a fool of myself on this.

I understand your point of view regarding types in boltex. They are the thing in between bolt_sips and boltex and today, it's better to have them is bolt_sips to pattern match easily on them. I know that some person use them to provide some kind of Ecto-like experience and I don't want to discourage those effort (I will talk about it later).

BUT Having bolt_sips and boltex merged is quite appealing:

We all have lot of things to do and don't have so much time to work on those projects. Maybe if the three of us (and other volunteers) are on the same project, one could work when the two others can't.
As I suspect that most of the work will be on the bolts_sips part, this could allow us to move faster. And keep up with the Neo4j's pace which is very fast (and undocumented...)

I really like Neo4j and plan to use it for my future project, but to be honest, there's a lot of work if we want to provide Elixir's users a good Neo4j experience. First will be to implement bolt routing as you said, to have a Enterprise-ready driver which can appeal all kind of users and not just trying Neo4j (or just too poor to use Enterprise edition....) And, in my dreams, there will be a ecto_graph to describe graph schema whith field, incominig_relationships, outgoing_relationships, with some Node and Relationship related to the Bolt.Sips.Types. And a ecto_cypher to deal with queries and deal with adapters (Neo4j, AgensGraph, etc.). But that's a dream :D

I don't know how much is required for merging the 2 projects but I think it's be a great thing to do and will be happy to be part of it!

dominique-vassard commented 5 years ago

For information: I was at a Neo4j events yesterday and get in touch with technical staff. Ans it seems that 3.5 will be out before end of the year. And along it, there will be "a framework to develop drivers". As I understand, they mean seabolt, which could be great as it covers all the boltish parts of the driver (as my poor C allows me to understand) and only a wrapper around should be necessary. I don't how to do it yet (I can't even make install the last version of seabolt) but waiting for the announcement is the thing to do. After that, everything will be clearer regarding what we have to do or not.

florinpatrascu commented 5 years ago

@dominique-vassard - than you for the heads-up! I am also curious where this driver aims to go. Last time I checked, it had no pool support, and a couple of other things missing. We'll see how it goes, and then we can decide to use it wrapped in db_connection, or in a similar fashion.

namxam commented 5 years ago

Hey guys, I was the one raising the issue about temporal types in boltex/bolt_sips. I missed this discussion, as we had quite some issues at work which delayed the project using neo4j. But we are about to launch and I was checking if something happened in the meantime.

As @dominique-vassard mentioned, there is a new connector based on C: seabolt. I am still learning a lot about elixir and neo4j, so I wont be too much of a help. Especially as neo4j's documentation on the protocol is far from good. I guess I have to walk the same road as @dominique-vassard and read the JS source code 😢

But if there is anything else I can help with, please count me in. We plan to use neo4j with elixir and therefore are happy to contribute back to the community - namely boltex and bolt_sips.

dominique-vassard commented 5 years ago

Hi namxam, regarding seabolt, as promising as it is, using it will mean using NIF and lose fault-tolerance, process isolation, etc. Which is not why I choose elixir :) Additionally, I don't how we could translate the oo aspects of the library into functional during bindings and I suppose this is the major problem we have to face if we want to use seabolt. Therefore, I think that sticking with a pure elixir lib is the best.

The pending question is about merging bolt_sips and boltex or not. There's a trend in Neo4j driver to separate low level and high level, just as @mschae and @florinpatrascu did. Maybe we should stay close to this for a while, it may be easier to do updates by reverse engineering neo4j's driver...

What do you need to know about the protocol? You'll find the majority of info on this link: https://boltprotocol.org/ This will tell you about transport layer, messaging, data type, structures, etc. with examples. There's no need to to revere engineering except for the latest part, aka Bolt v2 ( and v3...) which just add some new structures (spatial, temporal). If you want to know everything about bolt v2 and only this, instead of looking at JS code, just keep up with Elixir and use this: https://github.com/dominique-vassard/bolt-neo4j This was my pet project to understand the protocol and the types used. It has the bolt protocol and the types, types which are decoded in bolt_sips and not in boltex.

And you're welcome to contribute to elixir-neo4j environment!

florinpatrascu commented 5 years ago

regarding seabolt

I personally prefer a pure Elixir implementation, for all the reasons mentioned above.

about merging bolt_sips and boltex or not.

I had at least two iterations, where boltex (was personalized) and included in the bolt_sips. While that was probably less confusing from the end-user perspective, @mschae and I decided to keep them apart so that we can focus on separate responsibilities, especially since none of us had the time fully allocated to this project, nor any other external support for development.

These days, I am seeing most of the Neo4j drivers include everything, so probably we should do the same; include the boltex in bolt_sips, or creating a new project which will offer everything (including the bolt+routing?!) Tho' given the effort for materializing the latter, I'd question the driver current license (MIT) and personally would opt/suggest approaching a dual one: open source and commercial - but that's just me thinking out loud, saving this for another time ;)

Thank you all for your interest and support!

dominique-vassard commented 5 years ago

Te separation between low ad high level appears only in Go and Python drivers, the others I've seen (js, .net) includes everything as you said. I'm open to a one-in-all solution, contributors could be focused on 1 project instead of 2, and it seems logical if we consider that boltex is only used and designed for bolt_sips. Regarding the different option you suggested, we can do one at time:

first include boltex in bolt_sips with the less updates possible
create a proper internals directory with protocol, packstream and types as other drivers has -eventually change its license (I don't know much about license and if it's even possible to change it)

And implement bolt-routing is very important as it allows to use the Neo4j Entreprise Edition (load balancing, clustering, user management, etc.) which is a great value for companies. Today, it is not possible for a company to use elixir and neo4j together, and it's sad. And there's also bookmark, auto-retry, etc. that could be interesting too.

I don't know if there's part of bolt_sips that needs to be change and justifies the creation of a completely new project instead of just making it evolve. But in all cases, I will contribute!

And the sooner we start, the better it is because Neo4j's next iteration will be on june and there can be a lot change (again)!

dominique-vassard commented 5 years ago

@mschae Could you publish the v0.5.1 please? I'm working on the implementation of temporal and spatial types in bolt_sips but won't be able to PR before new boltex versions are published on hex. Thanks!

florinpatrascu commented 5 years ago

@dominique-vassard & @mschae - maybe this is a good time to join the two projects? Having boltex back into bolt_sips, would probably help speeding up the time required for shipping improvements to the end users?

dominique-vassard commented 5 years ago

I'm ok with this. In fact, temporal / spatial types implementation is done except for some docs and few more tests (and find a way to test with neo4j 3.0, 3.4 and 3.5 in a row). And this is all that need to be done to allow users to use bolt_sips with neo4j up to v3.5.x as bolt v3 just updates init and transaction management but is not mandatory to get all neo4j's features.

If it's ok for you, I can:

1) Make a PR to include boltex in bolt_sips "quick and dirty". Just copy the project in a internals directory, change function calls that needs to be changed, etc.
2) Make a PR for temporal / spatial types

At this point, bolt_sips will be neo4j 3.5 ready, which will be nice. But the code won't be so nice though especially regarding error systems and types which needs to be unified.

Then what needs to be done:

3) Unify errors, types and loggers
4) Bolt routing

At this point, code will be nice and bolt_sips will be entreprise ready!

And then, I can work on the low level refactor in order to bring Bolt v3 new messages, and make thing more explicit (i.e having bolt_protocol_v1.ex , bolt_protocol_v2.ex, decoder_v1.ex, etc.) for easy maintenance and upgrade.

If you're ok with this, I've got time now to work on this.

dominique-vassard commented 5 years ago

Or maybe an umbrella could be a good first step

florinpatrascu commented 5 years ago

As far as I’m concerned all these sound perfect, thank you for all this work you put in! No umbrella, please :)

mschae commented 5 years ago

Alright sorry for dropping the ball so bad here...

I'm all for consolidating bolt sips and boltex and I don't have a strong preference as to how we do it. We could start a new organization for this project even though that might be overkill.

I suggest however we move forward the three of us become maintainers so I am no longer the bottleneck. We should still use PRs but we could all approve each others PRs.

I plan on dedicating more time for my open source projects again but I am not using Neo in production ATM so I am afraid this will be a free time project.

florinpatrascu commented 5 years ago

Hi Michael - no worries about becoming a bottleneck, life happens :) @dominique-vassard and I decided to move the boltex back into the bolt_sips, as it used to be some time ago, and there is a new public bolt_sips version already available to the users.

I have no problem being part of a different project, if you all think a new project is what we want, but until then I'll send you an invite to the bolt_sips project. This way we can all have equal r/w repo permissions. Unfortunately, for the publishing part, I'll still be the bottleneck, otherwise I'd have to upgrade my hex.pm account in order to be able to share the publishing permissions too - sorry for this caveat.

dominique-vassard commented 5 years ago

Hi, The last refactoring on bolt_sips shows that the merge makes thing simpler, in term of reasoning about the different stages (query -> encoding -> call to server -> decoding -> response), and the interface between high and low level is much simpler, especially since types are mutualized. Then I don't think it's a bad thing, plus between 3 on one project will be easier for maintenance.

mschae commented 5 years ago

I’m super happy that you merged the projects. Seems like a good way to bundle our efforts! I’m all for it

mschae / boltex