oxen-io / lokinet

Lokinet is an anonymous, decentralized and IP based overlay network for the internet.
https://lokinet.org/
GNU General Public License v3.0
1.71k stars 221 forks source link

[internals refactor] stub out platform layer #2119

Closed majestrate closed 1 year ago

majestrate commented 1 year ago

stub out the platform layer for the internals refactor

todo: describe changeset, it's really good.

majestrate commented 1 year ago

i will restructure the commits later.

majestrate commented 1 year ago

i want to do documentation of the replacement components first and then a full redo of them, instead of refactoring a pile of tech debt and trying to manage the scope and have it fail (again). i feel like if i was allowed to do a full clean room redesign back in january like i really wanted to i'd be much farther along than i am now.

ghost commented 1 year ago

i want to do documentation of the replacement components first and then a full redo of them, instead of refactoring a pile of tech debt and trying to manage the scope and have it fail (again). i feel like if i was allowed to do a full clean room redesign back in january like i really wanted to i'd be much farther along than i am now.

That would be a great opportunity to do a full clean room rewrite in Rust then ;-) Just trolling I know you don't like Rust lol

I really wish I can help though. Do you need a rubber duck [1] to listen/talk to you day by day for the redesign? I know the pain of working on tech debt and I'm happy to be a rubber duck if needed.

Appreciate for the hard work.

[1] https://en.wikipedia.org/wiki/Rubber_duck_debugging

majestrate commented 1 year ago

On Wed, 26 Apr 2023 22:27:04 -0700 Qian Hong @.***> wrote:

i want to do documentation of the replacement components first and then a full redo of them, instead of refactoring a pile of tech debt and trying to manage the scope and have it fail (again). i feel like if i was allowed to do a full clean room redesign back in january like i really wanted to i'd be much farther along than i am now.

That would be a great opportunity to do a full clean room rewrite in Rust then ;-) Just trolling I know you don't like Rust lol

i would genuinely love to get the code and docs to a state where someone could do that.

I really wish I can help though. Do you need a rubber duck [1] to talk to you day by day for the redesign? I know the pain of working on tech debt and I'm happy to be a rubber duck if needed.

Appreciate for the hard work.

[1] https://en.wikipedia.org/wiki/Rubber_duck_debugging

i have a few rubber duckies i debug with daily i got off amiami.

-- ~jeff

ghost commented 1 year ago

I would like to express my respect and appreciation for the incredible work accomplished by this project. The expertise and dedication required by the project are truly outstanding, and there is a lot of invaluable knowledge I am not able to catch up yet. With that said, I hope you don't mind me humbly offering some suggestions regarding the refactoring strategy. I'm by no means an expert, and it's possible that you are already familiar with the ideas I'm about to mention, but I thought I'd share them just in case they might be of some help.

When refactoring simple code, the "Atomic Swap" approach is often employed, in which the entire refactoring is completed in a single or just a few git commits, during which old data structures are immediately replaced by new ones. However, for extremely complicated code, a "Clone & Reduce" migration approach with numerous micro-commits might be more beneficial, as it reduces the risks of compilation errors or semantic inconsistencies when the scope of refactoring becomes too large to manage effectively.

For instance, when refactoring VPNInterface::VPNInterface to AppleVPNInterface::AppleVPNInterface, one could retain the old data structure in the codebase and develop a new data structure (e.g., VPNInterface_v2 or AppleVPNInterface) in parallel. This way, callers can be incrementally migrated to the new structure, one commit per caller, with unit tests and documentation added along the way. When it's clear that the old data structure is no longer in use, it can be safely removed in the last commit of the series.

If a caller of the old data structure cannot be migrated cleanly to the new one without breaking higher-level code, the Clone & Reduce migration strategy can be propagated from the callee to the caller. For example, a new function (my_func_v2) could be created alongside the existing one (my_func), with slightly different implementation details in terms of their respective data structures, i.e., my_func_v2 uses my_structure_v2 while my_func uses my_structure. As long as the public API remains backward compatible, this propagation strategy is bound to reach a termination point. To ensure semantic consistency, we can even apply the same strategy for the public API, i.e., considering implementing a my_api_v2 in parallel with my_api and implementing conformance testing between my_api_v2 and my_api, where a property-based testing framework could be very powerful for such a case.

The Clone & Reduce migration approach might seem tedious at first glance, but breaking down large refactoring tasks into manageable subtasks could prevent the process from going out of control. Moreover, it allows for the refactoring process to be paused and resumed without losing context, and even enables team collaboration and high level discussion on refactoring plan, as long as each subtask is encapsulated in a series of micro-commits.

Once again, I want to emphasize my admiration for your expertise and the remarkable work you've done on this project. Please accept my apologies if my suggestion isn't helpful or relevant to your situation. However, it would be a great honour if my thoughts could offer any value.

majestrate commented 1 year ago

On Monday, May 8, 2023 10:21:13 AM EDT Qian Hong wrote:

I would like to express my respect and appreciation for the incredible work accomplished by this project. The expertise and dedication required by the project are truly outstanding, and there is a lot of invaluable knowledge I am not able to catch up yet. With that said, I hope you don't mind me humbly offering some suggestions regarding the refactoring strategy. I'm by no means an expert, and it's possible that you are already familiar with the ideas I'm about to mention, but I thought I'd share them just in case they might be of some help.

this is appreciated, thanks.

When refactoring simple code, the "Atomic Swap" approach is often employed, in which the entire refactoring is completed in a single or just a few git commits, during which old data structures are immediately replaced by new ones. However, for extremely complicated code, a "Clone & Reduce" migration approach with numerous micro-commits might be more beneficial, as it reduces the risks of compilation errors or semantic inconsistencies when the scope of refactoring becomes too large to manage effectively.

the issue we are facing right now is a lack of component boundaries, right now everything blurs into each other and the separations of concerns isn't very defined because the concerns the component is responsible for are not concrete to begin with. it's all adhoc.

For instance, when refactoring VPNInterface::VPNInterface to AppleVPNInterface::AppleVPNInterface, one could retain the old data structure in the codebase and develop a new data structure (e.g., VPNInterface_v2 or AppleVPNInterface) in parallel. This way, callers can be incrementally migrated to the new structure, one commit per caller, with unit tests and documentation added along the way. When it's clear that the old data structure is no longer in use, it can be safely removed in the last commit of the series.

i've tried this in some of the past tries at this, i think a cleanroom redefinition of the components is the right play here. i am starting with the wire protocol docs in pr #2168 and this PR will eventually be rolled into that when the times comes to redo the .snode and .loki components, right now this chunk is splitting up the "frontend" and "backend" of the llarp::service::Endpoint kitchen sink. the frontend being how the user/os does traffic io, like dns to map addresses, managing how the tun interface works, and everything that isn't related to the specifics of the onion routing protocols.

If a caller of the old data structure cannot be migrated cleanly to the new one without breaking higher-level code, the Clone & Reduce migration strategy can be propagated from the callee to the caller. For example, a new function (my_func_v2) could be created alongside the existing one (my_func), with slightly different implementation details in terms of their respective data structures, i.e., my_func_v2 uses my_structure_v2 while my_func uses my_structure. As long as the public API remains backward compatible, this propagation strategy is bound to reach a termination point. To ensure semantic consistency, we can even apply the same strategy for the public API, i.e., considering implementing a my_api_v2 in parallel with my_api and implementing conformance testing between my_api_v2 and my_api, where a property-based testing framework could be very powerful for such a case.

i am not a fan of the v2 rename approach because then you get things like simplewallet wallet_api and wallet2.

The Clone & Reduce migration approach might seem tedious at first glance, but breaking down large refactoring tasks into manageable subtasks could prevent the process from going out of control. Moreover, it allows for the refactoring process to be paused and resumed without losing context, and even enables team collaboration and high level discussion on refactoring plan, as long as each subtask is encapsulated in a series of micro-commits.

Once again, I want to emphasize my admiration for your expertise and the remarkable work you've done on this project. Please accept my apologies if my suggestion isn't helpful or relevant to your situation. However, it would be a great honour if my thoughts could offer any value.

the biggest chunk of work is actually figuring out where the bounary between the platform code (vpn/embedded lokinet) and the top of the onion routing code (.loki/.snode related codepaths) actually exists right now. this PR i think has done that but now the rest documenting it as it is wired up.

ghost commented 1 year ago

Appreciate for the comment. Thanks for the great work being done, and very excited to see the progress.

I believe you have a point in everything else, except the only one:

i am not a fan of the v2 rename approach because then you get things like simplewallet wallet_api and wallet2.

This is not an issue, because in the absolute final step of the entire refactoring, we can mechanically rename all ugly _v2 back to "_v1" (or "<empty>") without breaking anything any more. The _v2 thing is like a scaffold, which is absolutely safe to remove when the building is finished, it's up to the author's aesthetic preference to decide whether they want to remove _v2 in the final commit.

majestrate commented 1 year ago

On Mon, 08 May 2023 07:41:09 -0700 Qian Hong @.***> wrote:

Appreciate for the comment. Thanks for the great work being done, and very excited to see the progress.

I believe you have a point in everything else, except the only one:

i am not a fan of the v2 rename approach because then you get things like simplewallet wallet_api and wallet2.

This is not an issue, because in the absolute final step of the entire refactoring, we can mechanically rename all ugly _v2 back to "_v1" (or "") without breaking anything any more. The _v2 thing is like a scaffold, which is absolutely safe to remove when the building is finished, it's up to the author's aesthetic preference to decide whether they want to remove _v2 in the final commit.

i am personally not a fan of that approach, it feels too hacky and has a high potential of being forgotten that the _v2 needs to be removed. see wallet2 on how bad that gets.

-- ~jeff

ghost commented 1 year ago

i am personally not a fan of that approach, it feels too hacky and has a high potential of being forgotten that the _v2 needs to be removed. see wallet2 on how bad that gets.

I guess it wouldn't be forgotten as long as the WIP commits are encapsulated in a PR, rather than being pushed into dev branches directly without any review. A checklist can be created even before refactoring, with increased notes during the process. The removal process could be very mechanical, and the final peer review can effectively eliminate the risk of forgetting the removal.

I agree it's hacky, but it's a pretty common practice, similar tactics are mentioned many times in different ways and different scopes in Martin Fowler's Refactoring

I respect your preference though, it's your code, your baby ;-)

majestrate commented 1 year ago

On Mon, 08 May 2023 08:35:12 -0700 Qian Hong @.***> wrote:

i am personally not a fan of that approach, it feels too hacky and has a high potential of being forgotten that the _v2 needs to be removed. see wallet2 on how bad that gets.

I guess it wouldn't be forgotten as long as the WIP commits are encapsulated in a PR, rather than being pushed into dev branches directly without any review. A checklist can be created even before refactoring, with increased notes during the process. The removal process could be very mechanical, and the final peer review can effectively eliminate the risk of forgetting the removal.

I agree it's hacky, but it's a pretty common practice, similar tactics are mentioned many times in different ways and different scopes in Martin Fowler's Refactoring

I respect your preference though, it's your code, your baby ;-)

fair, i just have a lot of reluctance to keep the old code around.

-- ~jeff

ghost commented 1 year ago

fair, i just have a lot of reluctance to keep the old code around.

Refactoring spent 10 pages discussing why are developers reluctant to refactor their programs, we might not necessary agree all of them, but quite an interest read though:

Why Are Developers Reluctant to Refactor Their Programs? ...................... 312

ghost commented 1 year ago

The following is a cyclic dependency graph of the subcomponents within the llarp folder from the dev branch at f702aacc389f34ddeabaaeebafb8c87f7e611637, generated using https://github.com/shreyasbharath/cpp_dependency_graph. This representation highlights and presents only those subcomponents that exhibit cyclic dependencies, ignoring those without such dependencies.

llarp-cyclic-deps-dev

Not all cyclic dependencies are equal; some may be easier to resolve than others, it's better to start from easy ones than those harder. Certain cyclically dependent components could potentially be merged into a single component, while others might require rewriting and decoupling to achieve a linear dependency structure. By addressing these cyclic dependencies, we can reduce the overall complexity of the system and progressively clarify component boundaries. This chart might be useful for designing a refactoring plan.

For reference, this is another chart but for the platform-layer-2023-01-17 branch.

llarp-cyclic-deps-platform-layer-2023-01-17

majestrate commented 1 year ago

the intent of this refactor is to remove service and handlers but picking them a part into flow and platform respectively and clearly defining the boundary between them.

majestrate commented 1 year ago

to elaborate, the components are being redesigned into something like the OSI stack where we have these layers with clearly defined responsibilities and boundaries stacked as follows

platform

provides "i have some data from the user/os, figure out what lokinet will do with it" functionality and the reverse direction as well. responsible for doing i/o between lokinet and the os, translating what we got from the flow layer to the os and translating what we got from the os and sending it to the flow layer. handles all platform specific api interfaces, like embedded lokinet, vpn mode, or whatever other future mode of operation we support.

flow

provides "figure out how to send to .loki or .snode, let me know when you figured that out and then let me send/receive data with them" functionality. handles the state of .loki and .snode traffic we are sending and receiving via established onion routing paths. here we handle route timeouts, handover, name resolution, load balancing across different routing paths.

routing

provides a "send data to this .snode anonymously or however else you do that" functionality. what we send to that .snode and how we do that shouldn't matter to people atop this layer as sometimes we dont want to pass data up to the flow layer when we get it on the onion path, for cases like we are a snode handling a namehash lookup, or an introset publish sent on an onion path. this layer lets us do that. uses onion paths if you're trying to anonymously send things, if you are doing snode to snode, you end up fast tracking your traffic directly to the link layer.

onion

provides a "build a path across these explicitly defined hops, let me know when the path works or failed and let me send/receive routing layer traffic with it" functionality.

link

provides "i am node A and i got data directly from node B, i am node A and I just sent data directly to node B" functionality, nodes are addressable via a public identity key, which in the case of a service node is a long term ed25519 public key. in the case of the client, it's an ephemeral ed25519 public key.

(it'd be great to have client keys to be distinct from snode keys here, the amount of effort to do so is not worth it because read the text for the layers above i already wrote, those layers above is where 90% of our bugs are in practice and this layer works fine for now, it's not great but it's the lowest priority for refactoring. this is really frustrating to hear people with no clue about the state of lokinet internals think they understand the problem and it's in the link layer. it is not, i keep saying it's not. why can't people understand. it's so tiresome.)

wire

provides "i know that node B is reachable via ip:port (or maybe it's a bluetooth address now) so i'll send to them using our specific way to get to them in such a way that they know i am still node A.

what's irritating for me is that this refactor starts with the most critically fragile parts and addresses them by splitting them up into a cohesive state, but now instead we are redoing the bottom two layers, the ones that are the least buggy/hacky out of all of them. i keep pushing back but i have given up. other people want to redo them for purity reasons. yes these are nih protocols right there, but none of those even come close to being as nightmarish as the stuff that sits atop it. i had the whole refactor thought out since 2021, but i never got a big enough time to pick away at it so that it actually got done. i think that actually writing this out will help others understand where my very strong kneejerks come from when people talk about their opinions on the priorities in lokinet's tech debt. for embedded lokinet the single most important part is making the platform/flow/routing layers not an inscrutable clusterfuck. all of the bugs in lokinet that make the experience take a shit happen inside that kitchen sink. perhaps other lower layers are a contributing factor but that code is such a pile that i dont think it even matters as we can't know if it is until it's untangled.