feat(cli): publish to multiple nodes

dan-da commented 1 year ago

Addresses #173.

modifies the publish command to first query "our" node for its peers and then concurrently publish to each peer.

This is just a temporary feature until the p2p layer properly transmits code between nodes.

notes:

This is kind of a draft, looking for feedback. but if you feel its ok to merge, go for it.
I didn't bother with creating a flag/mode to publish only to "our" node (original behavior) because this is just a temporary thing, and I didn't want to put too much time/complexity into it. But if its important, let me know.
seems to work as intended, but I haven't yet tested what happens when a node is slow or unavailable.
I'm not a rust thread/tokio/async wizard, so I believe it is parallel+concurrent, but there could be some gotcha there, eg tokio decides to run all tasks on the same thread for whatever black box reason.

Edit by @steinerkelvin

TODO:

[x] --hosts CLI parameter to override to which hosts to publish

dan-da commented 1 year ago

@steinerkelvin nice. yeah I wanted to display a real error message, but discovered that Err was empty, so I just left it as-is.

steinerkelvin commented 1 year ago

One thing I noticed is this doesn't ensure the main API host is included. Hum.

One todo: the --hosts parameter.

dan-da commented 1 year ago

I named the param --host because latest clap seems to strongly prefer (maybe require?) that Vec<> args are passed as --param <item1> --param <item2>. Instead of --param item1 item2 as structopt accepted. I thought that --hosts x.x.x.x:80 --hosts:y.y.y.y:80 seems wrong and --host x.x.x.x:80 --host:y.y.y.y:80 is better. Also one can just use -h.

dan-da commented 1 year ago

One thing I noticed is this doesn't ensure the main API host is included. Hum.

I initially appended the --api host to the peer list. However, I found it was duplicated because the call to client.getpeers was returning it, so I dropped that code.

Now after I pulled the latest commits to this branch, I find that kindelia get peers only works once after the node starts up, and after it is always an empty list. So I guess something has changed in that area...?

steinerkelvin commented 1 year ago

I initially appended the --api host to the peer list. However, I found it was duplicated because the call to client.getpeers was returning it, so I dropped that code.

We could just add if it's missing?

Now after I pulled the latest commits to this branch, I find that kindelia get peers only works once after the node starts up, and after it is always an empty list. So I guess something has changed in that area...?

Yeah, you just stumbled in the exact case in that this would be relevant: your own node isn't in the peers' list, because other you don't have other peers broadcasting that address to yourself. On a side note, I wonder if a node could be aware an address points to itself without hardcoding it.

I've bumped the network_idon dev to 0xCAFE0005 which is now what is running on the cloud. I had to reset that test chain because santi added code to the genesis block and that changed it's hash, which invalidates the entire chain. I wonder how we can better coordinate on that. I guess having an endpoint in which the nodes report the network id and genesis block hash they are using would help:

http https://node-ams3.testnet.kindelia.org:8000/info

{
    "network_id": "0xCAFE0005",
    "genesis_hash": "0xa204e26d20450f127dc16c9c7c0985f2e664afef8a2b223041f04c33df2e5b6a",
}

also, maybe a dedicated text channel for these updates / resets.

dan-da commented 1 year ago

We could just add if it's missing?

done.

--hosts CLI parameter to override to which hosts to publish

implemented.

This PR is feature complete now afaict and ready for review.

One thing that still bothers me is that I see different behavior when publishing dup transactions to my local node vs remote nodes.

1st run, no dups:

$ kindelia publish  ../example/example.kdl 
Transaction #0: PUBLISHED to http://localhost:8000/ (tx added to mempool)
NOT PUBLISHED to http://12.88.153.42/. (Error 501 Not Implemented: <html>
<head><title>Error 501: Not Implemented</title></head>
<body>
<h1>Error 501: Not Implemented</h1>
</body>
</html>
)
Transaction #0: PUBLISHED to http://64.227.110.69/ (tx added to mempool)
NOT PUBLISHED to http://34.125.71.252/. (Error 404 Not Found: 404 page not found
)
NOT PUBLISHED to http://104.155.237.82/. (Error 404 Not Found: 404 page not found
)
Transaction #0: PUBLISHED to http://188.166.3.140/ (tx added to mempool)

2nd run, tx #0 is a dup:

$ kindelia publish  ../example/example.kdl 
Transaction #0: NOT PUBLISHED to http://localhost:8000/: Transaction 0x574adb292b12fc4c1c1ec845c8e79f19c7781ca3bf059172431d0c310de1f30c already included on pool
NOT PUBLISHED to http://12.88.153.42/. (Error 501 Not Implemented: <html>
<head><title>Error 501: Not Implemented</title></head>
<body>
<h1>Error 501: Not Implemented</h1>
</body>
</html>
)
Transaction #0: PUBLISHED to http://64.227.110.69/ (tx added to mempool)
NOT PUBLISHED to http://34.125.71.252/. (Error 404 Not Found: 404 page not found
)
NOT PUBLISHED to http://104.155.237.82/. (Error 404 Not Found: 404 page not found
)
Transaction #0: PUBLISHED to http://188.166.3.140/ (tx added to mempool)

So 127.0.0.1 rejects tx #0 as a dup, but 64.227.110.69 and 188.166.3.140 accept it.

I'm wondering if perhaps those peers are running older code that doesn't do any dup detection?

kings177 commented 1 year ago

Running `/home/michiru/Kindelia/target/debug/kindelia publish kinguFun.kdl`
thread 'main' panicked at 'Command publish: Short option names must be unique for each argument, but '-h' is in use by 
both 'host' and 'help' (call `cmd.disable_help_flag(true)` to remove the auto-generated `--help`)',
/home/michiru/.cargo/registry/src/github.com-1ecc6299db9ec823/clap-4.0.22/src/builder/debug_asserts.rs:121:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

apparently the -h flag is conflicting with --help

dan-da commented 1 year ago

@kings177 thx. I've removed the publish -h shortopt for now.

It might make sense to think of another shortopt, perhaps renaming --host to --remote or something, so we could then use -r.

--node and -n could be another option.

racs4 commented 1 year ago

@kings177 thx. I've removed the publish -h shortopt for now.

It might make sense to think of another shortopt, perhaps renaming --host to --remote or something, so we could then use -r.

--node and -n could be another option.

Perhaps it could be --peers just like the --initial-peers?

racs4 commented 1 year ago

I'm wondering if perhaps those peers are running older code that doesn't do any dup detection?

I will investigate this

dan-da commented 1 year ago

Perhaps it could be --peers just like the --initial-peers?

I considered this.

My reasoning is that nodes are peers of eachother, but the cli is a client of a node... often the localhost node but potentially any node. So they are not peers to the cli. (except in the context of node start, wherein the cli becomes a node).

stated differently, --initial-peers is in the context of node start, but --host is in the context of publish from cli to a node.

So that's why I thought maybe --node or --remote instead.

Just figured I should share my thought process.

Also, regarding plural vs singular eg --peers vs --peer see this comment.

Please let me know if you still want --peers or something else.

racs4 commented 1 year ago

I'm wondering if perhaps those peers are running older code that doesn't do any dup detection?

I will investigate this

From what I saw the nodes are doing this detection. The code that makes it (in the add_transaction function, in the node.rs file), has been inserted in the repository for a long time, and I tested using the current publish command, publishing the same Statement, and I got the expected error (transaction already included).

What may have happened is that nodes other than localhost have already mined its transactions and therefore removed this transaction from the mempool, thus allowing their reinsertion.

racs4 commented 1 year ago

My reasoning is that nodes are peers of eachother, but the cli is a client of a node... often the localhost node but potentially any node. So they are not peers to the cli. (except in the context of node start, wherein the cli becomes a node).

Yes, I see what you meant and it makes sense. My preference is to stick with host or hosts, but I'm pretty bad with naming.

It looks like clap is really having trouble parsing --arg opt1 opt2. But I did a test by putting value_delimiter=',' in the clap macro and it managed to parse something like --arg opt1,opt2, but in the clap help text the value_delimiter was not shown, which can cause confusion.

Do you think it's a good idea to just leave --host without the short option?

dan-da commented 1 year ago

I did a test by putting value_delimiter=','

Nice find, it works. I also tried value_delimiter=' ' but it doesn't parse.

I changed the help text to:

      --host <host>  specify host(s) to connect to. x.x.x.x:port (comma separated)

btw the space separated values used to be the default for Vec args in structopt, but when structopt was recently merged into clap, they made this change. afaict from looking at clap github issues, the reasoning is that "the unix way" is to use multi occurence args, eg: -h <host> -h <host> -h <host>, and that this is far more common in cli programs anyway. So that seems to be the direction they are pushing.... I almost wonder if this value_delimiter workaround is an oversight they might "fix" soon.

Ok, so with that background in mind...

Do you think it's a good idea to just leave --host without the short option?

I think it depends on if we want to use/document the multi occurrence args.

In particular publish --host <x> --host <y> --host <z> is much more annoying to type than -h <x> -h <y> -h <z>.

So then we are back to -h interfering with help. So I've been leaning toward using --node and -n for this.

But... now we have the wrinkle that --host x,y,z also works. So we could just make that the documented way of doing it, and the multi occurrence method is hidden/undocumented.

With all that taken into consideration, my preference is for using --node, -n combined with value_delimiter=','and help text specify node(s) to connect to. x.x.x.x:port (comma separated). This enables:

publish -n x,y,z
publish -n x -n y -n z

publish --node x,y,z
publish --node x --node y --node z

I will push a commit for your consideration.

dan-da commented 1 year ago

What may have happened is that nodes other than localhost have already mined its transactions and therefore removed this transaction from the mempool, thus allowing their reinsertion.

I'm not sure I understand. So is this is a situation where the network gets out of sync because the p2p layer is not finished? I would think that (non-mining) localhost would be informed by the mining nodes of the new blocks and update its state to match.

Also, I just restarted my localhost node with kindelia node start --mine and I still see the same behavior when publishing.

Have you tried publishing to multiple peers, and if so do you see the same behavior?

racs4 commented 1 year ago

So we could just make that the documented way of doing it, and the multi occurrence method is hidden/undocumented.

Looks good to me, but since we're going to use the comma and make this the standard way, I prefer it to be --nodes and not --node

I'm not sure I understand.

What may have happened is that the two other nodes have mined a block that will compete for the end of the chain and these blocks have not yet been included by localhost, which has not mined one of its own (which would remove the transaction from the mempool) and has not received the block mined from other nodes or received and not included (to include a block it is necessary to have its prev included, which is done through messages like GiveMeThatBlock until you have all the necessary prev).

Have you tried publishing to multiple peers, and if so do you see the same behavior?

I tried running kindelia publish example/example.kdl --host 64.227.110.69:80 --host 188.166.3.140:80 and got the following results:

1 -
Transaction #0: PUBLISHED to http://64.227.110.69/ (tx added to mempool)
Transaction #0: PUBLISHED to http://188.166.3.140/ (tx added to mempool)

2 -
Transaction #0: NOT PUBLISHED to http://64.227.110.69/: Transaction 0x574adb292b12fc4c1c1ec845c8e79f19c7781ca3bf059172431d0c310de1f30c already included on pool
Transaction #0: PUBLISHED to http://188.166.3.140/ (tx added to mempool)

3 -
Transaction #0: PUBLISHED to http://64.227.110.69/ (tx added to mempool)
Transaction #0: NOT PUBLISHED to http://188.166.3.140/: Transaction 0x574adb292b12fc4c1c1ec845c8e79f19c7781ca3bf059172431d0c310de1f30c already included on pool

What I think that could have happened is:

1 - 
188.166.3.140 - added transaction to mempool
64.227.110.69 - added transaction to mempool

64.227.110.69 - mined a block

2 - 
188.166.3.140 - not added, already included
64.227.110.69 - added transaction to mempool

188.166.3.140 - mined a block or received the `64.227.110.69` block

3 - 
188.166.3.140 - added transaction to mempool
64.227.110.69 - not added, already included

To be sure of this we would have to debug these nodes; maybe we could activate the events websocket of these nodes for this purpose.

dan-da commented 1 year ago

Looks good to me, but since we're going to use the comma and make this the standard way, I prefer it to be --nodes and not --node

done.

To be sure of this we would have to debug these nodes

Ok, I don't know how to go about that, so I think I will leave it in your hands.

kindelia / Kindelia

feat(cli): publish to multiple nodes #229

Edit by @steinerkelvin