hashgraph / hedera-services

Crypto, token, consensus, file, and smart contract services for the Hedera public ledger
Apache License 2.0
271 stars 121 forks source link

0.0.101 NodeAddressBook contains duplicate entries for each distinct IP #750

Closed steven-sheehy closed 3 years ago

steven-sheehy commented 3 years ago

Summary As a Hedera Services client, I need the address book to contain one entry per node so it's easier to consume and doesn't duplicate information. Currently, for the 0.0.101 address book that contains IP address information it duplicates each NodeAddress for each distinct IP that that node has. This is confusing since the node count shows as 39 in mainnet (via NodeAddressBook.getNodeAddressCount()) despite there only being 13ish nodes. It's also wasteful to duplicate information as every client who wants to communicate with the network will be pulling this file to get up to date IP information.

The proto currently looks like this: Version 0.12.0

message NodeAddress {
    bytes ipAddress = 1; // The ip address of the Node with separator & octets
    int32 portno = 2; // The port number of the grpc server for the node
    bytes memo = 3; // The memo field of the node (usage to store account ID is deprecated)
    string RSA_PubKey = 4; // The RSA public key of the node
    int64 nodeId = 5; // A non-sequential identifier for the node
    AccountID nodeAccountId = 6; // The account to be paid for queries and transactions sent to this node
    bytes nodeCertHash = 7; // A hash of the X509 cert used for gRPC traffic to this node
}

message NodeAddressBook {
    repeated NodeAddress nodeAddress = 1; // Contains multiple Node Address for the network
}

Possible resolution

File 0.0.101 will be NodeAddressBookAbbreviated File 0.0.102 will be NodeAddressBook

Version 0.13.0

message NodeEndpoint {
    string ipAddress = 1; // The IP address of the node
    string port = 2; // The port of the node
}

message NodeAddressAbbreviated { // All the information required to connect to the network
    int64 nodeId = 5; // A non-sequential identifier for the node. This value is the key between the full and abbreviated address books 
    AccountID nodeAccountId = 6; // The account to be paid for queries and transactions sent to this node
    bytes nodeCertHash = 7; // A hash of the X509 cert used for gRPC traffic to this node
    repeated NodeEndpoint = 8; // A node's IP address and port
}

message NodeAddressBookAbbreviated {
    repeated NodeAddressAbbreviated nodeAddressAbbreviated = 1; // Contains minimal node details for the network
}
message NodeAddress { // All the address book information
    bytes ipAddress = 1 [deprecated=true]; // The IP address of the Node with separator & octets
    int32 portno = 2 [deprecated=true]; // The port number of the grpc server for the node
    bytes memo = 3 [deprecated=true]; // The memo field of the node (usage to store account ID is deprecated)
    string RSA_PubKey = 4; // The RSA public key of the node
    int64 nodeId = 5; // A non-sequential identifier for the node. This value is the key between the full and abbreviated address books 
    AccountID nodeAccountId = 6; // The account to be paid for queries and transactions sent to this node
    bytes nodeCertHash = 7; // A hash of the X509 cert used for gRPC traffic to this node
    repeated NodeEndpoint = 8; // A node's IP address and port
    string description = 9; // A description of the node. Max 100 bytes.
    int64 stake = 10; // The amount of tinybars staked to this node 
}

message NodeAddressBook {
    repeated NodeAddress nodeAddress = 1; // Contains all details of the nodes for the network
}
noshmody commented 3 years ago

The above changes are expected to be backward compatible, until the deprecated fields are actually removed from the protobuf messages.

Tests:

steven-sheehy commented 3 years ago

@noshmody

  1. Can a node cert hash vary per ip and port combo?
  2. I think NodeAddressAbbreviated should have deprecated fields removed. The benefit of splitting them into two address books is that the client can continue to use NodeAddress to parse 101 and 102 for a deprecation period. If they update their code to use the new NodeAddressAbbreviated they should also be forced to not use the new fields as well.
  3. Perhaps we can add a comment to NodeAddress explaining that its use to parse 101 is deprecated and to use the new message?
  4. Still not a fan of the name NodeAddressBookAbbreviated. Would prefer something oriented around its intended audience (ClientNodeAddressBook since it contains info for clients to connect) or its contents (NodeEndpoints, NodeEndpointAddressBook, NodeConnectionAddressBook). Or if those aren't sufficient, perhaps just something a little easier to say like NodeAddressBookSummary or SimpleNodeAddressBook ~5. Should we make stake int64 since Java doesn't have unsigned?~
anighanta commented 3 years ago

issues created in protobufs repo https://github.com/hashgraph/hedera-protobufs-java/issues/60 and https://github.com/hashgraph/hedera-protobufs/issues/21

noshmody commented 3 years ago

@steven-sheehy

  1. Each node has only 1 RSA Cert (each node has multiple proxies and therefore multiple TLS certs). Therefore assuming each proxy has single IP/Port exposed, each IP/Port combination has a TLS cert. Multiple IP/Port combinations have a single RSA certificate.
  2. If fields have been marked deprecated previously then we can remove them if Donald approves.
  3. @anighanta please would you add a comment as requested
  4. @steven-sheehy @donaldthibeau would you please pick a set of names and let us know. @anighanta will implement accordingly.
anighanta commented 3 years ago

None of the fields were marked deprecated previously for NodeAddress

steven-sheehy commented 3 years ago

If fields have been marked deprecated previously then we can remove them if Donald approves.

None of the fields were marked deprecated previously for NodeAddress

NodeAddressAbbreviated is a new message type, so fields within it don't need to be marked deprecated and can be removed. Clients won't break since if they don't change they would continue to use the previous NodeAddress message to parse both 101 and 102 and have access to those fields.

noshmody commented 3 years ago

My (likely incorrect recollection of the deprecated fields was the following...Since the 2 Address Book files would contain the 2 messages listed below. File 0.0.101 will be NodeAddressBookAbbreviated File 0.0.102 will be NodeAddressBook

If we remove the fields to be marked with [deprecated=true] from the message and therefore the AB file 0.0.101 then we will break clients that parse the file today.

@steven-sheehy would you please confirm. Thanks

steven-sheehy commented 3 years ago

I don't think that is true, but would be good if you guys can verify with a test. Someone parsing 0.0.101 after these changes are live on mainnet can change nothing and continue to use their previous protos (e.g. NodeAddressBook only) to parse both 101 and 102. Even if they upgrade their protos, if their code continues to use NodeAddress to parse 101 then they won't break. They will only break if they update their code to use NodeAddressBookAbbreviated, but by doing so they would be aware that it's a new message type and should adjust things as necessary to work.

noshmody commented 3 years ago

Address Book messages have been updated to reflect @steven-sheehy & @lbaird proposals.

Engineering @Neeharika-Sompalli @anighanta will test enhance YAHCLI tool with options to read and write existing and new versions of NodeAddressBook and NodeAddressBookAbbreviated

YAHCLI Operations on Files 0.0.101 or 0.0.102

Nodes on (version 0.13.0)

Test case 2

Client moved on to (version 0.13.0)

  1. Download addressBook [ 101 ] works.
  2. Upload addressBook [ 101 ] with multiple NodeEndpoints for each NodeAddress [No Duplicates] _works.
  3. Download nodeDetails [ 102 ] works.
  4. Upload nodeDetails [ 102 ] with multiple NodeEndpoints for each NodeAddress [No Duplicates] works.

AddressBook formats test cases validated:

testCasesCovered.zip

Neeharika-Sompalli commented 3 years ago

As per review comments in https://github.com/hashgraph/hedera-protobufs/pull/22 , changes forNodeAddressAbbreviated to be renamed to ClientNodeAddress and NodeAddressBookAbbreviated to be renamed to ClientNodeAddressBook are yet to be confirmed

anighanta commented 3 years ago

@steven-sheehy @noshmody @lbaird @Neeharika-Sompalli On Genesis, as we have no 101 and 102 files .. we create them using platform's address book. By using the latest suggested protobuf formats, I used NodeAddressBookAbbreviated to generate file 101 and NodeAddressBook to generate file 102. But when we do a fileDownload on 101 and try to parse it with old protobuf version NodeAddressBook it fails as expected fields ipaddress/port/memo are not present in the downloaded file.

@noshmody suggested to use the new NodeAddressBook to generate both 101 and 102 files on GENESIS, which fixed the issue [we can now download and parse successfully the new format using old protobufs].

steven-sheehy commented 3 years ago

Right, we had said on the call both files should be generated by NodeAddressBook.

donaldthibeau commented 3 years ago

We have aligned to the following changes after discussion with @lbaird and @steven-sheehy the two file names will be:

The nested objects will be:

anighanta commented 3 years ago

We have aligned to the following changes after discussion with @lbaird and @steven-sheehy the two file names will be:

  • AddressBookForClients
  • AddressBook

The nested objects will be:

  • NodeAddressForClients
  • NodeAddress

@donaldthibeau @steven-sheehy as Platform also contains AddressBook -> com.swirlds.common.AddressBook A lot of explicit imports for com.hederahashgraph.api.proto.java.AddressBook are needed in the services code. can we have a different name for addressBook from protobuf to avoid this ambiguity.

Sample Error whilemvn clean install without explicit imports :

ERROR] hedera-services/hedera-node/src/main/java/com/hedera/services/state/initialization/HfsSystemFilesManager.java:[74,23] reference to AddressBook is ambiguous
[ERROR]   both class com.hederahashgraph.api.proto.java.AddressBook in com.hederahashgraph.api.proto.java and class com.swirlds.common.AddressBook in com.swirlds.common match
SimiHunjan commented 3 years ago

@noshmody can you please provide the final naming convention and update the issue.

anighanta commented 3 years ago

@noshmody can you please provide the final naming convention and update the issue.

@SimiHunjan These are the new message names and structures.

Version 0.13.0

message NodeEndpoint {
    string ipAddress = 1; // The IP address of the node
    string port = 2; // The port of the node
}

message NodeAddressForClients { // All the information required to connect to the network
    int64 nodeId = 5; // A non-sequential identifier for the node. This value is the key between the full and abbreviated address books 
    AccountID nodeAccountId = 6; // The account to be paid for queries and transactions sent to this node
    bytes nodeCertHash = 7; // A hash of the X509 cert used for gRPC traffic to this node
    repeated NodeEndpoint = 8; // A node's IP address and port
}

message AddressBookForClients {
    repeated NodeAddressForClients nodeAddressForClients = 1; // Contains minimal node details for the network
}
message NodeAddress { // All the address book information
    bytes ipAddress = 1 [deprecated=true]; // The IP address of the Node with separator & octets
    int32 portno = 2 [deprecated=true]; // The port number of the grpc server for the node
    bytes memo = 3 [deprecated=true]; // The memo field of the node (usage to store account ID is deprecated)
    string RSA_PubKey = 4; // The RSA public key of the node
    int64 nodeId = 5; // A non-sequential identifier for the node. This value is the key between the full and abbreviated address books 
    AccountID nodeAccountId = 6; // The account to be paid for queries and transactions sent to this node
    bytes nodeCertHash = 7; // A hash of the X509 cert used for gRPC traffic to this node
    repeated NodeEndpoint = 8; // A node's IP address and port
    string description = 9; // A description of the node. Max 100 bytes.
    int64 stake = 10; // The amount of tinybars staked to this node 
}

message AddressBook {
    repeated NodeAddress nodeAddress = 1; // Contains all details of the nodes for the network
}
steven-sheehy commented 3 years ago

Rationale

After further discussions, the above design has been noted to causes some pain points:

Proposal

Keep the same message structure for both 0.0.101 and 0.0.102. Never remove the deprecated fields as they will be needed for mirror nodes parsing historical address books. The assumption is any client who understands v0.13 will update their code to use serviceEndpoint if it's populated, otherwise they will fallback to using ipAddress and portno fields. If they don't upgrade to v0.13 or upgrade to v0.13 without adding this logic, then they will break regardless of server format after the deprecation period.

/*
Contains the IP address and the port representing a service endpoint of a Node in a network. Used to reach the Hedera API and submit transactions to the network.
*/
message ServiceEndpoint {
    bytes ipAddressV4 = 1; // The 32-bit IP address of the node encoded in left to right order (e.g. 127.0.0.1 has 127 as its first byte)
    int32 port = 2; // The port of the node
}

/*
The data about a Node – including IP Address, and the crypto account associated with the Node.

All fields are populated in the 0.0.102 address book file while only fields that start with # are populated in the 0.0.101 address book file.
*/
message NodeAddress {
    bytes ipAddress = 1 [deprecated=true]; // The IP address of the Node with separator & octets encoded in UTF-8. Usage is deprecated, ServiceEndpoint is preferred to retrieve a node's list of IP addresses and ports.
    int32 portno = 2 [deprecated=true]; // The port number of the grpc server for the node.  Usage is deprecated, ServiceEndpoint is preferred to retrieve a node's list of IP addresses and ports.
    bytes memo = 3 [deprecated=true]; // Usage is deprecated, nodeAccountId is preferred to retrieve a node's account ID.
    string RSA_PubKey = 4; // The x509 RSA public key of the node encoded in hex format.
    int64 nodeId = 5; // # A non-sequential identifier for the node.
    AccountID nodeAccountId = 6; // # The account to be paid for queries and transactions sent to this node.
    bytes nodeCertHash = 7; // # A hash of the X509 cert used for gRPC traffic to this node.
    repeated ServiceEndpoint serviceEndpoint = 8; // # A node's service IP addresses and ports.
    string description = 9; // A description of the node. Max 100 bytes.
    int64 stake = 10; // The amount of tinybars staked to this node.
}

/*
A list of nodes and their metadata that contains all details of the nodes for the network
*/
message NodeAddressBook {
    repeated NodeAddress nodeAddress = 1; // Metadata of nodes on the network
}

Compatibility Matrix

Client Server Deprecated fields populated Breaking
v0.12 v0.12 Yes No
v0.12 v0.13 Yes No
v0.12 v0.13 No Yes
v0.13 v0.12 Yes No
v0.13 v0.13 Yes No
v0.13 v0.13 No No

Note: Deprecated fields populated column means ipAddress and portno are still populated in the address book files.