Closed Abhayakara closed 3 years ago
@Abhayakara , thanks for raising this issue!
My guess is that the Service/Server TLV format used to advertise the SRP Server is not compatible between the two implementations. OpenThread currently only includes the RLOC16 and Port information, rather than the full IPv6 address.
Can you provide one or both of:
netdata show
using the OpenThread CLI.
@Abhayakara I think, as Jonathan said, the TLV formats differ between the implementations. As you know, we are still discussing this in Thread TC calls (the mechanism for discovering SRP servers). Moving forward, I think we want to converge and use the model we discussed:
The address here, fdf2:7a68:5f3e:108c:0:ff:fe00:fc11, actually seems to be the anycast address associated with the network data service entry. I guess the client was running as a sleepy device and is not configured to get the full network data (so it only gets the ALOC16 from the network data, from which the anycast address is derived).
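For reference, a minimal sketch of how a locator-based address like this one can be derived. It assumes the usual Thread layout of the mesh-local /64 prefix followed by the interface identifier 0000:00ff:fe00:<locator16>. The function name and the way the prefix is passed in are illustrative only, not OpenThread APIs:

```python
import ipaddress

# Fixed upper 48 bits of the interface identifier used for RLOC/ALOC
# addresses: 0000:00ff:fe00 (assumption stated in the lead-in).
LOCATOR_IID_PREFIX = 0x000000FFFE00


def locator_address(mesh_local_prefix: str, locator16: int) -> ipaddress.IPv6Address:
    """Combine a mesh-local /64 prefix with a 16-bit locator (RLOC16 or ALOC16)."""
    network = ipaddress.IPv6Network(mesh_local_prefix)
    if network.prefixlen != 64:
        raise ValueError("mesh-local prefix must be a /64")
    if not 0 <= locator16 <= 0xFFFF:
        raise ValueError("locator must fit in 16 bits")
    iid = (LOCATOR_IID_PREFIX << 16) | locator16
    return ipaddress.IPv6Address(int(network.network_address) | iid)


# ALOC16 0xfc11 under the mesh-local prefix seen in this issue.
addr = locator_address("fdf2:7a68:5f3e:108c::/64", 0xFC11)
```

With the prefix from the log, ALOC16 0xfc11 gives fdf2:7a68:5f3e:108c:0:ff:fe00:fc11, which matches the address the client used.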
The current OT implementation uses the following format for ServerTlv data (basically just a port number).
https://github.com/openthread/openthread/blob/a9d43fac5da39cc108cb3b57eb7df835db611c80/src/core/thread/network_data_service.hpp#L174-L207
The service number itself can be configured using OPENTHREAD_CONFIG_SRP_SERVER_SERVICE_NUMBER (which seems to be set to 0x5d by default).
The question is whether we want to add support for the current Service TLV format used by the Apple implementation (for legacy compatibility) or whether we can move to the new model. If we do need it for legacy compatibility, I see two ways:
Thoughts?
@Abhayakara, is wpantund being used on the border router side (on RPi)?
If so, please note that wpantund's default behavior is to filter (not add) the RLOC and ALOC IPv6 addresses from the NCP on the host "wpan0" netif that it manages (there are some historical reasons why this behavior was desired and became the default mode).
I guess this may be the reason why the anycast address is not seen in the list of addresses on "wpan0" on the BR.
This behavior can be changed through two config properties in wpantund: "Thread:Config:FilterRLOCAddresses" and "Thread:Config:FilterALOCAddresses". Set both to false for wpantund to sync/add both RLOC and ALOC addresses from the NCP on the host netif. Setting these properties in the wpantund.conf config file ensures wpantund starts with the proper config.
It's going to be necessary to be able to advertise an SRP service that's not on the Thread network at all, so it makes sense to support the ability to choose a server that's got an IPv6 address that's not mesh-local and not anycast. Separately, we should support anycast. Supporting the external SRP server is backwards-compatible with what HomePod Minis currently advertise, so this gets us backward compatibility for free.
Using different formats for different implementations won't be interoperable, so that's definitely not a good plan. That is, you couldn't mix and match Apple routers and other routers, for example. Given that anycast gives us what we want as the default behavior, I think just specifying the IPv6 address and port is a good secondary behavior, and we don't need the flexibility of having yet another format.
We are not using wpantund on the Raspberry Pi Duckhorn Border Router implementation.
> It's going to be necessary to be able to advertise an SRP service that's not on the Thread network at all, so it makes sense to support the ability to choose a server that's got an IPv6 address that's not mesh-local and not anycast.
In the spirit of minimizing optionality, what about having the border router serve as a proxy when the SRP service is not on the Thread network? Thread devices then only need to worry about the anycast address and the network data is more compact. In either case, the border router will need to know the IP and port of the SRP service.
That would be a bit more work on the border router. Does this really buy us anything? In the default case the network is going to be more compact anyway, right?
> That would be a bit more work on the border router. Does this really buy us anything? In the default case the network is going to be more compact anyway, right?
I agree it shifts the complexity from SRP clients to the border router.
I can also see the value of maintaining the flexibility of specifying the IPv6 address and port.
So we can support two separate SRP service types to encode anycast and unicast. If both appear in the network data for whatever reason, can we have the SRP client always prefer one? It'd be simpler for the SRP client to only have to deal with maintaining registrations with a single SRP server.
If we can signal the availability of anycast without advertising it as a service that might be better. The way my client works is to try successive servers if it doesn't get through to the first one. Anycast makes this impossible, because the client has no way to choose a specific server. I think that if anycast is present, it makes sense to just register with the anycast server; BRs shouldn't advertise anycast if there's an infrastructure SRP server. We'd need to specify how that's configured, of course.
> If we can signal the availability of anycast without advertising it as a service that might be better.
I'm not sure I understand the distinction between "anycast" and "service". Is the goal to generalize the anycast address to support more than just SRP?
I believe just signaling the availability of anycast in Thread requires the Service TLV mechanism. The Thread network data needs to include identifiers (RLOCs) for each of the border routers that are announcing the availability of anycast - it's how Thread routers know where they can route anycast packets.
I think the best path forward is to just have two different Service TLVs. The "Service TLV" is just a name for a mechanism. The SRP client can then prefer anycast if present.
Anycast TLV:
SRP Infrastructure Service TLV:
Both would include the same Server TLVs that just encode the RLOC16s.
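A rough sketch of the client-side preference described here, assuming the client has already parsed the network data into a flat list of entries. The `SrpServiceEntry` type, its `kind` labels, and the example addresses and port are hypothetical and only serve the illustration:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SrpServiceEntry:
    kind: str      # "anycast" or "unicast" (hypothetical labels)
    address: str   # IPv6 address the client would register with
    port: int


def select_servers(entries: List[SrpServiceEntry]) -> List[SrpServiceEntry]:
    """Return the servers to try, in order; anycast wins if present."""
    anycast = [e for e in entries if e.kind == "anycast"]
    if anycast:
        # With anycast the client cannot pick a specific server,
        # so there is exactly one candidate to register with.
        return anycast[:1]
    # Otherwise fall back to trying the unicast servers successively.
    return [e for e in entries if e.kind == "unicast"]


entries = [
    SrpServiceEntry("unicast", "2001:db8::1", 53),
    SrpServiceEntry("anycast", "fdf2:7a68:5f3e:108c:0:ff:fe00:fc11", 53),
]
candidates = select_servers(entries)
```

This keeps the client maintaining registrations with a single server whenever anycast is advertised, and keeps the "try successive servers" behavior only for the unicast case.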
The two service TLV definitions look great to me.
I think in both ServiceTLV definitions, between 4 and 5, we need one more byte for a "service number" (as part of the Service TLV data blob). So:
The Anycast model Service TLV Data is 2 bytes:
The SRP infra Service TLV Data is 19 bytes:
> I think in both ServiceTLV definitions, between 4 and 5, we need one more byte for a "service number" (as part of the Service TLV data blob).
Yes, thanks for catching that. I updated my comment above.
Right. So with the IP address service, we have the enterprise number (44970), one byte of service data (0x5d), and 18 bytes of server data, with the IP address and port number in network byte order.
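To make the 18-byte layout concrete, here is a small sketch of packing and unpacking the server data as described above: 16 bytes of IPv6 address followed by a 2-byte port, both in network byte order. The function names and the example address and port are made up for illustration and are not OpenThread APIs:

```python
import ipaddress
import struct
from typing import Tuple


def encode_srp_server_data(address: str, port: int) -> bytes:
    """Pack an IPv6 address and port into 18 bytes, network byte order."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return ipaddress.IPv6Address(address).packed + struct.pack("!H", port)


def decode_srp_server_data(data: bytes) -> Tuple[ipaddress.IPv6Address, int]:
    """Inverse of encode_srp_server_data; rejects anything that is not 18 bytes."""
    if len(data) != 18:
        raise ValueError("expected 16-byte address followed by 2-byte port")
    (port,) = struct.unpack("!H", data[16:])
    return ipaddress.IPv6Address(data[:16]), port


# Example values only: a documentation-range address and an arbitrary port.
blob = encode_srp_server_data("2001:db8::1", 53)
```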
Sounds good. 44970 is the Thread Enterprise number (which is encoded in compressed form in the Service TLV). 0x5d would be the service number (for the SRP infra Service TLV model).
In Jonathan's suggestion from https://github.com/openthread/openthread/issues/6447#issuecomment-819952599, the IP and port are part of the Service TLV data and not Server data.
I think both cases (including it in Service or Server data) can be useful. If there is a common infra SRP server and multiple BRs want to add it to the network data, it would be good to put the info in the Service Data (otherwise we can end up with the same address being encoded in the network data multiple times).
One idea is to make it flexible, i.e. when parsing network-data for such entries, we can allow/accept address and port info to be encoded in either Service Data or Server Data. I think this will be relatively straight-forward to implement and with this we can be compatible with the current model used by Apple implementation/devices while supporting future models.
Thoughts?
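A minimal sketch of what such flexible parsing could look like, assuming the sizes mentioned in this thread: service data is either the 1-byte service number alone or the service number followed by the 18-byte address and port (19 bytes total), and server data may carry the same 18 bytes. The function names are hypothetical, and this is not necessarily how OpenThread implements it:

```python
import ipaddress
import struct
from typing import Optional, Tuple

SRP_SERVICE_NUMBER = 0x5D   # service number discussed in this thread
ADDR_PORT_LEN = 18          # 16-byte IPv6 address + 2-byte port

AddrPort = Tuple[ipaddress.IPv6Address, int]


def _unpack_addr_port(blob: bytes) -> AddrPort:
    (port,) = struct.unpack("!H", blob[16:18])
    return ipaddress.IPv6Address(blob[:16]), port


def parse_srp_entry(service_data: bytes, server_data: bytes) -> Optional[AddrPort]:
    """Accept address/port from either Service Data or Server Data."""
    if not service_data or service_data[0] != SRP_SERVICE_NUMBER:
        return None
    if len(service_data) == 1 + ADDR_PORT_LEN:
        # Address and port shared by all BRs, encoded once in Service Data.
        return _unpack_addr_port(service_data[1:])
    if len(service_data) == 1 and len(server_data) == ADDR_PORT_LEN:
        # Per-server address and port, carried in Server Data.
        return _unpack_addr_port(server_data)
    return None


# Example values only: a documentation-range address and an arbitrary port.
payload = ipaddress.IPv6Address("2001:db8::1").packed + struct.pack("!H", 53)
from_service = parse_srp_entry(bytes([SRP_SERVICE_NUMBER]) + payload, b"")
from_server = parse_srp_entry(bytes([SRP_SERVICE_NUMBER]), payload)
```

Both calls yield the same address and port, so a client written this way would not care which of the two encodings a border router chose.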
I'm not sure how big a problem this is, but with this approach, existing Thread accessories that interoperate with HomePod will not successfully discover the service if it's in the service data. That said, we definitely want this efficiency, and this is probably the best way to get it.
Submitted PR https://github.com/openthread/openthread/pull/6501 which implements what we discussed above.
Adding some quick notes on this:
Resolved by #6501
The SRP client uses an IP address that isn't configured on the SRP server.
To Reproduce
Git commit id: d8a2cd9a70cfda7c2657645b4a57411eadaa4e61
Border router: Apple's implementation of Duckhorn router running on Raspberry Pi 4 Model B
Accessory: running on Nordic nRF52840-DK
Expected behavior
We expect the SRP client to use the address advertised in the network data.
Console/log output
Here are the addresses configured on the Thread interface on the BR:
Additional context
The code here seems to be constructing an address in a way that's not valid and doesn't actually produce an address that's configured on the Border Router (SRP server):
https://github.com/openthread/openthread/blob/main/src/core/thread/network_data_service.cpp#L163
As you can see in the console output, the address it constructs is this:
fdf2:7a68:5f3e:108c:0:ff:fe00:fc11
This is a mesh-local address, which isn't in the advertisement, and is not actually an address configured on the BR.