mozilla / libdweb

Extension containing an experimental libdweb APIs
MIT License
441 stars 32 forks source link

TCPSocket server.address? #64

Closed alanshaw closed 5 years ago

alanshaw commented 6 years ago

UDPSocket has server.address can TCPSocket also?

I need to know what host I'm listening on so that I can advertise it over MDNS for other peers to connect to me.

Gozala commented 6 years ago

Hi @alanshaw the reason address is not exposed is because underlying implementation that is based of TCPServerSocket.webidl does not exposes the address. Right now we actually workaround bug in that and use nsIServerSocket.idl which also does not seem to expose it (to JS). It is exposed for native though, and has following comment:

https://github.com/mozilla/gecko-dev/blob/796058f4eb4185333278c2ef789c52466d26c0c4/netwerk/base/nsIServerSocket.idl#L203-L212

/**
  * Returns the address to which this server socket is bound.  Since a
  * server socket may be bound to multiple network devices, this address
  * may not necessarily be specific to a single network device.  In the
  * case of an IP socket, the IP address field would be zerod out to
  * indicate a server socket bound to all network devices.  Therefore,
  * this method cannot be used to determine the IP address of the local
  * system.  See nsIDNSService::myHostName if this is what you need.
  */

Which suggest me that maybe what you actually looking for is nsIDNSService.myHostName which could be a separate API.

Does that make sense or am I missing something ?

Gozala commented 6 years ago

@alanshaw Also ServiceDiscovery API actually does not need to be given an address for advertising it will advertise your michaine's addresses itself.

alanshaw commented 6 years ago

Does that make sense or am I missing something ?

That all makes total sense :D

So, right now when I listen on a port using TCPSocket it binds to all available network addresses - right?

Being able to find out the available network addresses sounds very useful. I actually really need this for the IPFS libdweb transport module I'm working on.

In IPFS we split out discovery and transport and you can have multiple discovery and transport methods. Each transport needs to be able to provide the address(es) it's listening on to other discovery modules so that it can be communicated to other peers.

So while my IPFS libdweb discovery module will advertise these addresses, we might have a different discovery module that doesn't, so the transport needs to be able to provide them.

Less important, but couldTCPSocket allow me to chose which address to listen on (which could be all of them - 0.0.0.0)?

Gozala commented 6 years ago

That all makes total sense :D 👍

Being able to find out the available network addresses sounds very useful. I actually really need this for the IPFS libdweb transport module I'm working on.

Can you please provide more context please. I took a look at the source but it's very difficult for me to gather without context.

Note: I would like to limit API surface we expose and if we expose something we should have really good reason to do so, and one that I can justify during security review. A failure to do that would mean we'll have to take it out which would be more painful than not having this in first place. Given that all the existing Gecko internal APIs do not expose socket address makes me think there must have being reasons to do so.

I would encourage trying to frame the goal instead of specific request, in my experience it's helps eliminate disagreement on specific points and rather put participants on same team trying to solve problem.

In IPFS we split out discovery and transport and you can have multiple discovery and transport methods. Each transport needs to be able to provide the address(es) it's listening on to other discovery modules so that it can be communicated to other peers.

What are the other relevant discovery methods in the context of WebExtension ?

So while my IPFS libdweb discovery module will advertise these addresses, we might have a different discovery module that doesn't, so the transport needs to be able to provide them.

BTW you can get addresses from announced services as well.

Less important, but couldTCPSocket allow me to chose which address to listen on (which could be all of them - 0.0.0.0)?

Underlying implementation in Gecko does not provide a way to specify an address, that being said comment linked in previous comment says:

In the case of an IP socket, the IP address field would be zerod out to indicate a server socket bound to all network devices.

Which unless I misunderstand suggests that default is binding all of them.

Gozala commented 6 years ago

@alanshaw To be more clear, I'm not against any of the requests stated. I'm also happy to followup with a Gecko networking team with a request to expose:

But I need to have a clear justifiable use cases supporting my request, otherwise it will just discredit this effort and make further requests more difficult. So please help me define the use cases so it's easier to have this conversation.

alanshaw commented 6 years ago

Our goal is to allow direct p2p communication over a TCP socket, but how can one browser initiate a conversation with another browser if it does not know the address of the other?

The Service Discovery API can be used to advertise the addresses of the browser that is providing a TCP socket service. So, provided the TCP Socket API and the Service Discovery API are used together, the other browser can discover the address and connect directly to it.

In other words, right now the two APIs have to be coupled and mandate discovery and communication over a local network.

Discovery over the wider internet is not possible. MDNS cannot be used to announce our address across networks so to advertise our service outside of a local network we'd need to know what address(es) our service is bound to.

In the context of a p2p node running in a web extension, I can think of two cases when we might want to do this, but there may be many more:

  1. If our browser was connected directly to the wider internet, we might want to inform a known rendezvous server of the address.
  2. If our browser knew of a relay node that was connected to the wider internet we might want to register a circuit relay address with the node to allow it to forward traffic for us. We'd need to know the addresses we're listening on to be able to do this.

It's arguable that these cases are workarounds for true p2p communications so are perhaps outside the scope of the libdweb library. I also don't think these are particularly common use cases right now but (1) might become more commonplace due to IPv6.

The only other reason I can think of for including an API to access the network address(es) our TCP socket is listening on is that someone could simply use the Service Discovery API to announce a fake service, discover themselves and in doing so discover their network interface addresses. i.e. they can already get the information by other means.

Is there a significant security gain to be had from not including an API to get the network interface addresses if you're able to self discover anyway?


Right now I can work around not having an API for this by using the self discovery method...it's worth noting that the workaround might not work - corporate network policy might forbid that type of traffic.

So in light of the workaround, maybe this API is for libdweb v2 and we can park it for now?

Part of the drive for this is due to integrating libdweb with existing defined interfaces for IPFS discovery and transport modules where the contract requires transport modules to provide the full address of where the service is listening so that it can be advertised by the discovery modules.

There's a category of people who will use libdweb who will need to integrate with existing systems like I am - I believe that if I've come across this there will be others who also need it.

RangerMauve commented 6 years ago

I think that discovering your IP address will require something similar to STUN, just exposing the IP address of your TCP server won't be enough. Plus, computers could have multiple network interfaces.

Gozala commented 6 years ago

Disclaimer: Not arguing just trying to understand if I'm missing something, as I want have legitimate case to make to networking team that I can myself elaborate.

Our goal is to allow direct p2p communication over a TCP socket, but how can one browser initiate a conversation with another browser if it does not know the address of the other?

The Service Discovery API can be used to advertise the addresses of the browser that is providing a TCP socket service. So, provided the TCP Socket API and the Service Discovery API are used together, the other browser can discover the address and connect directly to it.

In other words, right now the two APIs have to be coupled and mandate discovery and communication over a local network.

From what I've understood p2p protocols across the board do local discovery for connecting peers on local network and remote discovery for connecting with peers on remote network.

In case of local discovery I've seen protocols use either:

  1. Custom UDP broadcasting, in which case only port info is necessary as receiver can see address(es) of the broadcasting socket and there for connect to the broadcasting host on the advertised port. (I believe that is what SSB does)
  2. DNS-SD which is essentially the same as above except message exchange protocol has being specd out and standardized (_this is what libdweb exposes, and go IPFS uses, Dat also uses this but in spec incompatible manner).

So unless there is different discovery method on local network, if there is please make sure to point out, I find argument of coupling pretty weak. Counter argument could be that advertising an address(es) adds redundant data to the payload.

I wonder what happens if say my laptop switches to different network while TCP port is listening on some port. Will socket get closed, or is address going to change ?

Discovery over the wider internet is not possible. MDNS cannot be used to announce our address across networks so to advertise our service outside of a local network we'd need to know what address(es) our service is bound to.

As of remote discovery I do not know if knowing local address buys you anything. As I understand it you need to do Hole_punching, which echos what @RangerMauve said. I believe WebRTC uses STUN / ICE to accomplish exactly that.

Only case where that is not true is when node is running on the server and I it's probably fare to assume that using Firefox is probably a not right tool.

In the context of a p2p node running in a web extension, I can think of two cases when we might want to do this, but there may be many more:

  • If our browser was connected directly to the wider internet, we might want to inform a known rendezvous server of the address.

I believe in such scenario still requires hole punching, rendezvous server will need to figure out address of the node from the initial connection to it. There for including internal network addresses would not be very useful.

  • If our browser knew of a relay node that was connected to the wider internet we might want to register a circuit relay address with the node to allow it to forward traffic for us. We'd need to know the addresses we're listening on to be able to do this.

I think same applies here as well, relay node will need need to detect circuits address as internal network address would likely be mapped.

It's arguable that these cases are workarounds for true p2p communications so are perhaps outside the scope of the libdweb library.

I don't necessarily think those use cases are out of scope, but I also fail to see how exposing address(es) here will enable those use cases, maybe except few cases where node isn't behind the NAT, but that is so rare that I fear building for that assumption will do more harm than good.

Am I missing something ?

I also don't think these are particularly common use cases right now but (1) might become more commonplace due to IPv6.

Does IPv6 makes it possible to avoid hole punching ? If that is the case that might provide a good reason to make a case for providing IPv6 info for the socket if that network interface is available.

The only other reason I can think of for including an API to access the network address(es) our TCP socket is listening on is that someone could simply use the Service Discovery API to announce a fake service, discover themselves and in doing so discover their network interface addresses. i.e. they can already get the information by other means.

It very well could be that after security review we would have to hide that info as well, and we may have to make address opaque object that you could connect to but not necessarily tell what the address is.

Is there a significant security gain to be had from not including an API to get the network interface addresses if you're able to self discover anyway?

I do not know, but in my experience being conservative is a winner. During security reviews I always get overwhelmed by number of threats I overlook. It could be that this is different, but I absolutely need a solid argument why we need this, why not is usually best way to loos argument.

Right now I can work around not having an API for this by using the self discovery method...it's worth noting that the workaround might not work - corporate network policy might forbid that type of traffic.

So in light of the workaround, maybe this API is for libdweb v2 and we can park it for now?

I am still not sure why do you need to workaround it. As I understand IPFS uses dns-sd for local discovery (go implementation and js one is non standard version of it but hopefully it can be addressed) why not just use address(es) from discovery ?

Part of the drive for this is due to integrating libdweb with existing defined interfaces for IPFS discovery and transport modules where the contract requires transport modules to provide the full address of where the service is listening so that it can be advertised by the discovery modules.

Unless there is a reason that is overlooked, it seems to me that existing contract requirements were not taking use case that libdweb outlined into account and maybe it's good reason to consider if those requirements need updating.

This also makes me wonder how does other transports like WebRTC or WebSocket deal with this, it seems like they'd face same limitation, no ?

There's a category of people who will use libdweb who will need to integrate with existing systems like I am - I believe that if I've come across this there will be others who also need it.

I understand that diverging from API exposed by nodejs would make integration with existing systems more problematic, but then again I am almost certain that counter argument I'll hear will be: nodejs is designed to operate in privileged context and has a very different security constraints than browser does.

@alanshaw What does IPFS do with an address from the discovery ? Does it just discard it ?

Here is an idea that might make integration with existing IPFS requirements without changing them that would avoid unreliable self discovery.

What if libdweb based transport announced address as :¯\_(ツ)_/¯: and on receiver of announcement derived actual address from the dns-sd replacing :¯\_(ツ)_/¯: with actual address.

Gozala commented 6 years ago

I had a brief conversation with an author of the TCPServerSocket implementation in gecko and was told that leaving out address was not intentional and from that conversation it appears to me that patch to add it will be accepted (as long as it meets completeness criteria).

There for I have submitted Bug 1485086 to tackle this.

@alanshaw please note that even if we do expose address, chances are we might need to take it out later on due to security concerns that might be raised, so I would still encourage to revisit IPFS transport requirements.

Gozala commented 6 years ago

In an effort to make this a truly community effort I would like to invite community to lead this effort. It's likely going to be multiple leg process, here is an overview that could be a good starting point:

In theory libdweb wraps existing gecko APIs defined by following interfaces:

In practice though only client socket does, as discoverd Bug 1474150 made server one fairly unreliable. As a workaround current implementation uses polyfills TCPServerSocket API via nsIServerSocket XPCOM that pretty much follows native TCPServerSocket implementation in gecko

So in order to expose address on the server socket one would just need to follow instruction conveniently provided for nsIServerSocket.getAddress native only (implies not callable from JS) method.

So tackling this would involve multiple steps:

  1. Contributor would need to patch polyfills TCPServerSocket and add address field by following instruction suggested at nsIServerSocket.getAddress.

  2. Take a Bug 1485086 and do the same thing for native TCPServerSocket implementation in gecko.

Note: I would not want to take just 1. without someone tackling 2. as we'll get rid of polyfill as soon as Bug 1474150 resolved, which consequently would break this API once again.

I am happy to provide all the necessary guidance.

Gozala commented 6 years ago

Additional detail to consider (my comment from bug 1485086

Comment under nsIServerSocket.getAddress suggests following:

Returns the address to which this server socket is bound. Since a server socket may be bound to multiple network devices, this address may not necessarily be specific to a single network device. In the case of an IP socket, the IP address field would be zeroed out to indicate a server socket bound to all network devices. Therefore, this method cannot be used to determine the IP address of the local system. See [nsIDNSService.myHostName()](https://developer.mozilla.org/en-US/docs/XPCOM_Interface_Reference/nsIDNSService#myHostName()) if this is what you need.

Am I wrong at interpreting that as socket will bind to all available network interfaces ? And if so would not nsINetworkInfoService.listNetworkAddresses be better choice than suggested nsIDNSService.myHostName ?

Gozala commented 5 years ago

Migrated to https://bugzilla.mozilla.org/show_bug.cgi?id=1485086