heroiclabs / nakama-unity

Unity client for Nakama server.
https://heroiclabs.com/docs/unity-client-guide

[Regression] [Bug] "The requested address is not valid in its context." after updating nakama-unity beyond 3.2 #132

Closed MWFIAE closed 2 years ago

MWFIAE commented 2 years ago

After updating nakama-unity to 3.4.1 we got the following error:

SocketException: The requested address is not valid in its context.

System.Net.Sockets.SocketAsyncResult.CheckIfThrowDelayedException () (at <6d7c4c8dd3624dc596686fb7270ae1e6>:0)
System.Net.Sockets.Socket.EndConnect (System.IAsyncResult asyncResult) (at <6d7c4c8dd3624dc596686fb7270ae1e6>:0)
System.Net.Sockets.TcpClient.EndConnect (System.IAsyncResult asyncResult) (at <6d7c4c8dd3624dc596686fb7270ae1e6>:0)
System.Threading.Tasks.TaskFactory`1[TResult].FromAsyncCoreLogic (System.IAsyncResult iar, System.Func`2[T,TResult] endFunction, System.Action`1[T] endAction, System.Threading.Tasks.Task`1[TResult] promise, System.Boolean requiresSynchronization) (at <6073cf49ed704e958b8a66d540dea948>:0)
--- End of stack trace from previous location where exception was thrown ---
Nakama.Ninja.WebSockets.WebSocketClientFactory.GetStream (System.Guid loggingGuid, System.Boolean isSecure, System.Boolean noDelay, System.String host, System.Int32 port, System.Threading.CancellationToken cancellationToken) (at <c17a258af24e4d289a30070aa95dc0df>:0)
Nakama.Ninja.WebSockets.WebSocketClientFactory.ConnectAsync (System.Uri uri, Nakama.Ninja.WebSockets.WebSocketClientOptions options, System.Threading.CancellationToken token) (at <c17a258af24e4d289a30070aa95dc0df>:0)
Nakama.WebSocketAdapter.ConnectAsync (System.Uri uri, System.Int32 timeout) (at <c17a258af24e4d289a30070aa95dc0df>:0)

It doesn't occur in version 3.2 and below. It also works fine when testing on localhost, but fails once it is deployed to our staging servers.

I think it is caused by our setup, as we are using nginx to route the websocket connection to the right nakama instance according to the hostname used in the request.

lugehorsam commented 2 years ago

We cannot debug interactions between your infrastructure and our SDK. We recommend using Heroic Cloud if you aren't able to resolve the issue.

MWFIAE commented 2 years ago

@lugehorsam @novabyte I think this will be a very common issue in all kinds of self-hosted environments. As an open-source project I would expect nakama to show us a clear path on how self hosting should be done.

We use nginx mainly to handle SSL and to avoid exposing the nakama port. We do this because configuring SSL directly in nakama is not recommended for production use, and not using SSL at all is even less suitable for production.
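
For readers following along, a setup of the kind described here (nginx terminating SSL, selecting a backend by hostname, and forwarding WebSocket traffic) might look roughly like the sketch below. It is not the configuration used in this thread, which was never posted. The hostname, certificate paths, and upstream address are placeholders, and 7350 is assumed because it is Nakama's default client port.

```nginx
# Illustrative sketch only. All names and paths below are placeholders.
# The map block belongs in the http context.
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

server {
    listen 443 ssl;
    server_name staging.example.com;   # placeholder; nginx picks this block by hostname

    ssl_certificate     /etc/ssl/certs/staging.example.com.pem;     # placeholder path
    ssl_certificate_key /etc/ssl/private/staging.example.com.key;   # placeholder path

    location / {
        # Assumed upstream: a Nakama instance on its default port 7350.
        proxy_pass http://127.0.0.1:7350;

        # WebSocket upgrades need HTTP/1.1 and the Upgrade/Connection headers.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;

        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # nginx closes idle proxied connections after 60s by default.
        proxy_read_timeout 3600s;
    }
}
```

With a layout like this, the client would connect to the placeholder hostname over port 443 with SSL enabled, and the Nakama port itself would stay unexposed.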

Since our setup worked perfectly before the update (and does continue to do so after downgrading again) something has changed which makes nakama less compatible than before and this needs to be addressed in some way or another.

mofirouz commented 2 years ago

> As an open-source project I would expect nakama to show us a clear path on how self hosting should be done.

We've done so in the form of Docker-compose to get things up and running for local development and self-hosting. How you configure your infrastructure past that goes beyond what we can provide as we cannot possibly direct and show every single permutation of setups (Nginx vs Envoy vs cloud provider LBs vs something else, and that's just Loadbalancers).

> I think this will be a very common issue in all kinds of self-hosted environments.

We have many thousands of developers using the Unity SDK in self-hosted environments and there haven't been any complaints from anyone else apart from this case. I believe your issue is isolated to your installation.

Whilst we are on this topic, Heroic Labs provides Heroic Cloud which is what we recommend for production environments - and we have many customers that are using the latest version of the Unity SDK without any issues on our provided production environments.

Heroic Cloud is the way we provide funds to support Nakama and further development at Heroic Labs.

> Since our setup worked perfectly before the update (and does continue to do so after downgrading again) something has changed which makes nakama less compatible than before and this needs to be addressed in some way or another.

Can you point to a commit that you think (or know) has caused this issue in the Unity SDK? More than happy to help if you can point out where you think the issue is.

Any and all pull requests and contributions are welcomed.

MWFIAE commented 2 years ago

> How you configure your infrastructure past that goes beyond what we can provide as we cannot possibly direct and show every single permutation of setups (Nginx vs Envoy vs cloud provider LBs vs something else, and that's just Loadbalancers).

Having at least one of those configurations as a guideline would be extremely helpful 🙏

Unfortunately I haven't found the original tutorial I followed back then, but our current nginx configuration is similar to what is shown there. So if you can spot something wrong with that it would be a great help :)

> Can you point to a commit that you think (or know) has caused this issue in the Unity SDK? More than happy to help if you can point out where you think the issue is.

It started with v3.3 so if we take a look at the differences between v3.2 and v3.3 it would only be 4 commits:

mofirouz commented 2 years ago

> It started with v3.3 so if we take a look at the differences between v3.2 and v3.3 it would only be 4 commits: ... https://github.com/heroiclabs/nakama-unity/commit/c9877017a0a4cb3f4a4dfd37da80dfb41b146032 (this has to be it)

Thanks, I had a look and I can't see where a regression could be caused in that commit. Since we can't reproduce it, could you kindly bisect the commits or show us a minimal setup (both the infrastructure and a minimal Unity project) that we can use to replicate it? If you can pinpoint the issue (file/line number), that would be super helpful.

MWFIAE commented 2 years ago

> Thanks, I had a look and I can't see where a regression could be caused in that commit.

Pretty sure it would be in the updated nakama.dll. Testing where the regression is caused in nakama-dotnet might therefore be the better move.

I will try to create a minimal example next week, but I can't guarantee anything, as the closed alpha release of our game is in 3 days and times are busy, as you can imagine 😄