vectorgrp / sil-kit

Vector SIL Kit – Open-Source Library for Connecting Software-in-the-Loop Environments
https://vectorgrp.github.io/sil-kit-docs
MIT License

Problems with the TcpNoDelay option with two participants in WSL #149

Open MarcoJassy opened 2 days ago

MarcoJassy commented 2 days ago

Hello, I don't seem to be able to connect two participants in WSL with the TcpNoDelay option set to true. I receive the messages below every time I connect the second participant.

[2024-11-19 15:06:15.087] [wsl_can_md] [info] Creating participant 'wsl_can_md' at 'silkit://0.0.0.0:8501', SIL Kit version: 4.0.42
[2024-11-19 15:06:15.087] [wsl_can_md] [info] The provided registry URI 'silkit://localhost:8501' differs from the configured registry URI 'silkit://0.0.0.0:8501'. The latter will be used.
[2024-11-19 15:06:15.088] [wsl_can_md] [info] Connected to registry at 'local:///tmp/SilKitRegibb27dfeaa022c81c.silkit' via 'local://' (local:///tmp/SilKitRegibb27dfeaa022c81c.silkit, silkit://0.0.0.0:8501)
[2024-11-19 15:06:15.088] [wsl_can_md] [warning] SetAsioSocketOptions: failed to enable 'no delay' option
[2024-11-19 15:06:15.091] [wsl_can_md] [error] SilKit-IOWorker: Something went wrong: basic_string::_M_create
[2024-11-19 15:06:20.091] [wsl_can_md] [error] Timeout while waiting for replies from known participants: wsl_can_dn (tcp://127.0.0.1:8502) is waiting for reply
Something went wrong: SIL Kit: An unspecified error occured. (1): Timeout while waiting for replies from known participants: wsl_can_dn (tcp://127.0.0.1:8502) is waiting for reply
Press enter to stop the process...

I am using Windows 10 with WSL2 and Ubuntu 22.04.

Here is the configuration yaml of the participant:

schemaVersion: 1
Description: CANoe Configuration

Middleware:
  RegistryUri: silkit://0.0.0.0:8501
  ConnectAttempts: 1
  TcpNoDelay: true
  AcceptorUris: [tcp://0.0.0.0:8504]
  RegistryAsFallbackProxy: true

Logging:
  Sinks:
    - Type: Stdout
      Level: Info

Thanks in advance.

MariusBgm commented 1 day ago

Hi @MarcoJassy, the participant's configuration states that the registry's URI is silkit://0.0.0.0:8501. That can't be right: 0.0.0.0 is the any-address, which is only meaningful for listening sockets. You should put the IP address of the host running the registry there.
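To illustrate the distinction between a listen address and a connect address, here is a minimal sketch using plain Python sockets. It does not use the SIL Kit API; the function names `listen_on_any` and `demo` are made up for this example. The server binds to the any-address, while the client connects to a concrete address (here the loopback address, assuming both ends run on the same host).

```python
import socket


def listen_on_any(port=0):
    """Bind a listening TCP socket to the any-address (0.0.0.0).

    Port 0 asks the OS to pick a free port.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.settimeout(2.0)
    srv.bind(("0.0.0.0", port))
    srv.listen(1)
    return srv


def demo():
    """Return (address the server is bound to, address the client connected to)."""
    srv = listen_on_any()
    port = srv.getsockname()[1]
    # The client names a concrete destination, not the any-address.
    client = socket.create_connection(("127.0.0.1", port), timeout=2.0)
    conn, _peer = srv.accept()
    try:
        return srv.getsockname()[0], client.getpeername()[0]
    finally:
        conn.close()
        client.close()
        srv.close()


if __name__ == "__main__":
    print(demo())
```

The listening side reports 0.0.0.0 as its bound address, while the client side always sees the concrete address it dialed. The same split applies to the registry's listen URI versus a participant's RegistryUri.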

With WSL there's some subtle port forwarding and magic networking involved. You could try starting your registry on the WSL side:

sil-kit-registry --listen-uri silkit://0.0.0.0:8501

The use of the any-address is correct here. Then you need to point your Windows participant to localhost on port 8501:

Middleware:
  RegistryUri: silkit://localhost:8501

Let me know if that helps.

MarcoJassy commented 9 hours ago

Hello @MariusBgm, thank you for your reply. I forgot to mention that in this test all participants, including the registry, are in the same WSL. I tried to change the registry URI of the participants to localhost:8501, but then I received the same 'no delay' error already when I connected the first participant. I tried to change the ListenUri of the registry from 0.0.0.0:8501 to localhost:8501 and the result was the same as in the first test: the 'no delay' error appeared when I tried to connect the second participant.

Here is the yaml file of the registry

SchemaVersion: 1

ListenUri: silkit://localhost:8501

Logging:
  Sinks:
    - Type: Stdout
      Level: Info # Trace | Debug | Warn | Info | Error | Critical | Off

The new yaml of the participant 1

schemaVersion: 1
Description: CANoe Configuration

Middleware:
  RegistryUri: silkit://localhost:8501
  ConnectAttempts: 1
  TcpNoDelay: true
  AcceptorUris: [tcp://0.0.0.0:8502]
  RegistryAsFallbackProxy: true

Logging:
  Sinks:
    - Type: Stdout
      Level: Debug

The participant 2

schemaVersion: 1
Description: CANoe Configuration

Middleware:
  RegistryUri: silkit://localhost:8501
  ConnectAttempts: 1
  TcpNoDelay: true
  AcceptorUris: [tcp://0.0.0.0:8504]
  RegistryAsFallbackProxy: true

Logging:
  Sinks:
    - Type: Stdout
      Level: Debug

MariusBgm commented 8 hours ago

@MarcoJassy I cannot reproduce the nodelay errors with your configs. Could you provide some more information?

For what it's worth, I'm on WSL version 2 running an Ubuntu 20.04 image.

MarcoJassy commented 8 hours ago

Well, that is strange @MariusBgm. I am using Windows 10 with WSL2 and Ubuntu 22.04. My SIL Kit version is 4.0.42.

MariusBgm commented 8 hours ago

@MarcoJassy I just tried with SIL Kit version 4.0.42, and it also works. I tested using the SIL Kit Ethernet Demo. Did you use one of the SIL Kit Demos? If not, could you test with a bundled SIL Kit Demo with your configurations and report back? Or, if you have custom code, are your participants using boost::asio themselves?
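One way to narrow this down independently of SIL Kit is to check whether the environment accepts the underlying socket option at all. The 'no delay' option corresponds to TCP_NODELAY on a TCP socket. The following is a minimal sketch using plain Python sockets, not the SIL Kit code path; the function name `can_enable_tcp_nodelay` is made up for this example, and it only tests a loopback TCP connection.

```python
import socket


def can_enable_tcp_nodelay(host="127.0.0.1"):
    """Try to set TCP_NODELAY on a loopback TCP connection.

    Returns True if the option could be set and reads back as enabled,
    False if the OS rejected the setsockopt call.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.settimeout(2.0)
    srv.bind((host, 0))  # port 0: let the OS choose a free port
    srv.listen(1)
    client = socket.create_connection(srv.getsockname(), timeout=2.0)
    conn, _peer = srv.accept()
    try:
        try:
            client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        except OSError:
            return False
        return client.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
    finally:
        conn.close()
        client.close()
        srv.close()


if __name__ == "__main__":
    print("TCP_NODELAY can be enabled:", can_enable_tcp_nodelay())
```

If this reports True inside the same WSL distribution, plain TCP sockets accept the option there, which would point toward the specific connection SIL Kit applies it to rather than the WSL network stack as a whole.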

MarcoJassy commented 2 hours ago

@MariusBgm, I started the registry of my test and then ran SilKitDemoEthernet with the configuration YAML of my test as an EthernetWriter. The problem still showed up. Am I missing something in this setup?