litex-hub / wishbone-utils

Utilities for working with a Wishbone bus in an embedded device
Apache License 2.0

Connecting over ethernet causes infinite loop #24

Open GuzTech opened 4 years ago

GuzTech commented 4 years ago

I have been testing Litex on the Colorlight 5A-75B board and I have connected to it with wishbone-tool over Etherbone several times.

After some modifications to the SoC, I noticed that I couldn't connect to it anymore. Then I saw that it has nothing to do with the board, as wishbone-tool gives me this in a loop that I cannot CTRL-C out of:

ERROR [wishbone_tool::bridge::ethernet] ethernet connection was closed: peek IoError(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }) @ 82001820
INFO [wishbone_tool::bridge::ethernet] Re-opened ethernet host 192.168.1.50:1234

This happens all the time, even if I disable all network ports. I invoke it like this:

wishbone-tool --ethernet-host 192.168.1.50 --server terminal --csr-csv=csr.csv

This is with the latest commit, and compiled with Rust version 1.42.

mithro commented 4 years ago

@xobs, @enjoy-digital - Thoughts?

xobs commented 4 years ago

Can you run it with RUST_LOG=debug? Do you have a CPU configured?

xobs commented 4 years ago

The socket has a 1000 ms (one second) timeout, and if that timeout expires (and you're on a Unix-like system) it will generate EAGAIN, or "Resource temporarily unavailable": https://doc.rust-lang.org/std/net/struct.TcpStream.html#platform-specific-behavior-1
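The timeout behavior described above can be reproduced outside wishbone-tool. The following is a minimal sketch, not wishbone-tool's own code: it assumes a plain `UdpSocket` on loopback that nobody sends to, and the helper names `timeout_error_kind` and `is_timeout` are made up for illustration. On Unix-like systems the expired timeout surfaces as `WouldBlock`; on Windows it is `TimedOut`, so both are treated as a timeout.

```rust
use std::io::{Error, ErrorKind};
use std::net::UdpSocket;
use std::time::Duration;

/// Waits for one datagram on a loopback UDP socket that nobody sends to,
/// and returns the kind of error produced when the read timeout expires.
/// (Hypothetical helper, for illustration only.)
fn timeout_error_kind(timeout: Duration) -> std::io::Result<ErrorKind> {
    // Bind to an ephemeral loopback port; no peer is expected to send to it.
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    // The timeout must be non-zero; Some(Duration::ZERO) is rejected.
    socket.set_read_timeout(Some(timeout))?;

    let mut buf = [0u8; 64];
    match socket.recv_from(&mut buf) {
        // Receiving anything here would be a surprise in this sketch.
        Ok(_) => Err(Error::new(ErrorKind::Other, "unexpected datagram")),
        // The interesting case: the read gave up after `timeout`.
        Err(e) => Ok(e.kind()),
    }
}

/// True if an I/O error kind means "the read timeout expired".
/// Unix-like systems report WouldBlock (EAGAIN); Windows reports TimedOut.
fn is_timeout(kind: ErrorKind) -> bool {
    matches!(kind, ErrorKind::WouldBlock | ErrorKind::TimedOut)
}
```

Calling `timeout_error_kind(Duration::from_millis(1000))` mirrors the one-second timeout mentioned above; a shorter value behaves the same way, just faster.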

With RUST_LOG=debug we can see more of what's going on.

Also, if you're connecting directly to the board (as opposed to going through litex_server), make sure you DO NOT add --ethernet-tcp to the command line.

enjoy-digital commented 4 years ago

@GuzTech: The colorlight target currently has timing issues (https://github.com/litex-hub/litex-boards/issues/40). Despite that, the target in litex-boards is working correctly, but before trying wishbone-tool I would recommend pinging the board manually. This would validate that the hardware IP/UDP stack is behaving correctly and that wishbone-tool can operate. If you are not able to ping it, it's more of a gateware/timing issue than a wishbone-tool issue, and I can have a look at that if you share a design that reproduces the issue.

GuzTech commented 4 years ago

@enjoy-digital Yesterday I was trying the colorlight target and everything worked except for the SDRAM. Then I checked the nextpnr log and saw that both the system clock and the ethernet clock (125 MHz) fail timing, like you said. I asked on the 1BitSquared discord and @daveshah1 suggested that I could try to lower the system clock to 40 MHz, which passes timing (ethernet still doesn't, but is close to 125 MHz). After this, everything stopped working.

So as you suggested, I tried to ping the board and it fails. So I re-synthesized with the system clock set back to 125 MHz and now I can ping the board and wishbone-tool also works. A design with negative slack is playing Russian roulette, but of course this has nothing to do with the wishbone-tool issue.

@xobs Here is the output when I run it with RUST_LOG=debug:

ERROR [wishbone_tool::bridge::ethernet] ethernet connection was closed: peek IoError(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }) @ 82001820
DEBUG [wishbone_tool::bridge] Peek failed, trying again: IoError(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" })
INFO [wishbone_tool::bridge::ethernet] Re-opened ethernet host 192.168.1.50:1234
DEBUG [wishbone_tool::bridge] Peek failed, trying again: NotConnected

I'm directly connecting to the board and the command I invoke (in the OP) does not use --ethernet-tcp. So the problem seems to be that whenever it tries to connect to an unavailable host, it gets into this loop, and that is unrelated to the FPGA board.

xobs commented 4 years ago

That's kind of the design, but I agree it's not clear that's what's going on.

wishbone-tool doesn't know if your board has crashed, isn't connected, or isn't programmed. It's designed to let you run the command and it will wait for you to connect the device, at least at the PHY layer. It does this so that, for example, you can connect GDB to the board and it will stay connected even if you reflash the FPGA.

GuzTech commented 4 years ago

Sure, that makes sense. But why am I not able to CTRL-C out of it? I have to kill the process if I want out.

xobs commented 4 years ago

What was the command you used to run wishbone-tool?

GuzTech commented 4 years ago

wishbone-tool --ethernet-host 192.168.1.50 --server terminal --csr-csv=csr.csv

I have also tried litex-devmem2, and I can connect to the board properly. When I specify an invalid target address I can CTRL-C out of it.

xobs commented 4 years ago

In a separate channel, we determined that the board in question is failing to meet timing by more than 3x (requested: 125 MHz, actual: 41.51 MHz). The link is unstable, so it is spending a lot of time retrying the connection.

The terminal "server" is managed in a function called "terminal_client()". This server attempts to read from the serial port IRQ status register, and if that fails then it polls the console. However, due to how wishbone-tool aggressively tries to re-establish the connection, it actually gets stuck in https://github.com/litex-hub/wishbone-utils/blob/master/wishbone-tool/src/server/mod.rs#L457-L466 waiting for a response.

Furthermore, the terminal server takes over the console, preventing you from e.g. sending "Control-C". This keystroke combination is only checked further down in https://github.com/litex-hub/wishbone-utils/blob/master/wishbone-tool/src/server/mod.rs#L483-L486

So if:

  1. you're using the terminal server, and
  2. you're connecting via Etherbone, and
  3. it's a direct UDP connection, and
  4. the board is not providing reliable communication,

then it will get stuck waiting for the board to respond and never check for Control-C.
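One way to avoid that kind of hang is to bound the time spent retrying the device and to check the console on every pass. The sketch below only illustrates that loop shape; it is not the code in server/mod.rs or the actual fix. `run_terminal_loop`, `BusError`, `LoopExit`, and both closures are hypothetical stand-ins for the real bus read and console check.

```rust
use std::time::{Duration, Instant};

/// Hypothetical error type standing in for a failed bus read.
#[derive(Debug)]
struct BusError;

/// Why the (hypothetical) terminal loop stopped.
#[derive(Debug, PartialEq)]
enum LoopExit {
    /// The user asked to quit (e.g. pressed Control-C).
    UserQuit,
    /// All passes ran without a quit request.
    PassesExhausted,
}

/// Runs a terminal-style poll loop for at most `max_passes` passes.
/// `poll_device` stands in for a bus read that may fail; `quit_requested`
/// stands in for checking the console for the quit key.
fn run_terminal_loop<P, Q>(
    mut poll_device: P,
    mut quit_requested: Q,
    per_pass_budget: Duration,
    max_passes: usize,
) -> LoopExit
where
    P: FnMut() -> Result<u32, BusError>,
    Q: FnMut() -> bool,
{
    for _ in 0..max_passes {
        let deadline = Instant::now() + per_pass_budget;
        // Retry the device only until the deadline instead of forever.
        // A real bus read would itself block for its socket timeout,
        // so this inner loop would not spin as tightly as it does here.
        while Instant::now() < deadline {
            if poll_device().is_ok() {
                break;
            }
        }
        // The console is checked on every pass, whether or not the
        // device answered, so a quit request is always noticed.
        if quit_requested() {
            return LoopExit::UserQuit;
        }
    }
    LoopExit::PassesExhausted
}
```

With a device closure that always returns `Err(BusError)`, the loop still reaches the quit check once per `per_pass_budget`, which is the property the stuck loop described above was missing.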

enjoy-digital commented 4 years ago

@GuzTech: since the LiteEth core is currently running in sys_clk domain, sys_clk needs to be >= 125MHz for the IP/UDP MAC to work, that's the reason it's actually set to 125MHz in the target file. I'm planning to work on this, but don't have the solution for now to have something functional that also meets timings.
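A rough way to see where the 125 MHz figure comes from, as a sketch under one stated assumption: the datapath moves one byte per sys_clk cycle. That width is an assumption made here for illustration, not something stated in this thread, and the function names are hypothetical.

```rust
/// Assumed datapath width in bits: one byte per clock cycle.
/// (Assumption for illustration; not taken from the thread.)
const ASSUMED_DATAPATH_BITS: f64 = 8.0;

/// Line rate of gigabit Ethernet, in Mbit/s.
const GIGABIT_MBPS: f64 = 1000.0;

/// Throughput in Mbit/s of a byte-wide datapath at `clock_mhz`.
fn byte_wide_throughput_mbps(clock_mhz: f64) -> f64 {
    clock_mhz * ASSUMED_DATAPATH_BITS
}

/// True if a byte-wide datapath at `clock_mhz` keeps up with 1 Gbit/s.
fn sustains_gigabit(clock_mhz: f64) -> bool {
    byte_wide_throughput_mbps(clock_mhz) >= GIGABIT_MBPS
}
```

Under that assumption, 125 MHz gives exactly 1000 Mbit/s, while the 40 MHz system clock tried earlier gives 320 Mbit/s, which would be consistent with the board no longer answering pings at 40 MHz.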

xobs commented 3 years ago

This is likely related to #33 and was fixed in 41f8c81c05a34053b127313d15cb075814fafeae

Can you try v0.7.8 and see if it solves the issue?