savi-lang / savi

A fast language for programmers who are passionate about their craft.
BSD 3-Clause "New" or "Revised" License

TCP hangs on FreeBSD #367

Closed mneumann closed 1 year ago

mneumann commented 1 year ago

I've seen this with savi-Postgres and now with savi-MQTT as well.

After sending a number of packets (this can be 2, 50 or 100), it just hangs.

Also, the test suite sometimes hangs when running savi spec. It is likely that the two issues are related.

The process is in Idle state.

To reproduce:

https://github.com/mneumann/savi-MQTT/

savi run example

You need an MQTT broker (mosquitto) running on localhost. You can see that it is publishing by running mosquitto_sub -t topic.

mneumann commented 1 year ago

When I run with --ponymaxthread 1 it does not hang. Am I using the wrong ponyrt version or libsavi_rt?

mneumann commented 1 year ago

> ./bin/example --ponyversion
 [release]
Compiled with: LLVM  -- Clang-9.0.1-

mneumann commented 1 year ago

I am trying to build libponyrt.bc tomorrow myself.

jemc commented 1 year ago

@mneumann - This may be the same issue as https://github.com/savi-lang/TCP/issues/2

At any rate, I'm going to move the issue to the TCP repo for consistency - this is unlikely to be a bug in the compiler or the core library.

jemc commented 1 year ago

What gives me pause and makes me think this may be a different issue than savi-lang/TCP#2 is that savi-lang/TCP#2 happens on many file descriptors and gives an infinite loop.

jemc commented 1 year ago

> I am trying to build libponyrt.bc tomorrow myself.

I don't think this will be fruitful - it's likely that the Pony runtime version doesn't matter here.

What would be more helpful is some further info about how it hangs.

I'd suggest either using lldb or adding some Inspect.out statements to the TCP library here for some extra debugging to see if you can figure out what happens just before it hangs.

mneumann commented 1 year ago

It seems it's unrelated to TCP. It also hangs when I just use a Timer, so I suspect it's the underlying event system.

So basically this hangs:

:actor Main
  :is Timer.Actor
  :let timer Timer.Engine

  :new (env)
    @timer = Timer.Engine.new(@, Time.Duration.milliseconds(1))

  :fun ref timer_react @
    elapsed = @timer.elapsed
    Inspect.out("TICK: \(elapsed.total_seconds.format.decimal).\(elapsed.nanosecond.format.decimal)")
    @

mneumann commented 1 year ago

Note that the timer example above does not seem to hang when I pass --ponynoblock; without it, it hangs consistently. The full TCP example hangs regardless of --ponynoblock.

jemc commented 1 year ago

Haha okay I can move this issue back to the main repo then.

Do you get similar hangs with an equivalent Pony program?

mneumann commented 1 year ago

Interestingly, with the Savi release from today, it hangs much less often. Dunno if you also updated the runtime to a more recent one. I can still get it to hang, but it's much less reproducible.

With the most recent pony runtime, I could not reproduce the hang yet. I am still trying, but it looks good.

mneumann commented 1 year ago

With today's release it hangs reproducibly on every run!

mneumann commented 1 year ago

With the most recent ponyrt and the changes to savi (https://github.com/savi-lang/savi/pull/369), it's running like a Volkswagen. No hangs. I'll let it run in the background for a few hours (previously it would reproducibly hang after at most 10 seconds).

mneumann commented 1 year ago

@jemc after 30 minutes and millions of MQTT messages, still no hang :) I am 99.9999% confident that the hang is gone with the new runtime.

jemc commented 1 year ago

It could also be something bad about the way we're building the runtime for FreeBSD in CI.

It's easy enough to find out. I'll update our runtime build to the latest released Pony version.

jemc commented 1 year ago

@mneumann - when you say "most recent ponyrt", do you mean the latest commit to the main branch there, or do you mean the latest release (0.51.2)?

I kicked off a build for us of 0.51.2: https://github.com/savi-lang/runtime-bitcode/commit/511f78ab03f6339307e2c06e94a2760a48119e90

If you used the main branch for your tests, you might want to try building the 0.51.2 version locally and testing with that, because then if the new CI built-runtime doesn't work for you, we'd know if it was a difference with that ponyc commit or if it was a difference in the CI build vs your local build.

jemc commented 1 year ago

Meh, there's an issue with the build that I need to dig into and fix.

mneumann commented 1 year ago

> @mneumann - when you say "most recent ponyrt", do you mean the latest commit to the main branch there, or do you mean the latest release (0.51.2)?
>
> I kicked off a build for us of 0.51.2: savi-lang/runtime-bitcode@511f78a
>
> If you used the main branch for your tests, you might want to try building the 0.51.2 version locally and testing with that, because then if the new CI built-runtime doesn't work for you, we'd know if it was a difference with that ponyc commit or if it was a difference in the CI build vs your local build.

I only tried the HEAD version of Pony, but I guess 0.51.2 would work as well.

Note, I have a few PRs open on the ponyc GitHub repo that should fix the build. Maybe you're seeing the same issues that I had (google benchmark / googletest complain about an unused variable).

jemc commented 1 year ago

Nah, it's a different issue - our runtime-bitcode CI automation does a much smaller build - we only build the runtime bitcode. We don't build the pony-locked version of LLVM, the pony compiler or its tests. So it doesn't matter if the test build is broken - we can still build the bitcode.

I have a build that is now working in CI and I'm gonna tag it...

jemc commented 1 year ago

The new runtime release is here: https://github.com/savi-lang/runtime-bitcode/releases/tag/v0.20220910.0

When it's done bundling I'll update the savi Makefile to grab the new runtime version and cut a new Savi release.

jemc commented 1 year ago

See PR https://github.com/savi-lang/savi/pull/371

jemc commented 1 year ago

@mneumann - is the issue resolved in the latest Savi release?

mneumann commented 1 year ago

No, it hangs! Is this based on the runtime of Pony 0.51.2?

mneumann commented 1 year ago

Compiled with latest release (hangs):

> ./savi/bin/savi -v
savi version: v0.20220910.1
llvm version: 14.0.3

> ./bin/example --ponyversion
 [release]
Compiled with: LLVM  -- Clang-9.0.1-

Compiled with what I have compiled myself yesterday (no hangs):

> savi -v
savi version: unknown
llvm version: 14.0.6

> ./bin/example --ponyversion
0.51.2-bdabb47b [release]
Compiled with: LLVM 14.0.3 -- Clang-13.0.0-

mneumann commented 1 year ago

I think Clang 9 is no good

jemc commented 1 year ago

> No, it hangs! Is this based on the runtime of Pony 0.51.2?

Yes, it is the runtime of Pony 0.51.2 with no changes (except one build system patch that is not relevant to this).

jemc commented 1 year ago

@mneumann - If you think the Clang version is the issue (I'm skeptical, but I have no other ideas at the moment), do you have any ideas about how I should modify the FreeBSD CI setup to get Clang 13.0.0?

https://github.com/savi-lang/runtime-bitcode/blob/9c2bef47b823bd57a523975fefdd682262546380/.cirrus.yml#L45-L51

mneumann commented 1 year ago

@jemc where can I see the CI logs? There is not much that I do on my FreeBSD system. With the recent changes committed by SeanTAllen, it should just build out of the box (given pkg install -y cmake gmake libunwind git) with gmake libs && gmake configure runtime-bitcode=yes && gmake build.

clang 9.0 is installed as clang90, so I wonder why it picks it up.

jemc commented 1 year ago

@mneumann CI logs are visible on CirrusCI for the runtime-bitcode repo.

Here is the most recent FreeBSD build (the same one that you just tested and found to be hanging).

mneumann commented 1 year ago

@jemc Please remove llvm from pkg install. FreeBSD ships with a system compiler /usr/bin/clang which is 13.0.0. The llvm package installs LLVM 9 as /usr/local/bin/clang. Most likely you have a PATH like /usr/local/bin:/usr/bin, so it picks up the old version.
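
The PATH ordering effect described here can be illustrated with a minimal sketch. It does not touch a real FreeBSD install: the scratch directory layout and the two fake `clang` scripts are hypothetical stand-ins for `/usr/local/bin/clang` (from the llvm package) and `/usr/bin/clang` (the system compiler).

```shell
#!/bin/sh
# Simulate the two install locations inside a scratch directory
# (hypothetical layout, not the real system directories).
scratch=$(mktemp -d)
mkdir -p "$scratch/usr/local/bin" "$scratch/usr/bin"

# Fake compilers that only report a version string.
printf '#!/bin/sh\necho "clang version 9.0.1"\n'  > "$scratch/usr/local/bin/clang"
printf '#!/bin/sh\necho "clang version 13.0.0"\n' > "$scratch/usr/bin/clang"
chmod +x "$scratch/usr/local/bin/clang" "$scratch/usr/bin/clang"

# Report which clang a given search path resolves to.
# The subshell keeps the PATH change from leaking into the caller.
which_clang() {
  (PATH="$1"; command -v clang)
}

ports_first="$scratch/usr/local/bin:$scratch/usr/bin"
base_first="$scratch/usr/bin:$scratch/usr/local/bin"

echo "ports first: $(which_clang "$ports_first")"
echo "base first:  $(which_clang "$base_first")"
```

On a real machine, `command -v clang` followed by `clang --version` shows which compiler a build would pick up.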

mneumann commented 1 year ago

llvm is a meta port that installs the llvm90 binaries without a suffix (e.g. clang instead of clang90).

jemc commented 1 year ago

Just tried - I can't remove llvm entirely because I need llvm-link (and FreeBSD doesn't seem to ship with it alongside its native clang).

So I'm gonna try using llvm13 instead of llvm.

mneumann commented 1 year ago

On Sun, Sep 11, 2022 at 01:40:14PM -0700, Joe Eli McIlvain wrote:

> Just tried - I can't remove llvm entirely because I need llvm-link (and FreeBSD doesn't seem to ship with it alongside its native clang).
>
> So I'm gonna try using llvm13 instead of llvm.

What is llvm-link used for?

Package llvm13 should give you clang13 and llvm-link13. Not sure if your build will automatically pick up llvm-link13 under this name.

jemc commented 1 year ago

Yeah I had to symlink to get llvm-link in place as expected.

The build seems to work so I'll go ahead and publish it soon so you can try it.
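
A minimal sketch of the symlink approach mentioned above, assuming the versioned package installs a suffixed binary such as `llvm-link13` while the build looks for plain `llvm-link`. The scratch directories and the fake binary are hypothetical; the actual paths used in the CI setup are not shown in this thread.

```shell
#!/bin/sh
# Scratch layout standing in for a package directory and a shim directory
# (both hypothetical).
scratch=$(mktemp -d)
mkdir -p "$scratch/pkg/bin" "$scratch/shim"

# Stand-in for the suffixed binary a versioned package might install.
printf '#!/bin/sh\necho "stand-in for llvm-link13"\n' > "$scratch/pkg/bin/llvm-link13"
chmod +x "$scratch/pkg/bin/llvm-link13"

# Expose it under the unsuffixed name the build expects.
ln -s "$scratch/pkg/bin/llvm-link13" "$scratch/shim/llvm-link"

# With the shim directory first on PATH, "llvm-link" resolves to the symlink.
# The command substitution runs in a subshell, so PATH is unchanged afterwards.
resolved=$(PATH="$scratch/shim:$PATH"; command -v llvm-link)
echo "llvm-link resolves to: $resolved"
```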

jemc commented 1 year ago

@mneumann - can you try this new build?

https://github.com/savi-lang/runtime-bitcode/releases/tag/v0.20220911.0

jemc commented 1 year ago

> What is llvm-link used for?

To build the runtime bitcode, clang builds many C files into many separate .bc files, then llvm-link links them all together into a single .bc file.

If you're using the Pony build system for the runtime, then it builds an entire LLVM tree for you when you run make libs, and that's where it gets the llvm-link binary from.

But in our build steps for Savi's runtime builds, we skip the LLVM build steps and just get llvm-link from a system package.
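
The two-step process described above (many `.bc` files, then one linked `.bc`) can be sketched as follows. This is only an illustration with two tiny made-up C files, not the actual runtime-bitcode build script; the file names and the `-O2` flag are assumptions.

```shell
#!/bin/sh
work=$(mktemp -d)

# Two tiny translation units standing in for the runtime's many C files.
cat > "$work/a.c" <<'EOF'
int add(int x, int y) { return x + y; }
EOF
cat > "$work/b.c" <<'EOF'
int add(int x, int y);
int twice(int x) { return add(x, x); }
EOF

if command -v clang >/dev/null 2>&1 && command -v llvm-link >/dev/null 2>&1; then
  # Step 1: compile each C file into its own LLVM bitcode file.
  clang -O2 -emit-llvm -c "$work/a.c" -o "$work/a.bc"
  clang -O2 -emit-llvm -c "$work/b.c" -o "$work/b.bc"
  # Step 2: merge the per-file bitcode into a single module.
  llvm-link "$work/a.bc" "$work/b.bc" -o "$work/combined.bc"
  echo "wrote $work/combined.bc"
else
  echo "clang and/or llvm-link not found on PATH; skipping"
fi
```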

mneumann commented 1 year ago

On Sun, Sep 11, 2022 at 02:33:28PM -0700, Joe Eli McIlvain wrote:

> @mneumann - can you try this new build?
>
> https://github.com/savi-lang/runtime-bitcode/releases/tag/v0.20220911.0

Not working :(

Can you try with pkg ins llvm14? lib/llvm of ponyc is fixed at 14.0.3 and when I built the runtime that is working on my box, I built the full LLVM suite from the ponyc repo, which took hours.

jemc commented 1 year ago

I don't think it should make a difference - I don't see how the version of llvm-link could be coming into play here.

And we've already established that we're already using the same version of clang that you're using.

But I can try it anyway...

jemc commented 1 year ago

Here's the release tag: https://github.com/savi-lang/runtime-bitcode/releases/tag/v0.20220911.1

It should upload a new build for you to try soon.

jemc commented 1 year ago

If that doesn't work, then here's what I suggest for you to try - mimicking the Cirrus CI build process as closely as possible:

mneumann commented 1 year ago

@jemc still hangs with the above version.

> ./bin/example --ponyversion
 [release]
Compiled with: LLVM  -- Clang-13.0.0-

vs my:

> ./bin/example --ponyversion
0.51.2-bdabb47b [release]
Compiled with: LLVM 14.0.3 -- Clang-13.0.0-

Interesting that no LLVM version is shown for your runtime, while mine shows 14.0.3. Probably this is just a missing LLVM_VERSION environment variable and has no effect.

I am trying your above suggestion to reproduce cirrus CI on my system.

mneumann commented 1 year ago

btw, I just noticed that my build has the pony_assert in messageq.c disabled, for which we once had a patch. But that shouldn't be the problem.

mneumann commented 1 year ago

Tried with 0.51.2, both with llvm13 and llvm14. Neither works.

I get this warning when linking:

warning: Linking two modules of different data layouts: '~/Tmp/ponyc/src/libponyrt/lang/except_try_catch.ll' is '' whereas 'llvm-link' is 'e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128'

ponyc HEAD does not work either.

mneumann commented 1 year ago

I tried with various compiler versions, with no luck. Give me a few hours to try to build ponyc/lib/llvm again and use the ponyc build system.

I doubt it's related to LLVM itself, but maybe some flags are missing?

jemc commented 1 year ago

@mneumann - Thanks for testing that.

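So now we know that:

  • It works with Pony HEAD using Pony build steps
  • It fails with Pony 0.51.2 using Savi build steps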

Now can you try building Pony HEAD using the Savi build steps?

That will tell us whether there is a problem with which Pony commit is being used, or with which build steps are being used.

mneumann commented 1 year ago

@jemc tracked it down to the two flags: -march=native -mtune=generic. They are missing in your build. If I add them, it works.
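
One way to see what these two flags change in the emitted bitcode is to compare the textual IR of a small file compiled with and without them. This is a hedged sketch: `probe.c` and the variable `EXTRA_CFLAGS` are made up for illustration, and it only shows the recorded target CPU, not why the difference affects the runtime.

```shell
#!/bin/sh
work=$(mktemp -d)
cat > "$work/probe.c" <<'EOF'
int probe(int x) { return x + 1; }
EOF

# Flags reported in this thread as the difference between the two builds.
EXTRA_CFLAGS="-march=native -mtune=generic"

if command -v clang >/dev/null 2>&1; then
  # Emit textual LLVM IR without the extra flags.
  clang -S -emit-llvm "$work/probe.c" -o "$work/plain.ll"
  # Emit it again with them ($EXTRA_CFLAGS is left unquoted on purpose
  # so that it splits into two separate arguments).
  clang $EXTRA_CFLAGS -S -emit-llvm "$work/probe.c" -o "$work/tuned.ll" \
    || echo "this clang rejected $EXTRA_CFLAGS on this platform"
  # On targets where clang records it, the function attributes name the CPU
  # that code generation was aimed at.
  grep -o '"target-cpu"="[^"]*"' "$work"/*.ll || true
else
  echo "clang not found on PATH; skipping"
fi
```

If the two files report different `target-cpu` values, the builds were generated for different instruction sets, which is one concrete way a CI build and a local build can diverge.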

jemc commented 1 year ago

I have tagged a new release with the added CFLAGS: https://github.com/savi-lang/runtime-bitcode/releases/tag/v0.20220912.0

@mneumann - let me know if that one works

mneumann commented 1 year ago

@jemc That works! Thanks! Closing this issue!

jemc commented 1 year ago

Great. Then I will update the Savi build Makefile to pull from this runtime release build.

Thanks for your help in tracking down the underlying issue.

mneumann commented 1 year ago

On Mon, Sep 12, 2022 at 07:33:33AM -0700, Joe Eli McIlvain wrote:

> @mneumann - Thanks for testing that.
>
>   • It works with Pony HEAD using Pony build steps
>   • It fails with Pony 0.51.2 using Savi build steps
>
> Now can you try building Pony HEAD using the Savi build steps?
>
> That will tell us whether there is a problem with which Pony commit is being used, or with which build steps are being used.

My understanding is:

jemc commented 1 year ago

Wait, so is it still an issue for you?

Do we need to reopen this ticket?