odin-lang / Odin

Odin Programming Language
https://odin-lang.org
BSD 3-Clause "New" or "Revised" License
6.13k stars 551 forks source link

[core/thread] "unexpected" Odin thread semantics that differ from pthread's semantics #3622

Open PucklaJ opened 1 month ago

PucklaJ commented 1 month ago

Context

I am using a Pinebook Pro.

Useful information to add to a bug report:

Odin:    dev-2024-05:856537f0c
OS:      Arch Linux ARM, Linux 6.7.9-1-MANJARO-ARM
CPU:     ARM64
RAM:     3862 MiB
Backend: LLVM 14.0.6

Expected Behavior

package main

import "core:fmt"
import "core:thread"

main :: proc() {
    fmt.println("Hello from main")
    t := thread.create_and_start(proc() {
        fmt.println("Hello from thread")
    })
    fmt.println("Waiting for thread")
    thread.join(t)
    fmt.println("Bye from main")
}

This program should output

Hello from main
Waiting for thread
Hello from thread
Bye from main

Current Behavior

But it currently outputs

Hello from main
Waiting for thread
Bye from main

Without the thread procedure ever getting called.

Steps to Reproduce

Build the program above and execute it on an arm64 machine.

Kelimion commented 1 month ago

Not a bug. It has the same behaviour in Windows and Linux on amd64. You're not waiting long enough for the thread to actually start before you call join.

package main

import "core:fmt"
import "core:sync"
import "core:thread"
import "core:time"

sema: sync.Atomic_Sema

main :: proc() {
    fmt.println("Hello from main")

    t := thread.create_and_start(proc() {
        sync.atomic_sema_post(&sema)
        fmt.println("Hello from thread")
    })
    fmt.println("Waiting for thread")
    sync.atomic_sema_wait_with_timeout(&sema, time.Millisecond * 50)

    thread.join(t)
    fmt.println("Bye from main")
}
PucklaJ commented 1 month ago

@Kelimion This is completely wrong! This is not how join works. There is no such thing as You're not waiting long enough in multithreading as long as you are using the right methods. To compare it with pthread I wrote this example:

#include <stdio.h>
#include <pthread.h>

void * thread_func(void* ptr) {
  puts("Hello from thread");
}

int main() {
  puts("Hello from main");

  pthread_t t;
  pthread_create(&t, NULL, thread_func, NULL);

  puts("Waiting for thread");
  pthread_join(t, NULL);

  puts("Bye from main");
  return 0;
}

This correctly outputs:

Hello from main
Waiting for thread
Hello from thread
Bye from main
Kelimion commented 1 month ago

Can you maybe tone it down a notch?

Your issue is that "thread does not start on linux arm64". That's very specific. Not "the thread doesn't start", with linux arm64 being the platform it happens on. We could already tell it was Linux ARM64 because that's in the hardware report, but you put it in the title, so it's emphasized. And thus I demonstrate it works the same on Windows and Linux AMD64, and that emphasis is unwarranted. It works (or doesn't) the same way across platforms.

Now you're saying "This is completely wrong! This is not how join works. There is no such thing as You're not waiting long enough in multithreading as long as you are using the right methods."

Now you're welcome to argue these semantics differ from pthread, and that it would behoove us to harmonize them, and that's a valid argument to make and one we'll take under consideration. It might even be classed as a bug.

But that doesn't make it "completely wrong". Wording matters.

PucklaJ commented 1 month ago

I actually tested your example on my Pinebook Pro and this shows the correct output. So you are right, you need to wait long enough. But still, if that's how your thread.join works then that's very misleading because other implementations (pthread and C++ std::thread::join) wait until the thread starts. Oddly enough on linux and windows amd64 it works completely fine without a semaphore. That's why I thought it was a bug specific to linux arm64 initially.

Kelimion commented 1 month ago

It might actually be a bug, but at a minimum it diverges from pthread in its semantics, so it's worth looking into either way, and we will.

Right now the semantics for our join are more like "tell the thread to exit if already running, or if just starting up, exit right away before doing any work." That could conceivably be made into a feature, with a default that works like pthread, and an exit_early_on_join as a flag you can give to the thread so that if it got joined before the OS started + scheduled it, it'll return before running the user proc.

There's something to say for cleaning up a thread without waiting on it to finish its user proc if it hasn't even started, and certainly also for running the user proc first. Defaulting to the latter (pthread) is sensible, of course.

Kelimion commented 1 month ago

Glancing at the code it's not strictly a bug, just due to when we check for join in the cross-platform wrapper, exiting early (because you asked for it to exit). Will restructure that to be more familiar but being able to opt in to the current behavior.

PucklaJ commented 1 month ago

This was a misunderstanding initially so I apologize for my outrage. I looked into it a bit and it seems that other languages implement join like it works in pthread. Those languages include python, rust, C++ and Vala. This is why I assumed that it would work the same in odin. I agree that it should work like pthread by default and maybe have the ability to opt out.

Kelimion commented 1 month ago

Noted. :-)

It kind of renders the .Joined flag as set by join useless (a duplicate of .Done), because with pthread semantics it'd be checked after the procedure is already done. So I'm leaning toward renaming it .Try_Join_Early to opt in to the current semantics, which the coder can then signal with a join(&t, true). By default it wouldn't be set, and so turn into pthread semantics.

Or, alternatively: join :: proc(t: ^Thread, pthread_semantics := true). Same difference in the end.