odin-lang / Odin

Odin Programming Language
https://odin-lang.org
BSD 3-Clause "New" or "Revised" License
7.03k stars 626 forks source link

Threads module does not work on android #4452

Open 0dminnimda opened 3 weeks ago

0dminnimda commented 3 weeks ago

Context

Building odin demo on termux gives this linker error

$ odin build examples/demo
ld.lld: error: undefined symbol: pthread_cancel
>>> referenced by odin_package
>>>               /data/data/com.termux/files/home/repos/Odin/demo.o:(thread._terminate-1838)

ld.lld: error: undefined symbol: pthread_setcancelstate
>>> referenced by odin_package
>>>               /data/data/com.termux/files/home/repos/Odin/demo.o:(thread._create-1792.__unix_thread_entry_proc-0)
>>> referenced by odin_package
>>>               /data/data/com.termux/files/home/repos/Odin/demo.o:(thread._create-1792.__unix_thread_entry_proc-0)

ld.lld: error: undefined symbol: pthread_setcanceltype
>>> referenced by odin_package
>>>               /data/data/com.termux/files/home/repos/Odin/demo.o:(thread._create-1792.__unix_thread_entry_proc-0)
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Bionic (android libc) explicitly decided not to implement pthread_cancel function and its relatives.

One way to solve this problem is to use pthread_kill instead, I can locally replace the code in the threading library, but trying to add this to official odin can bring up some challanges:

  1. Either replace pthread_cancel & family with pthread_kill
  2. Or switch behaviour based on whether we're compiling for android or not

1 is simpler and given it doesn't break any other platform and the change of the behaviour (if any) does not bother anyone, it should be the preffered solution. 2 requires adding a way to detect android. Making it a separate ODIN_OS is a breeking ground of problems. Because android is mostly compatible with linux every time you specify linux you also have to specify android, people will likely not do it, thus reducing support for android without any reason for this. I am not familiar with codebase to know if it's possible to check for specific linux distribution. Another reason to look into 2 is that for android reloc-mode:pic should be the default, and being able to distinguish it there will also be nice.

Env

Odin:    dev-2024-11:d2b2e790
OS:      Unknown Linux Distro, Linux 5.10.101-android12-9-00001-g1b6eff567cae-ab8943976
CPU:     ARM64
RAM:     5836 MiB
Backend: LLVM 19.1.2
laytan commented 3 weeks ago

A -subtarget:android may be the way to go, just like we are trying to do for iOS. That being said android is explicitly not officially supported at the moment and will need more changes like bionic libc having different struct fields&layout and flag values, so core:sys/posix needs an implementation specifically for Android which could make use of the -subtarget if that's added too. There may also be changes to core:sys/linux, if syscalls are even a valid thing for Android. Then I also imagine some integration with the android ndk is in order to smooth out the experience.

0dminnimda commented 3 weeks ago

Well developing for android usually comes in two forms:

  1. Android Applications (.apk)
  2. Linux Applications using things like termux

They both use the same underlying system but in different ways, so getting one will help with the other a lot. That said termux is an enviroment much closer to linux, in that regard android is really close to any other linux, it just occasionally needs a few tweaks, so it's like 99% there. With all those issues and PRs I'm trying to make odin work locally on termux. I have not made anything significant on the .apk side, but in my unsderstanding the process to get something working there is much more convoluted, kind of the same as developing something for a console.

0dminnimda commented 3 weeks ago

Of course I don't know all of the bionic, to say that nothing in sys/posix of sys/linux will have to change, but it's a resonable assumption that most of the code will stay the same. At times android finds a way to mess things, so the rare unwanted surprices will happen, exactly like with pthread_cancel

laytan commented 3 weeks ago

Of course I don't know all of the bionic, to say that nothing in sys/posix of sys/linux will have to change, but it's a resonable assumption that most of the code will stay the same.

That's not a good assumption though, between implementation of POSIX and libc there is room for adding non-standard fields and moving fields around in structs, and constant/flag values are also not explicitly defined so every implementation basically does it's own thing. sys/linux is raw syscalls that should "just work", IF raw syscalls are even allowed on android. If they are not then packages like os, net, etc. that are now using the raw syscalls would need a specific implementation calling through bionic too. (But thinking about it, android has its whole permission model and stuff so os probably doesn't work anyway (at least not without changes)).

But coming back to the issue, pthread_kill and cancel are pretty different things and can't just be replaced. Although, I just looked at the package and the combination of the proc being called terminate, the description mentioning "force" and the windows implementation doing pretty much the equivalent of a pthread_kill it actually starts to feel more appropriate :p

I want to clarify that I know little about what is allowed or not, and also know nothing about termux and I'm just trying to provide helpful input here.

flysand7 commented 3 weeks ago

sys/linux is raw syscalls that should "just work", IF raw syscalls are even allowed on android

image

This can break libc compatibility, and will in general be a lot of pain.

laytan commented 3 weeks ago

Don't implement core:threads with raw syscalls, I'm begging you!

Good thing I didn't say that at all then ;)

0dminnimda commented 3 weeks ago

But coming back to the issue, pthread_kill and cancel are pretty different things and can't just be replaced. Although, I just looked at the package and the combination of the proc being called terminate, the description mentioning "force" and the windows implementation doing pretty much the equivalent of a pthread_kill it actually starts to feel more appropriate :p

One way to replace pthread_cancel with pthread_kill is to send a signal to a thread and handle it to close the thread gracefully. I tried to make it work, but the signal handler is not called for some reson. Will leave it here, just in case, and will switch to a forceful termination.

globally:

_SIG_CANCEL :: posix.Signal(posix.SIGUSR1)

in __unix_thread_entry_proc:

        actions : posix.sigaction_t = {}
        posix.sigemptyset(&actions.sa_mask)
        actions.sa_flags = {}
        actions.sa_handler = proc "c" (signum: posix.Signal) {
            posix.puts("_thread_signal_handler\n")
            posix.pthread_exit(nil);
        }
        sigaction_result := posix.sigaction(_SIG_CANCEL, &actions, nil)
        posix.puts("post sigaction setup\n")

        sigset: posix.sigset_t = {}
        posix.sigemptyset(&sigset)
        posix.sigaddset(&sigset, _SIG_CANCEL)
        if posix.pthread_sigmask(auto_cast posix.SIG_UNBLOCK, &sigset, nil) != posix.Errno(0) {
            posix.perror("pthread_sigmask")
            return nil
        }

And the _terminate:

_terminate :: proc(t: ^Thread, exit_code: int) {
    fmt.print("Triggering SIGUSR1 for ", t.unix_thread)
    status := posix.pthread_kill(t.unix_thread, _SIG_CANCEL)
    if status != posix.Errno(0) {
        fmt.print("Error cancelling thread ", t.unix_thread, ", error = ", status)
        // " (", c.strerror(status), ")")
    }
    fmt.print("Post SIGUSR1")
}
flysand7 commented 3 weeks ago

Don't implement core:threads with raw syscalls, I'm begging you!

Good thing I didn't say that at all then ;)

I might have misread what you meant by raw-dogging syscalls on android then. Since this issue is about core:thread I inferred that's what you might have talked about, but I'll admit I did not read the whole discussion, I just wanted to make a meme :3

0dminnimda commented 3 weeks ago

That's not a good assumption though, between implementation of POSIX and libc there is room for adding non-standard fields and moving fields around in struct

Well, that's definitely true for the androids struct sigaction

/* The kernel's struct sigaction doesn't match the POSIX one. */

As they put sa_flags as a first field.

0dminnimda commented 3 weeks ago

A -subtarget:android may be the way to go, just like we are trying to do for iOS.

It seems like if you want to specify subtarget you need to specify the target. It's quite inconvenient. Android has different target triple, can we distinguish the subtarget from it?

laytan commented 3 weeks ago

A -subtarget:android may be the way to go, just like we are trying to do for iOS.

It seems like if you want to specify subtarget you need to specify the target. It's quite inconvenient. Android has different target triple, can we distinguish the subtarget from it?

I am not entirely sure what you mean here, it would end up looking like -target:linux_arm64 -subtarget:android is what I imagined.

There's this part of the code that changes the llvm target triple based on -subtarget for iOS: https://github.com/odin-lang/Odin/blob/e6475fec4d2a3e34099b24a7a3bf890c7a3ef8d9/src/build_settings.cpp#L1613

0dminnimda commented 3 weeks ago

I mean that target triples (at least supported by termux) are:

Yes, it's possible to specify the target fully for each compilation, but it would be much nicer if the subtarget would be inferred by the compiler if it's running on android. I'm trying to find a way to do it.

0dminnimda commented 3 weeks ago

We could take Windows_Subsystems idea and make something likeLinux_Distribution/OS_Distribution which could include android. Maybe it's just a better name for subtarget

0dminnimda commented 3 weeks ago

Ok, in init_build_context the macros help choose the default target triple.

0dminnimda commented 3 weeks ago

moved to https://github.com/odin-lang/Odin/issues/4457

By the way, do you require to specify -subtarget:ios to compile for it?

You can detect it and default to it if not specified.

#if defined(__APPLE__) && defined(__MACH__)
    // Apple OSX and iOS (Darwin)
    #include <TargetConditionals.h>
    #if TARGET_IPHONE_SIMULATOR == 1
        #define PLATFORM_NAME "iOS" // Apple iOS
    #elif TARGET_OS_IPHONE == 1
        #define PLATFORM_NAME "iOS" // Apple iOS
    #elif TARGET_OS_MAC == 1
        #define PLATFORM_NAME "OSX" // Apple OSX
    #else
        #define PLATFORM_NAME "Apple Platform"
    #endif
    #define PLATFORM PLATFORM_APPLE
#elif defined(__APPLE__)
    #define PLATFORM_NAME "MacOS (Classic)"
    #define PLATFORM PLATFORM_APPLE
0dminnimda commented 3 weeks ago

After fixing linker errors for 3 functoins I can run the tesets:

$ odin test tests/core/normal.odin -file -all-packages
...
Finished 386 tests in 7.615653847s. 10 tests failed.
 - test_core_net.test_dns_resolve                                                       expected resolve_err to be nil, got Unable_To_Resolve
 - test_core_time.test_check_timezone_edgecases                                         Failed to convert to Tokyo time                                   - test_core_time.test_check_timezone_metadata                                          Expected daylight savings == false, got true                      - test_core_time.test_convert_timezone_roundtrip                                       Failed to load America/Edmonton timezone                          - tests_core_posix.execute_struct_checks                                               expected posix.WEXITSTATUS(status) to be 0, got 1
 - tests_core_posix.open_permissions                                                    expected stat.st_mode to be bit_set[Mode_Bits; _mode_t]{IROTH, IRGRP, IWUSR, IRUSR}, got bit_set[Mode_Bits; _mode_t]{IWUSR, IRUSR}                                                                                         - tests_core_posix.test_glob                                                           expected res to be NOMATCH, got %!(BAD ENUM VALUE=-3)
 - tests_core_posix.test_langinfo                                                       expected day1 to be Sunday, got
 - tests_core_posix.test_locale                                                         "C.UTF-8" is not POSIX or C
 - tests_core_posix.test_stat                                                           expected stat.st_mode to be bit_set[Mode_Bits; _mode_t]{IROTH, IRGRP, IWUSR, IRUSR, IFREG}, got bit_set[Mode_Bits; _mode_t]{IWUSR, IRUSR, IFREG}

Not too bad!

0dminnimda commented 3 weeks ago

And yeah, https://github.com/odin-lang/Odin/issues/4452#issuecomment-2455728634 actually works, it's even more organized here. The fix was to correct some of the definitions on the pthreads types.