jonhoo / drwmutex

Distributed RWMutex in Go
MIT License

Question about CPUs map #8

Closed: tylertreat closed this issue 7 years ago

tylertreat commented 7 years ago

I'm wondering why the `cpus` map, which maps a CPUID result to an RWMutex index, is strictly needed. That is, wouldn't you get the same effect using `mx[cpu()].RLocker()`, where `mx` is sized according to `runtime.NumCPU()`? This would remove the need for platform-specific APIC ID mappings, but I must be missing something.
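To make the question concrete, here is a minimal sketch of the distributed RWMutex idea under discussion: one `sync.RWMutex` per slot, readers lock only the slot for their current CPU, and writers lock every slot. This is not drwmutex's actual code. `slotFor` is a hypothetical hook standing in for whatever CPU-to-index lookup is used, and it must return a value in `[0, n)`.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// DRWMutex sketches a distributed reader/writer lock: one RWMutex per slot.
type DRWMutex struct {
	slots []sync.RWMutex
	// slotFor is a hypothetical hook returning the slot for the caller's
	// current CPU; it must return an index in [0, len(slots)).
	slotFor func() int
}

func New(n int, slotFor func() int) *DRWMutex {
	return &DRWMutex{slots: make([]sync.RWMutex, n), slotFor: slotFor}
}

// RLocker returns the read lock for the caller's current slot. The caller
// must unlock the same Locker it locked, because the goroutine may migrate
// to another CPU in between.
func (m *DRWMutex) RLocker() sync.Locker {
	return m.slots[m.slotFor()].RLocker()
}

// Lock takes every slot's write lock, always in the same order.
func (m *DRWMutex) Lock() {
	for i := range m.slots {
		m.slots[i].Lock()
	}
}

// Unlock releases every slot's write lock.
func (m *DRWMutex) Unlock() {
	for i := range m.slots {
		m.slots[i].Unlock()
	}
}

func main() {
	// Indexing by raw CPU number only works if that number is already
	// dense in [0, NumCPU); this stand-in hook always returns slot 0.
	m := New(runtime.NumCPU(), func() int { return 0 })
	rl := m.RLocker()
	rl.Lock()
	rl.Unlock()
	m.Lock()
	m.Unlock()
	fmt.Println("ok")
}
```

Returning a `sync.Locker` from `RLocker` rather than exposing a separate `RUnlock` is what makes CPU migration between lock and unlock safe.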

jonhoo commented 7 years ago

The issue is that CPUID returns an APIC ID, which is not just a sequential number starting at 0. Instead, the APIC ID can easily be in the hundreds, even if you have way fewer CPUs than that.
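A sketch of why a lookup table is needed: indexing a slice of length `runtime.NumCPU()` directly with an APIC ID such as 96 would panic with an index out of range, so the sparse IDs first have to be mapped onto dense indices. The APIC IDs below are made up for illustration, and `denseIndex` is a hypothetical helper. Collecting the real IDs is the hard part discussed further down this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// denseIndex maps each (possibly sparse) APIC ID to a slot in
// [0, number of distinct IDs), assigning slots in ascending ID order.
func denseIndex(ids []int) map[int]int {
	sorted := append([]int(nil), ids...)
	sort.Ints(sorted)
	m := make(map[int]int, len(sorted))
	for _, id := range sorted {
		if _, seen := m[id]; !seen {
			m[id] = len(m)
		}
	}
	return m
}

func main() {
	// Hypothetical machine: 4 logical CPUs whose APIC IDs are far from 0..3.
	m := denseIndex([]int{96, 0, 64, 32})
	fmt.Println(m[0], m[32], m[64], m[96]) // 0 1 2 3
}
```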

tylertreat commented 7 years ago

You could just mod it, though, right? For example, `mx[cpu() % NUM_CPU].RLocker()`. I haven't tested how this performs, however. For context, I've been looking at how to implement this for macOS. Unfortunately, the options described in https://github.com/jonhoo/drwmutex/issues/5 don't work. All I could manage was a brute-force hack, since Darwin doesn't have SYS_SCHED_GETAFFINITY/SYS_SCHED_SETAFFINITY.

jonhoo commented 7 years ago

It's true that we could mod, but then you (always) end up with contention between certain pairs of unrelated CPUs, since mod won't distribute the IDs evenly. As for #5, I honestly don't know what the right approach is for other platforms (hence the long-open issue). Your workaround will unfortunately likely only work on machines with very few cores, since only there does an identity mapping work out. The best pointer I can find for macOS is this reference to GetCurrentProcessorNumber. Maybe that'll help?
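A quick illustration of the uneven distribution, assuming a hypothetical 4-CPU machine whose APIC IDs are all even (the actual spacing depends on the hardware). With `id % 4`, APIC IDs 0 and 4 share slot 0, IDs 2 and 6 share slot 2, and slots 1 and 3 are never used. Readers on those unrelated CPU pairs therefore contend on the same lock while half the locks sit idle.

```go
package main

import "fmt"

// slotLoads counts how many CPUs land in each of n slots when the slot
// is chosen as apicID % n.
func slotLoads(apicIDs []int, n int) []int {
	loads := make([]int, n)
	for _, id := range apicIDs {
		loads[id%n]++
	}
	return loads
}

func main() {
	// Hypothetical topology: 4 logical CPUs with even APIC IDs only.
	ids := []int{0, 2, 4, 6}
	fmt.Println(slotLoads(ids, len(ids))) // [2 0 2 0]
}
```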

tylertreat commented 7 years ago

Yeah, though unfortunately I don't think the problem is determining which processor a thread is running on. CPUID seems to work fine for that. The problem is building the APIC ID map.

jonhoo commented 7 years ago

Ah, no, I was referring to the trick mentioned in #5. Cycle through all the CPUs: use the function for pinning the thread to a single CPU, wait a little while, run CPUID, then move on to the next one. That way you can enumerate the entire mapping.

tylertreat commented 7 years ago

> use the function for pinning the thread to a single CPU

Which function is that? Note that Darwin does not support the setaffinity/getaffinity syscalls in Go AFAICT.

Similarly, I'm not sure how to cycle through the CPUs reliably.

jonhoo commented 7 years ago

You're right, I misinterpreted. It seems like there aren't even good workarounds for doing this on macOS dynamically at runtime :(

jonhoo commented 7 years ago

I think the real long-term, portable solution here is to enumerate the CPUs using CPUID itself. You can invoke it with various arguments to have it report topology information instead of just the current CPU, and that should let you construct the map. It is a bunch of work, though, which is why I haven't done it :)

tylertreat commented 7 years ago

Thanks, I may take a look at doing that at some point.

jonhoo commented 7 years ago

Sorry I couldn't be of more help! If you do end up writing it (the OSDev wiki has a great step-by-step guide), I'd be happy to review and accept a PR.