Improve GC stack scanning

Right now, the GC scans only the in-use part of the current goroutine's stack, but for every other goroutine it scans the entire stack allocation, not just the part that actually contains data.

This can be fixed for the precise GC in the following way:

1. Use a special "layout" value (one that would not otherwise be used) when allocating a stack, and detect this value during the scan.
2. Find the *task.Task struct that the stack belongs to. (How? Maybe by storing a pointer back to the *task.Task at a fixed location?)
3. Scan from the sp address stored in the *task.Task struct up to the highest address of the stack. (This needs some adjusting for WebAssembly, where there are in fact two stacks that need to be scanned per paused goroutine.)

If there are multiple goroutines in a running program, I think this will significantly improve GC performance.
