This is kleinerm's git repository for development of Psychtoolbox-3. Regular end users should stay away from it, unless instructed by him otherwise, and use the official Psychtoolbox-3 GitHub page or distribution system for production releases.
Pull in experimental Vulkan support for macOS 10.15.4+ #203
This pull adds experimental basic Vulkan display backend support for macOS.
It requires at least macOS 10.15.4 Catalina to function, but is only tested on macOS 10.15.7 Catalina final.
It requires Khronos MoltenVK 1.1.1 to be installed globally in the system, i.e. in /usr/local/lib and /usr/local/include.
MoltenVK is an open-source implementation of the Vulkan 1.1 portability subset for macOS, built on top of
the Metal graphics rendering API.
In principle this works as on Linux and Windows, i.e. OpenGL-Vulkan interop is used to render content in OpenGL,
then transfer it to Vulkan for display. Details of the interop differ due to deficiencies of Apple's OpenGL implementation
and current limitations of Vulkan for macOS:
We do not export the VkDeviceMemory backing the VkImage and import it into OpenGL textures as GL_memory_objects,
because Apple's prehistoric OpenGL implementation does not support the required extensions. Instead we use
MoltenVK specific extensions to create a backing IOSurface for the VkImage interop image. Then we get a handle to
the IOSurface and use Apple macOS OpenGL specific extensions to import that IOSurface as a rectangle texture.
This has some limitations: No MSAA textures are possible for interop, so MSAA implies our own MSAA resolve on the
OpenGL side.
Apple's OpenGL does not support semaphores, so we only use a glFlush() for producer-consumer sync.
The only way to block on present completion and to get timestamps of stimulus onset is via the
VK_GOOGLE_display_timing extension, which is implemented by MoltenVK on top of Metal. Our standard double-buffer trick
or the vkAcquireNextImageKHR() method does not work at all.
Some serious limitations of this implementation:
MoltenVK as of 1.1.2 has a buggy implementation of display timing, which never dequeues presentation
timing records. We need to work around that spec violation until I manage to upstream proper fixes to
MoltenVK, or somebody else does.
At least on macOS 10.15.7, Direct-to-display (DtD) mode is seriously buggy and causes random malfunctions, so
we disable it. This is not a big loss, as DtD also does not remove the 1 frame of extra compositor latency, contrary to
what Apple PR promises. Even in DtD, the DisplayServer/Compositor is still involved in present scheduling. Due
to deficient scheduling of the DisplayServer - one service pass starting about 2-3 msecs into each video scanout cycle -
it defers present of each present-ready frame by one extra video refresh cycle. This means a Screen('Flip') / Vulkan
present can happen at most every 2nd video refresh cycle, which limits the framerate to half the video refresh rate
whenever PTB has to wait for stimulus onset and timestamping, and is thereby throttled on present completion. In other
words, in 99% of all useful visual stimulation paradigms, performance will be severely limited.
This was tested on a MacBookPro 2017 with AMD Radeon Pro 560 under macOS 10.15.7. This problem does not just
affect PTB / MoltenVK, but was also verified by using original Apple "best practices" Metal sample code. In other words,
this is a macOS Metal bug, unfixable by us. Using Apple's "Instruments" Metal trace tools, I could confirm this behaviour
at the lowest level.
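The halving of the achievable framerate can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not PTB code and not a measurement: the function name, the 60 Hz refresh rate, the 2.5 msecs service offset and the 4 msecs render/submit delay are made up for illustration. The only things taken from the description above are that the DisplayServer runs one service pass about 2-3 msecs into each refresh cycle, and that the application waits for stimulus onset before it submits the next frame. The model additionally assumes that the next frame is never ready before that service pass.

```python
import math


def simulate_onsets(n_frames, refresh_hz=60.0, service_offset=0.0025,
                    submit_delay=0.004):
    """Toy model: return stimulus onset times (seconds) for n_frames presents.

    Assumptions (hypothetical, for illustration only):
    - Vblanks happen at integer multiples of the refresh duration T.
    - The compositor does one service pass per cycle, at k*T + service_offset.
    - A frame picked up by a service pass becomes visible at the next vblank.
    - The app waits for onset, then needs submit_delay to render and submit.
    """
    T = 1.0 / refresh_hz
    onsets = []
    submit_time = 0.0  # the first frame is submitted at time zero
    for _ in range(n_frames):
        # Index of the first service pass at or after the submit time.
        k = math.ceil((submit_time - service_offset) / T)
        # The frame becomes visible at the first vblank after that pass.
        onset = (k + 1) * T
        onsets.append(onset)
        # Throttled on present completion, then prepare the next frame.
        submit_time = onset + submit_delay
    return onsets


if __name__ == "__main__":
    onsets = simulate_onsets(6)
    intervals = [b - a for a, b in zip(onsets, onsets[1:])]
    print("onset-to-onset intervals in msecs:",
          [round(1000.0 * d, 2) for d in intervals])
```

With these default values every frame misses the service pass of the cycle in which it was submitted, so the model yields one onset every two refresh cycles, i.e. about 30 fps on a 60 Hz display. If submit_delay is set below service_offset the model reaches the full refresh rate, which is the case that was not observed on the tested machine.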
Timing and timestamping are unreliable! Sporadically, timestamps are reported as 0 for no obvious
reason. This happens a lot during the first few presents, or after a long pause between successive presents, i.e.
more than a few hundred msecs. Again, by modifying Apple "best practices" Metal sample code, and also by
tracing with "Instruments", this has been tracked down to be not a PTB or MoltenVK/Vulkan bug, but bugs in the
macOS Metal implementation. In other words: not our fault, not fixable by us, only by Apple.
Things that were verified to not matter:
Whether rendering happens on the application main thread (GUI/Event thread) or on a secondary thread. Apple's sample
allowed us to test and refute this hypothesis. Mods to our own code also showed no difference.
Peculiarities of event handling. Adding printf statements in PsychVulkanCore/PsychVulkan/Matlab script/
MoltenVK seems to help a bit in PTB's case, but not in the standalone Apple sample Metal app.
DtD vs. standard fullscreen mode. DtD does seem to work better for Apple's Metal sample code, though, whereas
in windowed mode we observe a lot more timestamping failures.
Time intervals between presents seem to somehow matter somewhat sometimes. Results are inconsistent
across runs and machine reboots though. Sometimes present intervals < 33 msecs or > 250 msecs cause failure.
Other times the limits are 17 msecs and 500 msecs, sometimes we can go to multiple seconds before trouble
happens.
All I know is that Metal is severely buggy, and I could not find any meaningful workarounds for the problems in
several dozen hours of experimentation. This was on macOS 10.15.7, MBP 2017 with AMD Radeon Pro 560 GPU.
That said, in the cases where timestamps are reported as non-zero, most of the time they seem to roughly
agree with reality, as measured with our FlipTimingWithRTPhotoDiodeTest.m script and Videoswitcher+RtBox
hardware reference timestamping. Timestamps are not as precise and high quality as those from our PTB
kernel driver, but they are ok enough for many applications. They seem to be hardware vblank interrupt handler
invocation timestamps without special high precision corrections. Amateurish, but ok for basic use.
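Since a reported timestamp of 0 means "no timestamp available" rather than an onset at time zero, code that collects onset times may want to separate the two cases before computing any timing statistics. The following is a minimal sketch of such a guard. The helper names and the idea of collecting onsets in a plain list are assumptions made for illustration, they are not part of PTB or MoltenVK.

```python
def split_valid_timestamps(onsets):
    """Separate usable onset timestamps from the ones reported as 0.

    `onsets` is a list of onset times in seconds, as a hypothetical caller
    might have collected them from successive presents. Values <= 0 are
    treated as "no timestamp available", matching the failure mode
    described above.
    """
    valid = [(i, t) for i, t in enumerate(onsets) if t > 0]
    failed = [i for i, t in enumerate(onsets) if t <= 0]
    return valid, failed


def failure_rate(onsets):
    """Fraction of presents that came back without a usable timestamp."""
    if not onsets:
        return 0.0
    _, failed = split_valid_timestamps(onsets)
    return len(failed) / len(onsets)


if __name__ == "__main__":
    # Made-up example data: two failed presents out of five.
    reported = [0.0, 12.0167, 12.0500, 0.0, 12.1167]
    valid, failed = split_valid_timestamps(reported)
    print("failed present indices:", failed)
    print("failure rate:", failure_rate(reported))
```

Keeping the indices of the failed presents, instead of silently dropping them, makes it possible to check later whether failures cluster at the start of a run or after long pauses, as described above.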
The implemented support also provides a choice of 10 bpc and fp16 framebuffers. ColorCal-II measurements
suggest the 10 bpc mode achieves 10 bpc precision and the fp16 mode achieves 11-12 bpc linear precision,
however only by using an Apple proprietary dithering algorithm, not by using the GPU's fp16 scanout mode.
The implementation also provides very basic HDR display support. It is more limited than on Windows-10, and much
more limited than on Linux. But at least the basics are covered: A scRGB linear colorspace, with the same methods as
used on MS-Windows in windowed mode. macOS does not allow querying the connected display's HDR
properties. This requires at least macOS 10.15.7. Standard HDR-10 / BT2020 / PQ is not supported, so
our encoding method is different from the high quality operating systems. Precision is worse than on Linux
or Windows, as macOS applies its own proprietary tone-mapping over which we do not have significant
control.
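To make the scRGB encoding concrete, here is a minimal sketch of the unit conversion it implies. It assumes the usual scRGB convention, as used on MS-Windows, that a linear value of 1.0 corresponds to 80 nits (cd/m^2). The constant and function names are made up for illustration. Whether the luminance that actually reaches the display matches this mapping on macOS is a separate question, given the proprietary tone-mapping mentioned above.

```python
# Assumed convention: scRGB linear value 1.0 corresponds to 80 cd/m^2.
SCRGB_REFERENCE_NITS = 80.0


def nits_to_scrgb(nits):
    """Convert a luminance in nits to a linear scRGB component value."""
    return nits / SCRGB_REFERENCE_NITS


def scrgb_to_nits(value):
    """Convert a linear scRGB component value back to nits."""
    return value * SCRGB_REFERENCE_NITS


if __name__ == "__main__":
    for nits in (80.0, 400.0, 1000.0):
        print(nits, "nits ->", nits_to_scrgb(nits), "scRGB")
```

Values above 1.0 are what carries the HDR range in this encoding, which is why a floating point framebuffer such as fp16 is needed to hold them.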
So in terms of the intended main purpose - getting trustworthy visual stimulation timing and timestamping
without the need for our typical hacks - this current implementation is mostly a failure. But given that the failure
is due to Apple macOS Metal bugs, there is the faint hope that it may work better on other Apple hardware +
macOS combos, or on future macOS versions if Apple get their act together.
Would I trust this implementation on macOS for proper stimulation? Hell no! But maybe it is good enough
for people with no or almost no need for reliable timing, or for people who need rather limited HDR support
and don't want to switch to real operating systems for that purpose.
Work time spent so far: More than 150 hours.