vsg-dev / vsgExamples

Example programs that test and illustrate how to use the VSG and optional add-on libraries
MIT License

Android Example issues #190

Closed geefr closed 1 year ago

geefr commented 1 year ago

To track outstanding work for the android example

The window initialisation is a little odd - The native window handle is passed down to vsg okay, but hits a bad any_cast when read.

The code is definitely correct, so I think it's an issue with mismatched C++ STLs; I've seen that a few times on Android. In this case, though, main.cpp and vsg are linked statically, so they should be sharing a single STL variant.

geefr commented 1 year ago

Okay, it's not a mismatched STL, but it's a similarly awkward linking / symbol duplication issue. Here's the best summary I can give; I'm not sure I understand all of it myself.

Notably:

The behaviour is that the example's main.cpp and Android_Window.cpp end up with different entries for the typeid of ANativeWindow. This causes the bad_cast exception when reading the window handle in vsg.

Interestingly, despite the LLVM/libc++ bug reports saying they now use a string comparison, the typeid comparison I'm getting is based on pointers. If for some reason there are two typeid entries, that cast will fail and we can't initialise the window.

Adding an extra any_cast in main.cpp before passing into vsg avoids this, but perhaps only in some architectures / situations, I think as it changes which typeid entry appears first in the table.

It may be an idea to consider a vsg::any implementation, or encapsulating the native window handles in a vsg::WindowHandle/WindowHandleAndroidNative class hierarchy; Something that would remove the need for rtti to be working across the library boundary.

I'm afraid Android has always been a little strange in this area, so that may be the simpler approach. In theory, however, the latest version of the NDK should have fixed this, or there should be some combination of flags we can set to get the desired behaviour.

I'm not sure why this didn't appear earlier, but the initial version of the example was using quite an old NDK / android toolchain version.

I'll have to come back to this, I'll work around it with some local hacks to vsg for my immediate needs.

robertosfield commented 1 year ago

Hi Gareth,

Thanks for the testing on Android. I don't have any insights to add about working around Android's peculiarities. Hopefully others will be able to help out.

On the default model side for the Android example, perhaps the way forward is to use the vsgconv facility for writing the model to a .cpp, i.e.

vsgconv model/lz.vsgt model.cpp

Then use this .cpp in the source code; this is how the vsgiosnative example gets around the problem of loading models.

https://github.com/vsg-dev/vsgExamples/tree/master/examples/platform/vsgiosnative

The code for loading the model is:

https://github.com/vsg-dev/vsgExamples/blob/master/examples/platform/vsgiosnative/main.mm#L50

The lz() function is provided by the lz.cpp.

It would be good to resolve these Android issues for the vsgExamples-1.0.0 release.

Cheers, Robert.

geefr commented 1 year ago

Thanks, sounds like the simplest option for model loading.

Another very strange crash I'm seeing - A segfault within getQueue, down inside vulkan.

    uint32_t transferQueueFamily = window->getOrCreatePhysicalDevice()->getQueueFamily(VK_QUEUE_TRANSFER_BIT);
    auto q = window->getOrCreateDevice()->getQueue(transferQueueFamily);


I haven't tried many things yet, but I suspect it's either the same issue as the window init or something Android-version-specific (since we noted that the headers sometimes don't match the Vulkan runtime that's actually available).

geefr commented 1 year ago

Some further debugging on the ANativeWindow/std::any casting issues. Most relevant issue link seems to be https://github.com/Samsung/ONE/issues/4157

I made some minimal test examples, and while std::any and std::any_cast are functional in most cases, for the ANativeWindow* we need they are not. I believe this is because the type we see is only ever an opaque pointer to a forward-declared struct ANativeWindow, along with some accessor functions. The linking for the vsg app is never going to see the real definition, so the requirement for it to have a key function (to satisfy typeid(foo) == typeid(foo) across compilation units) cannot be met.

I also think this didn't use to be an issue specifically, or it worked by chance. Older versions of Android may not have used RTLD_LOCAL when loading the library, or may have used a different linker etc. (there's always rapid change here, and Android native tools can be picky as a result).
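For anyone following along, here is a minimal sketch of the pattern being described, built as a single binary where it works as expected. `OpaqueWindow`, `wrapHandle`, `unwrapHandle` and `sameTypeByName` are hypothetical names for illustration and are not part of vsg or the NDK. The name-based comparison shows what a string comparison of type_info would do; whether a given libc++ build compares by name or by pointer is an assumption to check on your toolchain.

```cpp
#include <any>
#include <cstring>
#include <typeinfo>

// Hypothetical stand-in for ANativeWindow: declared but never defined,
// so only pointers to it can be used.
struct OpaqueWindow;

// Wrap a handle the way an application might before passing it to a library.
std::any wrapHandle(OpaqueWindow* window)
{
    return std::any(window);
}

// Read the handle back. The pointer overload of any_cast returns nullptr
// on a type mismatch instead of throwing std::bad_any_cast.
OpaqueWindow* unwrapHandle(const std::any& handle)
{
    auto* stored = std::any_cast<OpaqueWindow*>(&handle);
    return stored ? *stored : nullptr;
}

// Compare two type_info objects by mangled name rather than by address.
// This still matches if the same type ended up with two type_info copies.
bool sameTypeByName(const std::type_info& a, const std::type_info& b)
{
    return std::strcmp(a.name(), b.name()) == 0;
}
```

Within one binary `unwrapHandle(wrapHandle(w))` returns `w`. The failure discussed in this issue only shows up when the wrap and the unwrap see different type_info entries for the same pointer type.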

@robertosfield Locally I've tried with a new Android_WindowTraits class to store the window handle, and I think ignoring questions of backwards compatibility that's the way to go. Instead of messing with std::any we init traits as you'd expect. In Android_Window we can keep a fallback to the std::any handle on WindowTraits, but it sounds like creating platform-specific classes is the way to go if Android is unable to support typeinfo/casting properly.

auto traits = vsgAndroid::Android_WindowTraits::create( awkwardNativeWindowPointer );
// other WindowTraits setup, as usual
auto window = vsgAndroid::Android_Window::create(traits);
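A rough, standalone sketch of the idea behind a platform-specific traits class, for illustration only. None of these types are the real vsg or vsgAndroid classes; `WindowTraitsSketch`, `AndroidWindowTraitsSketch`, `OpaqueWindow` and `canCreateWindow` are hypothetical names. The point is that the handle is read through a typed virtual accessor, so no typeid comparison on the handle type is needed across the library boundary.

```cpp
// Hypothetical stand-in for ANativeWindow (forward declaration only).
struct OpaqueWindow;

// Hypothetical base traits. The handle is exposed through a virtual
// accessor with a concrete pointer type instead of through std::any.
struct WindowTraitsSketch
{
    virtual ~WindowTraitsSketch() = default;

    // Default: no Android window attached.
    virtual OpaqueWindow* androidWindow() const { return nullptr; }

    int width = 1280;
    int height = 720;
};

// Hypothetical Android-specific traits carrying the typed handle.
struct AndroidWindowTraitsSketch : WindowTraitsSketch
{
    explicit AndroidWindowTraitsSketch(OpaqueWindow* w) : window(w) {}

    OpaqueWindow* androidWindow() const override { return window; }

    OpaqueWindow* window = nullptr;
};

// Library side: decides whether a window can be created from the traits
// without any_cast or typeid on the handle type.
bool canCreateWindow(const WindowTraitsSketch& traits)
{
    return traits.androidWindow() != nullptr;
}
```

The trade-off is an extra platform-specific class in the API, in exchange for not depending on RTTI behaving consistently between the app and the library.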

2nd issue of segfaults when getting the transfer queue persists - I've had various vulkan samples running on my phone though, so I think that's a separate issue from any typeinfo / linker shenanigans.

robertosfield commented 1 year ago

Hi Gareth

Thanks for your efforts looking into the Android issues. An Android subclass of WindowTraits might be one way, but perhaps you could pass the extra data as user data assigned to the WindowTraits, i.e.

windowTraits->setValue("myNumber", 10);

auto myObject = vsg::Object::create();
windowTraits->setObject("myObject", myObject);

Cheers, Robert.


geefr commented 1 year ago

Ah, I forgot vsg had that, and yes that can work with no API rework to deal with.

Thanks, no real guess on timeline but I think this can be resolved, vulkan itself definitely works on Android.

geefr commented 1 year ago

Okay, I think I've worked it out; vsg is proven working on a phone. (Photo: 20221124170332_IMG_7177)

The 2nd crash was due to how my phone reports vulkan queues (Adreno 610) - It has a single queue for GRAPHICS | COMPUTE, but it doesn't specifically mention TRANSFER in the families. See http://vulkan.gpuinfo.org/displayreport.php?id=17515

So in Viewer::assignRecordAndSubmitTaskAndPresentation we get (uint32_t)-1, which is then passed to later vulkan functions.

uint32_t transferQueueFamily = device->getPhysicalDevice()->getQueueFamily(VK_QUEUE_TRANSFER_BIT);

Changing this to the fallback shown at the end of this comment fixes the problem. My understanding is that any graphics queue must also support transfer, even if it's not reported, so maybe the fix should be in getQueueFamily instead?

Either way, the basic vsg display seems functional on android, I'll raise a vsg PR once I've tidied a few other things up.

// Prefer a family that reports TRANSFER; fall back to GRAPHICS when none does (getQueueFamily returns -1).
auto transferQueueFamily = device->getPhysicalDevice()->getQueueFamily(VK_QUEUE_TRANSFER_BIT);
if( transferQueueFamily == -1 )
{
    transferQueueFamily = device->getPhysicalDevice()->getQueueFamily(VK_QUEUE_GRAPHICS_BIT);
}

robertosfield commented 1 year ago

Thanks, great news.

It's end of day here, so tomorrow I'll look into generalizing getQueueFamily() so it falls back to VK_QUEUE_GRAPHICS_BIT if a VK_QUEUE_TRANSFER_BIT queue isn't available.
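A generalized lookup with that kind of fallback could look roughly like the sketch below. It is standalone and is not the actual vsg implementation: `findQueueFamily` and `findTransferQueueFamily` are hypothetical helpers, and the flag constants are redefined with the same values as the Vulkan enumerants so the sketch builds without the Vulkan headers. It relies on the Vulkan rule that queues supporting graphics or compute implicitly support transfer, even when the transfer bit is not reported.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same values as the Vulkan VkQueueFlagBits enumerants, redefined here
// (an assumption of this sketch) to avoid depending on vulkan.h.
constexpr std::uint32_t kGraphicsBit = 0x1; // VK_QUEUE_GRAPHICS_BIT
constexpr std::uint32_t kComputeBit  = 0x2; // VK_QUEUE_COMPUTE_BIT
constexpr std::uint32_t kTransferBit = 0x4; // VK_QUEUE_TRANSFER_BIT

// Return the index of the first family whose flags contain all wanted
// bits, or -1 if no family matches.
int findQueueFamily(const std::vector<std::uint32_t>& familyFlags, std::uint32_t wanted)
{
    for (std::size_t i = 0; i < familyFlags.size(); ++i)
    {
        if ((familyFlags[i] & wanted) == wanted) return static_cast<int>(i);
    }
    return -1;
}

// Prefer a family that explicitly reports TRANSFER, then fall back to
// GRAPHICS and finally COMPUTE, which both imply transfer support.
int findTransferQueueFamily(const std::vector<std::uint32_t>& familyFlags)
{
    int family = findQueueFamily(familyFlags, kTransferBit);
    if (family < 0) family = findQueueFamily(familyFlags, kGraphicsBit);
    if (family < 0) family = findQueueFamily(familyFlags, kComputeBit);
    return family;
}
```

With a single family reporting only GRAPHICS | COMPUTE, as described for the Adreno 610 above, `findTransferQueueFamily` returns index 0 rather than -1, so nothing invalid is passed on to later Vulkan calls.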

robertosfield commented 1 year ago

I have just checked in the PhysicalDevice::getQueueFamily() fallback I mentioned above:

https://github.com/vsg-dev/VulkanSceneGraph/commit/00d6bf0f800315d15c304458310b6632337092b4

I'll now review the PRs you've just posted.

robertosfield commented 1 year ago

I have merged all the changes with vsgExamples master + VSG master, then merged with the respective 1.0 branches, and have tagged 1.0.1-rc1 for each. So... I think it's safe to close this Issue :-)