ngoquang2708 / android_device_samsung_vivalto3gvn

Device tree for Samsung Galaxy V SM-G313HZ

The display framebuffer cannot be initialized #95

Open ngoquang2708 opened 6 years ago

ngoquang2708 commented 6 years ago

https://gist.github.com/ngoquang2708/184dc1f24ccaf66dbd0bc075e19bb6e2

diepquynh commented 6 years ago

Don't leave egl libs on vendor partition. They aren't VNDK supported

diepquynh commented 6 years ago

Oh wait, did you notice these?

01-01 00:00:13.874   152   152 E SurfaceFlinger: Couldn't set SCHED_FIFO for SFEventThread
01-01 00:00:13.874   152   152 E SurfaceFlinger: Couldn't set SCHED_FIFO for EventThread

ngoquang2708 commented 6 years ago

cm-14.1 has that too. In Oreo, surfaceflinger has successfully loaded hwc and gralloc, but it fails to initialize the EGL context. I am stuck at that. My only clue is that the different binder device causes it. Not sure.

Off topic: are you stuck at bringing RIL up? Maybe that is because of the new pthread in O. Like my issue with phoneserver: I patched bionic so that phoneserver does not crash. Hope that helps.

ngoquang2708 commented 6 years ago

Well, look near the end of the log: it seems that those egl libs are loaded successfully.

ngoquang2708 commented 6 years ago

Ah, I got it. libMali uses dlopen to load the egl libs, and it uses an absolute path to /system for that. I have patched libMali to load them from /vendor. Now surfaceflinger loads egl successfully, but it still crashes. https://gist.github.com/ngoquang2708/5b51e014548ea6ab9f968cad87fa17aa
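The thread does not show how libMali was patched. One plausible way, sketched here under the assumption that the path is stored as a plain string inside the blob, is an in-place replacement: "/system" and "/vendor" have the same length, so no offsets in the binary shift. The helper name `patch_in_place` is hypothetical and not taken from the actual patch.

```c
#include <stddef.h>
#include <string.h>

/* Replaces every occurrence of `from` with `to` inside buf, in place.
 * Both strings must have the same length so the buffer size and all
 * offsets stay unchanged. Returns the number of replacements, or -1 if
 * the lengths differ (hypothetical helper, for illustration only). */
static int patch_in_place(unsigned char *buf, size_t len,
                          const char *from, const char *to) {
    size_t n = strlen(from);
    int count = 0;

    if (n == 0 || n != strlen(to)) return -1;

    for (size_t i = 0; i + n <= len; ) {
        if (memcmp(buf + i, from, n) == 0) {
            memcpy(buf + i, to, n);
            count++;
            i += n;  /* skip past the text that was just rewritten */
        } else {
            i++;
        }
    }
    return count;
}
```

Refusing strings of different lengths is deliberate: a longer replacement would overwrite whatever follows the string in the binary.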

diepquynh commented 6 years ago

Well, actually RIL is... yeah, a big issue for me. RIL on sc8830 can't get signal (so I can't make calls either), but it can receive and answer calls and messages. I was trying to disassemble the stock libril to see which RIL signal implementation it uses.

Now I can't test anything tho, as my PC is dead and my lappy's RAM couldn't handle compiling O

diepquynh commented 6 years ago

I remember there's a flag for disabling fatal throw on bionic but seems LOS guys removed it, for security reasons

diepquynh commented 6 years ago

Did you manage to boot with HIDL graphics HAL?

ngoquang2708 commented 6 years ago

No HIDL. But https://github.com/TeamButter/android_frameworks_native/commit/1246cf4b3b2ef96499948af148e2914d7024a991.

ngoquang2708 commented 6 years ago

No HIDL. With it, mali failed to validate the memory allocated by gralloc. Maybe it is a different memory location, since HIDL uses a separate, dedicated process to load the gralloc HAL.

diepquynh commented 6 years ago

We can use our custom HIDL gralloc implementation. Just a hint

ngoquang2708 commented 6 years ago

How is that possible?

diepquynh commented 6 years ago

HIDL gralloc works as a normal HAL in the vendor partition (as we don't have one, it's our vendor folder in system).

diepquynh commented 6 years ago

As long as we have our HIDL services called via HIDL API, we can implement custom HIDL stuffs

diepquynh commented 6 years ago

An example for HIDL service is this https://github.com/LineageOS/android_hardware_interfaces/blob/lineage-15.1/graphics/allocator/2.0/default/service.cpp

ngoquang2708 commented 6 years ago

I have configured it as passthrough in manifest.xml and built android.hardware.graphics.allocator@-service and android.hardware.graphics.allocator@-impl, but I still cannot get surfaceflinger to start.

diepquynh commented 6 years ago

Actually, we have those graphics HIDL HALs running as passthrough by default. What's confusing me is how I got an out-of-memory error with gralloc. I'm thinking about implementing CMA on the kernel side.

ngoquang2708 commented 6 years ago

That would be a hard job!

diepquynh commented 6 years ago

I found this commit. Maybe this will help with our gralloc: https://github.com/LineageOS/android_frameworks_av/commit/c1bdcc7fc8b1c75f4d17f364c5ab2e8fcc0f375c

ngoquang2708 commented 6 years ago

@remilia15 What is the correct way to set up HIDL HALs for graphics? Currently I have it set up like this:

android.hardware.graphics.allocator@2.0-impl
android.hardware.graphics.allocator@2.0-service
android.hardware.graphics.composer@2.1-impl
android.hardware.graphics.composer@2.1-service
android.hardware.graphics.mapper@2.0-impl

diepquynh commented 6 years ago

These are right, but we need passthrough for allocator and composer if you're planning to run HIDL services too. Mapper doesn't need it, as it's already a passthrough HAL.

ngoquang2708 commented 6 years ago

What are passthrough and binderized HIDL HALs? I have read about them at source.android.com but I am still confused.

diepquynh commented 6 years ago

All HIDL HALs are passthrough if they don't have a service package (e.g android.hardware.graphics.composer@2.1-service)

ngoquang2708 commented 6 years ago

OK. Thank you!

diepquynh commented 6 years ago

After tons of experiments and failures, I just realized (blame my noobness) that we need a custom implementation of gralloc_alloc_framebuffer_locked and cannot rely on the private_handle_t member. We use shared_fd for ION, but the fb device doesn't need it, as you said, and the mapper treats it as the framebuffer fd. As a result, native_handle_clone returns NULL and HIDL gralloc can't initialize. (The logic is simple: libui sends an allocate request to HIDL gralloc, and HIDL gralloc sends a buffer request to the mapper.)

If we can't do this, the camera will be dead forever, and so will screenshots.

ngoquang2708 commented 6 years ago

OMG. I just realized how dumb I have been until now. Now I know what binderized HALs vs passthrough HALs are. Smh.

diepquynh commented 6 years ago

Btw, can you boot with HWC? Mine just died with an NPE, and WiFi display needs HWC to work.

ngoquang2708 commented 6 years ago

Not yet.

ngoquang2708 commented 6 years ago

@remilia15 Did you notice this? When running binderized composer, this log appears twice:

07-23 22:24:32.707   178   382 I [Gralloc]: using (fd=11)
07-23 22:24:32.707   178   382 I [Gralloc]: id           = sprdfb
07-23 22:24:32.707   178   382 I [Gralloc]: xres         = 480 px
07-23 22:24:32.707   178   382 I [Gralloc]: yres         = 800 px
07-23 22:24:32.707   178   382 I [Gralloc]: xres_virtual = 480 px
07-23 22:24:32.707   178   382 I [Gralloc]: yres_virtual = 1600 px
07-23 22:24:32.707   178   382 I [Gralloc]: bpp          = 32
07-23 22:24:32.707   178   382 I [Gralloc]: r            =  0:8
07-23 22:24:32.707   178   382 I [Gralloc]: g            =  8:8
07-23 22:24:32.707   178   382 I [Gralloc]: b            = 16:8
07-23 22:24:32.707   178   382 I [Gralloc]: width        = 52 mm (234.461533 dpi)
07-23 22:24:32.707   178   382 I [Gralloc]: height       = 87 mm (233.563217 dpi)
07-23 22:24:32.707   178   382 I [Gralloc]: refresh rate = 60.00 Hz

but with a different pid (and framebuffer fd). They are surfaceflinger and android.hardware.graphics.composer@?-service. So surfaceflinger does not use android.hardware.graphics.composer@?-service???
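As a side note on reading that log: the dpi values follow from the resolution and the physical size, and yres_virtual being twice yres indicates two framebuffer pages. A small sketch of that arithmetic; the function names are hypothetical and this is not the gralloc source:

```c
/* Dots per inch from a pixel count and a physical size in millimetres. */
static float fb_dpi(unsigned px, unsigned mm) {
    return mm ? (px * 25.4f) / mm : 0.0f;
}

/* Number of full-screen pages that fit in the virtual resolution. */
static unsigned fb_num_buffers(unsigned yres, unsigned yres_virtual) {
    return yres ? yres_virtual / yres : 0;
}
```

With the values from the log, `fb_dpi(480, 52)` is about 234.46 and `fb_num_buffers(800, 1600)` is 2.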

ngoquang2708 commented 6 years ago

Digging surfaceflinger code :D

diepquynh commented 6 years ago

It must be in use, because the HIDL path is preferred over the legacy one.

diepquynh commented 6 years ago

I feel so fucking weird now, because HIDL allocator runs with patched libui on your scx15 :/

ngoquang2708 commented 6 years ago

I removed the patched libui. Now I get why the previous log shows up twice: the two processes are bootanimation and surfaceflinger. None of the binderized HALs crash; only surfaceflinger and bootanimation keep crashing on this:

07-23 22:24:49.551   521   521 W GrallocMapperPassthrough: buffer descriptor with invalid usage bits 0x400
07-23 22:24:49.551   521   521 E GraphicBufferAllocator: Failed to allocate (480 x 800) layerCount 1 format 1 usage 1e02: 5
07-23 22:24:49.551   521   521 E BufferQueueProducer: [FramebufferSurface] dequeueBuffer: createGraphicBuffer failed
07-23 22:24:49.551   521   521 E [EGL-ERROR]: void __egl_platform_dequeue_buffer(egl_surface*):1854: failed to dequeue buffer from native window 0xa62ff808; err = -12, buf = 0x0,max_allowed_dequeued_buffers 1
07-23 22:24:49.551   521   521 E libEGL  : eglMakeCurrent:1062 error 3003 (EGL_BAD_ALLOC)
07-23 22:24:49.551   521   521 E SurfaceFlinger: DisplayDevice::makeCurrent on default display failed. Aborting.
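To read the `usage 1e02` value in that log, it helps to split it into the legacy gralloc usage bits. The sketch below uses the bit values from the legacy hardware/gralloc.h header as I remember them; the table and the function name `describe_usage` are assumptions for illustration, so check them against your tree.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct usage_name { uint32_t bit; const char *name; };

/* Subset of the legacy GRALLOC_USAGE_* bits (assumed values). */
static const struct usage_name k_usage_names[] = {
    { 0x0002u, "SW_READ_RARELY" },
    { 0x0100u, "HW_TEXTURE" },
    { 0x0200u, "HW_RENDER" },
    { 0x0400u, "HW_2D" },
    { 0x0800u, "HW_COMPOSER" },
    { 0x1000u, "HW_FB" },
};

/* Writes the names of the known bits set in `usage` into out, separated
 * by '|', and returns the bits that were not written out. */
static uint32_t describe_usage(uint32_t usage, char *out, size_t out_len) {
    size_t used = 0;

    if (out_len == 0) return usage;
    out[0] = '\0';

    for (size_t i = 0; i < sizeof(k_usage_names) / sizeof(k_usage_names[0]); i++) {
        if ((usage & k_usage_names[i].bit) == 0) continue;
        int n = snprintf(out + used, out_len - used, "%s%s",
                         used ? "|" : "", k_usage_names[i].name);
        if (n < 0 || (size_t)n >= out_len - used) break;  /* out is full */
        used += (size_t)n;
        usage &= ~k_usage_names[i].bit;
    }
    return usage;
}
```

For 0x1e02 this yields SW_READ_RARELY|HW_RENDER|HW_2D|HW_COMPOSER|HW_FB, so the 0x400 that the mapper complains about would be the HW_2D bit.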

diepquynh commented 6 years ago

You know that the goddamn fb in gralloc is fucking with shared_fd. But how can the HIDL allocator and composer run with the patched libui o.O

ngoquang2708 commented 6 years ago

I think they are running, but no allocation request reaches them, so they stay alive :D

diepquynh commented 6 years ago

I just looked at the patch again, and it requires mapper to run the allocator service o.O The allocator service is registered but doesn't run, so you were right.

diepquynh commented 6 years ago

@ngoquang2708 I just noticed an interesting thing with HIDL mapper used with patched libui: It doesn't even touch framebuffer stuffs at all

diepquynh commented 6 years ago

Oh my bad. I wasn't using allocator with mapper that time so mapper won't touch framebuffer device

ngoquang2708 commented 6 years ago

I am thinking of a way to pass the -1 fd to mali. We need to find out how Android passes the native_handle (private_handle) to mali. In gralloc, we can fake that fd, say as 0, so that native_handle_clone succeeds. Then, before passing the cloned handle to mali, we set it back to -1 so that we don't confuse mali.
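A rough sketch of that idea, written against a stand-in struct rather than the real native_handle_t so it can run anywhere. Everything prefixed `sketch_` is hypothetical, and the placeholder is an fd opened on /dev/null instead of fd 0, which is an assumption made so the sketch does not depend on stdin being open.

```c
#include <fcntl.h>
#include <unistd.h>

#define SKETCH_MAX_FDS 4

/* Stand-in for the fd part of a native handle. */
struct sketch_handle { int numFds; int fds[SKETCH_MAX_FDS]; };

/* Stock-like clone: dup() every fd and fail if any dup() fails. */
static int sketch_clone(const struct sketch_handle *src, struct sketch_handle *dst) {
    if (src->numFds < 0 || src->numFds > SKETCH_MAX_FDS) return -1;
    dst->numFds = src->numFds;
    for (int i = 0; i < src->numFds; i++) {
        dst->fds[i] = dup(src->fds[i]);
        if (dst->fds[i] < 0) {
            while (i-- > 0) close(dst->fds[i]);  /* undo earlier dups */
            return -1;
        }
    }
    return 0;
}

/* The trick: swap negative fds for a placeholder, clone, then put -1
 * back in both handles before anything else looks at them. */
static int sketch_clone_fb(struct sketch_handle *src, struct sketch_handle *dst) {
    int was_negative[SKETCH_MAX_FDS] = {0};
    int placeholder, rc;

    if (src->numFds < 0 || src->numFds > SKETCH_MAX_FDS) return -1;
    placeholder = open("/dev/null", O_RDONLY);
    if (placeholder < 0) return -1;

    for (int i = 0; i < src->numFds; i++) {
        if (src->fds[i] < 0) {
            was_negative[i] = 1;
            src->fds[i] = placeholder;
        }
    }

    rc = sketch_clone(src, dst);

    for (int i = 0; i < src->numFds; i++) {
        if (!was_negative[i]) continue;
        src->fds[i] = -1;
        if (rc == 0) {
            close(dst->fds[i]);  /* drop the dup of the placeholder */
            dst->fds[i] = -1;
        }
    }
    close(placeholder);
    return rc;
}
```

Whether mali accepts a handle produced this way is exactly the open question in the thread; the sketch only shows that the clone step itself can be made to succeed.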

ngoquang2708 commented 6 years ago

If I run with the following config:

    <hal format="hidl">
        <name>android.hardware.graphics.allocator</name>
        <transport>**hwbinder**</transport>
        <version>2.0</version>
        <interface>
            <name>IAllocator</name>
            <instance>default</instance>
        </interface>
    </hal>
    <hal format="hidl">
        <name>android.hardware.graphics.composer</name>
        <transport arch="32">passthrough</transport>
        <version>2.1</version>
        <interface>
            <name>IComposer</name>
            <instance>default</instance>
        </interface>
    </hal>
    <hal format="hidl">
        <name>android.hardware.graphics.mapper</name>
        <transport arch="32">passthrough</transport>
        <version>2.0</version>
        <interface>
            <name>IMapper</name>
            <instance>default</instance>
        </interface>
    </hal>

There is no error in native_handle_clone, though I don't know whether it is called at all. But the allocation still fails with a mysterious HIDL callback transaction error.

diepquynh commented 5 years ago

wait why **hwbinder**? Are you highlighting or sth?

ngoquang2708 commented 5 years ago

Yeah, highlighting.

diepquynh commented 5 years ago

Add some log lines to debug it. I'm sure there will be errors, whether you set passthrough or binderized.

Stricted commented 5 years ago

hi guys

I have the same problem on my device (MediaTek SoC). After digging around a bit, I found that it fails here: https://github.com/LineageOS/android_hardware_interfaces/blob/lineage-15.1/graphics/mapper/2.0/default/GrallocMapper.cpp#L137 (as you also found out, native_handle_clone fails). I liked your idea of passing the -1 fd to mali, and I was actually successful with that: I just copied native_handle_clone and modified it (you can actually do that in GrallocMapper.cpp, which would allow us to just build a custom interface).

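#include <string.h>
#include <cutils/native_handle.h>

/* Variant of native_handle_clone() that copies the fds verbatim instead of
 * dup()ing them, so a handle carrying a negative fd can still be cloned.
 * Note: the clone shares its fds with the original handle. */
native_handle_t* native_handle_clone_mt8127(const native_handle_t* handle) {
    native_handle_t* clone = native_handle_create(handle->numFds, handle->numInts);
    if (clone == NULL) return NULL;

    /* Copy the fds as-is; the stock function calls dup() here. */
    for (int i = 0; i < handle->numFds; i++) {
        clone->data[i] = handle->data[i];
    }

    /* Copy the ints that follow the fds. */
    memcpy(&clone->data[handle->numFds], &handle->data[handle->numFds],
           sizeof(int) * handle->numInts);

    return clone;
}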

Well, that kinda worked, but now gralloc goes to hell with

E [MALI][Gralloc-ERROR]: int gralloc_register_buffer(const gralloc_module_t*, buffer_handle_t):110 Can't register buffer 0xab7a3320 as it is a framebuffer

and it hits this check afterwards: https://github.com/LineageOS/android_hardware_interfaces/blob/lineage-15.1/graphics/mapper/2.0/default/GrallocMapper.cpp#L144

That's where I'm stuck now.

diepquynh commented 5 years ago

First of all, you can never clone the FB's shared fd, because its fd is always negative.

Second, even if you force it to clone, mali won't recognize the cloned data, because there's no fd to read and access. As a result, mali fails to allocate the physical address for the FB (this is actually done by reading the FB fd instead of the shared fd through the clone process, and mali can never do that with the HIDL mapper).

Third, following from the second reason, the FB fd ends up being treated as shared_fd, which results in the error line in logcat that you mentioned.

ngoquang2708 commented 5 years ago

08-14 20:19:50.950  2636  2674 D vndksupport: Loading /vendor/lib/hw/gralloc.scx15.so from current namespace instead of sphal namespace.

Will this cause any issue? If this is the case, would the initialized framebuffer memory have to be remapped?

diepquynh commented 5 years ago

Just ignore it. We don't support VNDK libraries

ngoquang2708 commented 5 years ago

@remilia15 What do you mean by support VNDK libraries? Is it using only VNDK libraries for hardware modules?

diepquynh commented 5 years ago

The actual fact is that we don't have VNDK blobs, rather than that we don't support VNDK libraries. VNDK libs are used for Treble configuration. In short, everything in vendor can only rely on core Android libraries (libc, libc++, ...) and nothing else, or it would break VNDK compliance.

ngoquang2708 commented 5 years ago

I have observed something like this: if I remove this line and modify native_handle_clone to allow duplication of a negative file descriptor, like what @Stricted did, then libMali can successfully map the physical memory of that native handle and surfaceflinger no longer crashes, but nothing happens on the display. What I did is based on what QCOM did with the gpu_context_t::free_impl function in their libgralloc0. Based on what I have observed, after the framebuffer native handle is successfully cloned, GraphicBuffer in libui then calls free on the original handle, which causes libMali to crash. I think it still holds the old handle, because there was no native_handle_clone to mess with until N.

@Stricted You can modify your gralloc_register_buffer function to return 0 if the buffer_handle_t is a framebuffer handle, like this, to see what the difference is.
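The change referred to above is not included in this copy of the thread. A minimal sketch of what such an early return could look like, using a stand-in handle struct and an assumed framebuffer flag; all `sketch_`/`SKETCH_` names are hypothetical and not the real gralloc definitions:

```c
#include <errno.h>
#include <stddef.h>

/* Assumed flag marking a handle as backed by the framebuffer. */
#define SKETCH_PRIV_FLAGS_FRAMEBUFFER 0x1

/* Stand-in for the relevant fields of private_handle_t. */
struct sketch_private_handle {
    int flags;
    int share_fd;
};

/* Placeholder for the real mapping of an ION-backed buffer. */
static int sketch_map_shared_buffer(struct sketch_private_handle *hnd) {
    return hnd->share_fd >= 0 ? 0 : -EINVAL;
}

/* Accept framebuffer handles without mapping anything instead of
 * rejecting them; every other handle goes through the normal path. */
static int sketch_register_buffer(struct sketch_private_handle *hnd) {
    if (hnd == NULL) return -EINVAL;
    if (hnd->flags & SKETCH_PRIV_FLAGS_FRAMEBUFFER) return 0;
    return sketch_map_shared_buffer(hnd);
}
```

Returning 0 only gets past the registration check; it does not by itself make the framebuffer memory usable in the calling process.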