mripard / sunxi-mali

GNU General Public License v2.0

Mainline kernel 4.19.0 freeze with sunxi-mali #54

Closed by avafinger 5 years ago

avafinger commented 5 years ago

Hi @mripard, I recently upgraded to mainline kernel 4.19.0 with your sunxi-mali, but Mali freezes the kernel without any message as soon as it is used. This works on mainline kernel 4.18.y, surprisingly even without the CMA reserved-memory node; I missed that node on 4.17 and 4.18, and Mali worked without it anyway:

    reserved-memory {
        #address-cells = <1>;
        #size-cells = <1>;
        ranges;

        cma: cma {
            compatible = "shared-dma-pool";
            size = <0x4000000>;
            reusable;
        };
    };

So I added the CMA reserved-memory node on 4.19 to try to fix the issue, but it still freezes. Can you suggest a fix, or see what is wrong with my setup?

I can query information about the driver, but the system freezes when rendering on fbdev.

Here is some log:

[    0.000000] Reserved memory: created CMA memory pool at 0x59c00000, size 64 MiB
[    0.000000] OF: reserved mem: initialized node cma, compatible id shared-dma-pool
[    0.000000] cma: Reserved 128 MiB at 0x51c00000
[    0.000000] Memory: 308548K/524288K available (8192K kernel code, 340K rwdata, 1948K rodata, 1024K init, 266K bss, 19132K reserved, 196608K cma-reserved, 0K highmem)
[   10.352566] mali: loading out-of-tree module taints kernel.
[   10.360460] platform mali-utgard: assigned reserved memory node cma
[   10.362521] Allwinner sunXi mali glue initialized
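
The CMA figures in this log can be cross-checked with a little arithmetic. The following is a minimal sketch in C, assuming the 64 MiB pool corresponds to the `size = <0x4000000>` device-tree node and that the 128 MiB region is a second, separately reserved CMA area (the log does not say where that second region comes from, so this is an assumption). The function names are hypothetical.

```c
#include <stdint.h>

/* Convert a byte count to KiB, the unit used in the "Memory:" boot line. */
static uint64_t bytes_to_kib(uint64_t bytes)
{
    return bytes / 1024u;
}

/* Sum of the two CMA regions reported in the log, in KiB. */
static uint64_t total_cma_kib(void)
{
    const uint64_t dt_pool = 0x4000000u;           /* size = <0x4000000>, i.e. 64 MiB */
    const uint64_t second  = 128u * 1024u * 1024u; /* "cma: Reserved 128 MiB" (origin assumed) */

    return bytes_to_kib(dt_pool + second);
}
```

`total_cma_kib()` evaluates to 196608, which matches the `196608K cma-reserved` figure in the `Memory:` line. That suggests both regions were reserved at the same time, not just the one from the device tree.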

complete log: https://gist.github.com/avafinger/5beea7b1b7d58847b299329f957d2333

cuu commented 5 years ago

Did you use xf86-video-armsoc on 4.18.y?

avafinger commented 5 years ago

No, it is mali framebuffer (fbdev).

net147 commented 5 years ago

Does it freeze immediately when starting an application that uses Mali or does it work for some time and then freeze?

avafinger commented 5 years ago

It freezes immediately. The board needs a power cycle to boot again.

net147 commented 5 years ago

Mali (GBM) is working fine for me on an A20 with the 4.19.2 kernel.

avafinger commented 5 years ago

OK, I will move on to 4.19.2 and check again. Thanks.

net147 commented 5 years ago

I didn't have to add the CMA reserved-memory node to the device tree. I just added cma=256M to the kernel command line so the GPU doesn't run out of memory with heavier graphical applications.

avafinger commented 5 years ago

Tested with stable kernel 4.19.2 and got the same result. On kernel 4.18.19 Mali works just fine. Maybe it is something related to the Mali userspace library? The only thing I changed was the kernel version.

mripard commented 5 years ago

You'll need this patch (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4be9bd10e22dfc7fc101c5cf5969ef2d3a042d8a), along with DRM_FBDEV_LEAK_PHYS_SMEM set in your configuration and drm_kms_helper.drm_leak_fbdev_smem set on the kernel command line.
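
To confirm that the running kernel actually received that parameter, one option is to look for it in /proc/cmdline. The following is a minimal sketch in C, not part of sunxi-mali; both function names are hypothetical, and only the parameter name itself comes from the comment above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `cmdline` contains `param` as a whole token, either bare
 * or followed by "=value". Tokens are separated by spaces, tabs or newlines. */
static bool cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    while (*p) {
        /* Skip separators in front of the next token. */
        p += strspn(p, " \t\n");
        size_t tlen = strcspn(p, " \t\n");

        if (tlen >= plen && strncmp(p, param, plen) == 0 &&
            (tlen == plen || p[plen] == '='))
            return true;
        p += tlen;
    }
    return false;
}

/* Read /proc/cmdline and report whether the parameter is present.
 * Returns 1 if present, 0 if absent, -1 if the file cannot be read. */
static int running_kernel_has_param(const char *param)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");

    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return cmdline_has_param(buf, param) ? 1 : 0;
}
```

Calling `running_kernel_has_param("drm_kms_helper.drm_leak_fbdev_smem")` on the board would report whether the option reached the kernel. The match is deliberately strict: a token without the `drm_kms_helper.` prefix is not counted, because the comment above names the parameter with that prefix.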

avafinger commented 5 years ago

@mripard It looks like this patch only applies to 4.20-rc2. Is there anything I could do for 4.19.y?

avafinger commented 5 years ago

Just for the record, I grabbed mainline 4.20-rc2, applied the patch, and put drm_leak_fbdev_smem=1 on the kernel command line. This time the kernel does not freeze, but no EGL context could be created:

EGL Version: "1.4 Linux-r8p1-00rel0"
EGL Vendor: "ARM"
EGL Extensions: "EGL_KHR_image EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_reusable_sync EGL_KHR_fence_sync EGL_KHR_lock_surface EGL_KHR_lock_surface2 EGL_EXT_create_context_robustness EGL_ANDROID_blob_cache EGL_KHR_create_context EGL_KHR_partial_update EGL_KHR_create_context_no_error "
Error: eglCreateContext failed: 0x00003003
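
For readers decoding the hexadecimal value in that last line: EGL error codes occupy a small contiguous range, so they can be turned into names with a lookup table. The following is a minimal sketch in C; the helper name is hypothetical, and the values are the ones defined in the Khronos EGL headers.

```c
#include <stddef.h>

/* Map an EGL error code to its symbolic name.
 * Returns NULL for codes outside 0x3000..0x300E. */
static const char *egl_error_name(unsigned int code)
{
    static const char *const names[] = {
        "EGL_SUCCESS",             /* 0x3000 */
        "EGL_NOT_INITIALIZED",     /* 0x3001 */
        "EGL_BAD_ACCESS",          /* 0x3002 */
        "EGL_BAD_ALLOC",           /* 0x3003 */
        "EGL_BAD_ATTRIBUTE",       /* 0x3004 */
        "EGL_BAD_CONFIG",          /* 0x3005 */
        "EGL_BAD_CONTEXT",         /* 0x3006 */
        "EGL_BAD_CURRENT_SURFACE", /* 0x3007 */
        "EGL_BAD_DISPLAY",         /* 0x3008 */
        "EGL_BAD_MATCH",           /* 0x3009 */
        "EGL_BAD_NATIVE_PIXMAP",   /* 0x300A */
        "EGL_BAD_NATIVE_WINDOW",   /* 0x300B */
        "EGL_BAD_PARAMETER",       /* 0x300C */
        "EGL_BAD_SURFACE",         /* 0x300D */
        "EGL_CONTEXT_LOST",        /* 0x300E */
    };

    if (code < 0x3000u || code > 0x300Eu)
        return NULL;
    return names[code - 0x3000u];
}
```

With this table, 0x00003003 decodes to EGL_BAD_ALLOC, meaning EGL could not allocate the resources it needed. That is consistent with a memory-related problem, but on its own it does not identify the cause.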

By the way, sunxi-mali does not build against mainline 4.20-rc2 due to some mm changes, so I changed this:

diff --git a/r8p1/src/devicedrv/mali/linux/mali_kernel_linux.h b/r8p1/src/devicedrv/mali/linux/mali_kernel_linux.h
index b5c44fd..700c9cc 100644
--- a/r8p1/src/devicedrv/mali/linux/mali_kernel_linux.h
+++ b/r8p1/src/devicedrv/mali/linux/mali_kernel_linux.h
@@ -29,6 +29,10 @@ extern struct platform_device *mali_platform_device;
 #define CONFIG_PM_RUNTIME 1
 #endif

+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 13, 0)
+#define __GFP_REPEAT __GFP_RETRY_MAYFAIL
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/r8p1/src/devicedrv/mali/linux/mali_memory_block_alloc.c b/r8p1/src/devicedrv/mali/linux/mali_memory_block_alloc.c
index c6ffd40..bbb482f 100644
--- a/r8p1/src/devicedrv/mali/linux/mali_memory_block_alloc.c
+++ b/r8p1/src/devicedrv/mali/linux/mali_memory_block_alloc.c
@@ -309,7 +309,7 @@ int mali_mem_block_cpu_map(mali_mem_backend *mem_bkend, struct vm_area_struct *v

    list_for_each_entry(m_page, &block_mem->pfns, list) {
        MALI_DEBUG_ASSERT(m_page->type == MALI_PAGE_NODE_BLOCK);
-       ret = vm_insert_pfn(vma, addr, _mali_page_node_get_pfn(m_page));
+       ret = vmf_insert_pfn(vma, addr, _mali_page_node_get_pfn(m_page));

        if (unlikely(0 != ret)) {
            return -EFAULT;
diff --git a/r8p1/src/devicedrv/mali/linux/mali_memory_cow.c b/r8p1/src/devicedrv/mali/linux/mali_memory_cow.c
index 827458f..fcdb356 100644
--- a/r8p1/src/devicedrv/mali/linux/mali_memory_cow.c
+++ b/r8p1/src/devicedrv/mali/linux/mali_memory_cow.c
@@ -532,7 +532,7 @@ int mali_mem_cow_cpu_map(mali_mem_backend *mem_bkend, struct vm_area_struct *vma
         * flush which makes it way slower than remap_pfn_range or vm_insert_pfn.
        ret = vm_insert_page(vma, addr, page);
        */
-       ret = vm_insert_pfn(vma, addr, _mali_page_node_get_pfn(m_page));
+       ret = vmf_insert_pfn(vma, addr, _mali_page_node_get_pfn(m_page));

        if (unlikely(0 != ret)) {
            return ret;
@@ -569,7 +569,7 @@ _mali_osk_errcode_t mali_mem_cow_cpu_map_pages_locked(mali_mem_backend *mem_bken

    list_for_each_entry(m_page, &cow->pages, list) {
        if ((count >= offset) && (count < offset + num)) {
-           ret = vm_insert_pfn(vma, vaddr, _mali_page_node_get_pfn(m_page));
+           ret = vmf_insert_pfn(vma, vaddr, _mali_page_node_get_pfn(m_page));

            if (unlikely(0 != ret)) {
                if (count == offset) {
diff --git a/r8p1/src/devicedrv/mali/linux/mali_memory_os_alloc.c b/r8p1/src/devicedrv/mali/linux/mali_memory_os_alloc.c
index 5fe1270..bcfdd41 100644
--- a/r8p1/src/devicedrv/mali/linux/mali_memory_os_alloc.c
+++ b/r8p1/src/devicedrv/mali/linux/mali_memory_os_alloc.c
@@ -202,7 +202,7 @@ int mali_mem_os_alloc_pages(mali_mem_os_mem *os_mem, u32 size)
    /* Allocate new pages, if needed. */
    for (i = 0; i < remaining; i++) {
        dma_addr_t dma_addr;
-       gfp_t flags = __GFP_ZERO | __GFP_REPEAT | __GFP_NOWARN | __GFP_COLD;
+       gfp_t flags = __GFP_ZERO | __GFP_REPEAT | __GFP_NOWARN;
        int err;

 #if defined(CONFIG_ARM) && !defined(CONFIG_ARM_LPAE)
@@ -370,7 +370,7 @@ int mali_mem_os_cpu_map(mali_mem_backend *mem_bkend, struct vm_area_struct *vma)
        ret = vm_insert_page(vma, addr, page);
        */
        page = m_page->page;
-       ret = vm_insert_pfn(vma, addr, page_to_pfn(page));
+       ret = vmf_insert_pfn(vma, addr, page_to_pfn(page));

        if (unlikely(0 != ret)) {
            return -EFAULT;
@@ -408,7 +408,7 @@ _mali_osk_errcode_t mali_mem_os_resize_cpu_map_locked(mali_mem_backend *mem_bken

            vm_end -= _MALI_OSK_MALI_PAGE_SIZE;
            if (mapping_page_num > 0) {
-               ret = vm_insert_pfn(vma, vm_end, page_to_pfn(m_page->page));
+               ret = vmf_insert_pfn(vma, vm_end, page_to_pfn(m_page->page));

                if (unlikely(0 != ret)) {
                    /*will return -EBUSY If the page has already been mapped into table, but it's OK*/
@@ -431,7 +431,7 @@ _mali_osk_errcode_t mali_mem_os_resize_cpu_map_locked(mali_mem_backend *mem_bken
        list_for_each_entry(m_page, &os_mem->pages, list) {
            if (count >= offset) {

-               ret = vm_insert_pfn(vma, vstart, page_to_pfn(m_page->page));
+               ret = vmf_insert_pfn(vma, vstart, page_to_pfn(m_page->page));

                if (unlikely(0 != ret)) {
                    /*will return -EBUSY If the page has already been mapped into table, but it's OK*/
diff --git a/r8p1/src/devicedrv/mali/linux/mali_memory_secure.c b/r8p1/src/devicedrv/mali/linux/mali_memory_secure.c
index 2836b1b..a45287d 100644
--- a/r8p1/src/devicedrv/mali/linux/mali_memory_secure.c
+++ b/r8p1/src/devicedrv/mali/linux/mali_memory_secure.c
@@ -15,6 +15,9 @@
 #include <linux/mutex.h>
 #include <linux/dma-mapping.h>
 #include <linux/dma-buf.h>
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 16, 0)
+#include <linux/dma-direct.h>
+#endif

 _mali_osk_errcode_t mali_mem_secure_attach_dma_buf(mali_mem_secure *secure_mem, u32 size, int mem_fd)
 {
@@ -128,7 +131,7 @@ int mali_mem_secure_cpu_map(mali_mem_backend *mem_bkend, struct vm_area_struct *
        MALI_DEBUG_ASSERT(0 == size % _MALI_OSK_MALI_PAGE_SIZE);

        for (j = 0; j < size / _MALI_OSK_MALI_PAGE_SIZE; j++) {
-           ret = vm_insert_pfn(vma, addr, PFN_DOWN(phys));
+           ret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys));

            if (unlikely(0 != ret)) {
                return -EFAULT;
diff --git a/r8p1/src/devicedrv/mali/linux/mali_memory_swap_alloc.c b/r8p1/src/devicedrv/mali/linux/mali_memory_swap_alloc.c
index a54faca..012cfe1 100644
--- a/r8p1/src/devicedrv/mali/linux/mali_memory_swap_alloc.c
+++ b/r8p1/src/devicedrv/mali/linux/mali_memory_swap_alloc.c
@@ -248,7 +248,11 @@ static void mali_mem_swap_swapped_bkend_pool_shrink(_mali_mem_swap_pool_shrink_t
    }

    /* Get system free pages number. */
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 14, 0)
+   system_free_size = global_zone_page_state(NR_FREE_PAGES) * PAGE_SIZE;
+#else
    system_free_size = global_page_state(NR_FREE_PAGES) * PAGE_SIZE;
+#endif
    last_gpu_utilization = _mali_ukk_utilization_gp_pp();

    if ((last_gpu_utilization < gpu_utilization_threshold_value)
@@ -576,7 +580,11 @@ int mali_mem_swap_alloc_pages(mali_mem_swap *swap_mem, u32 size, u32 *bkend_idx)
        list_add_tail(&m_page->list, &swap_mem->pages);
    }

+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 14, 0)
+   system_free_size = global_zone_page_state(NR_FREE_PAGES) * PAGE_SIZE;
+#else
    system_free_size = global_page_state(NR_FREE_PAGES) * PAGE_SIZE;
+#endif

    if ((system_free_size < mali_mem_swap_out_threshold_value)
        && (mem_backend_swapped_pool_size > (mali_mem_swap_out_threshold_value >> 2))
diff --git a/r8p1/src/devicedrv/mali/linux/mali_osk_timers.c b/r8p1/src/devicedrv/mali/linux/mali_osk_timers.c
index e5d7238..701051a 100644
--- a/r8p1/src/devicedrv/mali/linux/mali_osk_timers.c
+++ b/r8p1/src/devicedrv/mali/linux/mali_osk_timers.c
@@ -18,16 +18,25 @@
 #include "mali_osk.h"
 #include "mali_kernel_common.h"

-struct _mali_osk_timer_t_struct {
-   struct timer_list timer;
-};
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 14, 0)
+
+#define TIMER_DATA_TYPE        unsigned long
+#define TIMER_FUNC_TYPE        void (*)(TIMER_DATA_TYPE)
+
+static inline void timer_setup(struct timer_list *timer,
+                  void (*callback)(struct timer_list *),
+                  unsigned int flags)
+{
+   __setup_timer(timer, (TIMER_FUNC_TYPE)callback,
+             (TIMER_DATA_TYPE)timer, flags);
+}
+#endif

 typedef void (*timer_timeout_function_t)(unsigned long);

 _mali_osk_timer_t *_mali_osk_timer_init(void)
 {
    _mali_osk_timer_t *t = (_mali_osk_timer_t *)kmalloc(sizeof(_mali_osk_timer_t), GFP_KERNEL);
-   if (NULL != t) init_timer(&t->timer);
    return t;
 }

@@ -65,8 +74,7 @@ mali_bool _mali_osk_timer_pending(_mali_osk_timer_t *tim)
 void _mali_osk_timer_setcallback(_mali_osk_timer_t *tim, _mali_osk_timer_callback_t callback, void *data)
 {
    MALI_DEBUG_ASSERT_POINTER(tim);
-   tim->timer.data = (unsigned long)data;
-   tim->timer.function = (timer_timeout_function_t)callback;
+   timer_setup(&tim->timer, callback, 0);
 }

 void _mali_osk_timer_term(_mali_osk_timer_t *tim)

I also had to backport some mm functions that were dropped or are missing in mainline kernel 4.20-rc2, but unfortunately this did not work out.
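
One thing worth checking in the diff above, offered as a hedged observation and not a confirmed diagnosis: vm_insert_pfn() returns 0 on success, whereas vmf_insert_pfn() returns a vm_fault_t whose success value is VM_FAULT_NOPAGE. The unchanged `0 != ret` checks would therefore treat a successful mapping as a failure. The sketch below is a user-space mock in C. The `MOCK_` constants are stand-ins assumed to mirror the 4.20-era kernel headers, the error mask is simplified, and the helper name is hypothetical.

```c
#include <errno.h>

/* User-space stand-ins for kernel definitions; the values are assumed to
 * mirror the 4.20-era headers and exist only to make the sketch compile. */
typedef unsigned int mock_vm_fault_t;
#define MOCK_VM_FAULT_OOM    0x0001u
#define MOCK_VM_FAULT_SIGBUS 0x0002u
#define MOCK_VM_FAULT_NOPAGE 0x0100u /* success value of vmf_insert_pfn() */
#define MOCK_VM_FAULT_ERROR  (MOCK_VM_FAULT_OOM | MOCK_VM_FAULT_SIGBUS) /* simplified mask */

/* Hypothetical helper: translate a vm_fault_t-style result into the
 * 0 / -errno convention that the existing "0 != ret" checks expect. */
static int mock_vmf_to_errno(mock_vm_fault_t vmf)
{
    if (vmf & MOCK_VM_FAULT_OOM)
        return -ENOMEM;
    if (vmf & MOCK_VM_FAULT_ERROR)
        return -EFAULT;
    return 0; /* includes MOCK_VM_FAULT_NOPAGE, i.e. success */
}
```

In the driver, the equivalent would be to test the vmf_insert_pfn() result against the kernel's own VM_FAULT_ERROR mask before falling into the existing error paths. Whether this accounts for the failure described here is not established by the thread.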

Bootlog: https://gist.github.com/avafinger/e23783333910f25523cd2440d1444207

Modules:

Module                  Size  Used by
ov5640                 32768  0
v4l2_fwnode            20480  1 ov5640
v4l2_common            16384  1 ov5640
videodev              143360  3 ov5640,v4l2_fwnode,v4l2_common
media                  28672  2 ov5640,videodev
sunxi_cir              16384  0
mali                  229376  0
hci_uart               36864  0
btintel                16384  1 hci_uart
bluetooth             327680  2 hci_uart,btintel
ecdh_generic           28672  1 bluetooth
brcmfmac              188416  0
brcmutil               16384  1 brcmfmac
g_serial               16384  0
ipv6                  401408  18

@mripard Do you have an up-to-date kernel and sunxi-linux I could try out?

mansr commented 5 years ago

I have the same problem (instant hang, watchdog reset) here on kernel 4.19.4.

mansr commented 5 years ago

Got it working on 4.20-rc4. No idea which commit(s) fixed it.

avafinger commented 5 years ago

@mansr No luck here. Still the same on 4.20-rc4. But since I had to change a few things to be able to compile mali on 4.20.y, I may have broken something.

Can you share where you pull your kernel from, and your changes?

mansr commented 5 years ago

This works: https://github.com/mansr/sunxi-mali/commit/63f6d23a5fc23ad4498e995a4b59f752c748bc34

avafinger commented 5 years ago

Are you using an A20 with r6p2? I use r8p1 (fbdev) on an H2+.

Thanks, I will check again.

avafinger commented 5 years ago

The only difference I have is in _mali_osk_boot_time_get_ns(): I ported the 32 to 4.20.y. I will use yours instead and try again.

mansr commented 5 years ago

Yes, A20 with r6p2.

mansr commented 5 years ago

r8p1 also works.

avafinger commented 5 years ago

Yes, it is working. I removed reserved-memory from my dts, as I did not have it in 4.18.y.

Thank you!

mansr commented 5 years ago

It's still broken on 4.19. I'd like to see that fixed, since 4.19 is an LTS release.

avafinger commented 5 years ago

I will give 4.19 a try this weekend; if it works, I will close this then.

avafinger commented 5 years ago

OK, I found the time and gave 4.19.5 a try with a backport of CONFIG_DRM_FBDEV_LEAK_PHYS_SMEM, some minor changes to fbcon.c / fbmem.c / fbmon.c so that it compiles without errors, and reserved-memory removed.

It works.

mansr commented 5 years ago

Why is that patch needed with 4.19? I thought the hiding of the physical address was only in 4.20.

avafinger commented 5 years ago

It is beyond my understanding; I just followed Maxime's recommendation (please see his earlier comment):

You'll need this patch (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4be9bd10e22dfc7fc101c5cf5969ef2d3a042d8a), along with DRM_FBDEV_LEAK_PHYS_SMEM set in your configuration and drm_kms_helper.drm_leak_fbdev_smem set on the kernel command line.

mansr commented 5 years ago

Could you share exactly what you changed?

avafinger commented 5 years ago

Here are the changes; I had to apply them manually. Please note that the patched tree is linux-4.19.1.

mali-fix-kernel-4.19.5.patch.zip

avafinger commented 5 years ago

and i removed:

/*
reserved-memory {
    #address-cells = <1>;
    #size-cells = <1>;
    ranges;

    cma_pool: cma@4a000000 {
        compatible = "shared-dma-pool";
        size = <0x6000000>;
        alloc-ranges = <0x4a000000 0x6000000>;
        reusable;
        linux,cma-default;
    };
};
*/

avafinger commented 5 years ago

And for sunxi-mali r8p1: sunxi-mali.patch.zip

giuliobenetti commented 5 years ago

Hello @mansr,

are you planning to open a Pull Request for your patches to build against 4.20?

mansr commented 5 years ago

Still no luck with 4.19. As soon as I start a GL application, some memory is corrupted and everything crashes.

avafinger commented 5 years ago

I tested with this: https://github.com/avafinger/mali-fbdev-stress-test-tools

Can you share your GL application? I can run it here and see if it works.

mansr commented 5 years ago

I was using https://github.com/smk-embedded/qt5-opengles2-test for testing, mostly because it has an OE recipe and doesn't need libgbm. Trying the yagears demo, all I get is eglGetDisplay failed: 0x3000.

avafinger commented 5 years ago

Did you try removing cma reserved-memory?

mansr commented 5 years ago

Tried both with and without, no difference. Also on kernel 4.20 which otherwise works.

avafinger commented 5 years ago

Regarding my patches, did you have to change anything else to be able to compile?

mansr commented 5 years ago

The gears demo is now working on 4.20 (not sure what I did wrong). Instant crash on 4.19.

avafinger commented 5 years ago

Mali (GBM) is working fine for me on an A20 with the 4.19.2 kernel.

@net147 Can you please share where you got the Mali-400 GBM blobs? Bootlin provides Mali fbdev blobs.

net147 commented 5 years ago

@avafinger I am using the r6p2 Wayland Mali blob. It includes GBM support. You can use it without Wayland API or compositor running, just the Wayland libraries need to be present to satisfy the shared library dependencies.

avafinger commented 5 years ago

That makes sense. Thanks.

sergey-suloev commented 5 years ago

Wayland blob works fine.

mripard commented 5 years ago

That regression (a hard crash as soon as Mali is used with fbdev) was introduced in 4.19, but the fix went into 4.20. I submitted that patch to stable yesterday, so it should hopefully show up in one of the next 4.19.x releases.

nanfang2000 commented 4 years ago

That regression (a hard crash as soon as Mali is used with fbdev) was introduced in 4.19, but the fix went into 4.20. I submitted that patch to stable yesterday, so it should hopefully show up in one of the next 4.19.x releases.

Hi @mripard, I encounter the same issue when running malitest. I'm now using kernel 5.5. Do I need to apply any patch or fix for 5.5? Thanks!