xen-troops / meta-xt-prod-devel-rcar

Main Xen Troops product, which is used for day-to-day development and integration
Apache License 2.0

[XEN] Unable to inject GSX IRQ #106

Open naveenc-93 opened 1 year ago

naveenc-93 commented 1 year ago

Hello Team,

I am facing the IRQ issue below with the GSX driver.

(XEN) [ 20.059538] Failed to inject GSX IRQ
(XEN) [ 20.276373] common/grant_table.c:1882:d2v1 Expanding d2 grant table from 1 to 2 frames
(XEN) [ 20.284888] common/grant_table.c:1882:d2v1 Expanding d2 grant table from 2 to 3 frames
(XEN) [ 20.293334] common/grant_table.c:1882:d2v1 Expanding d2 grant table from 3 to 4 frames
(XEN) [ 20.301751] common/grant_table.c:1882:d2v1 Expanding d2 grant table from 4 to 5 frames
(XEN) [ 20.496305] gnttab_mark_dirty not implemented yet
(XEN) [ 22.554993] Failed to inject GSX IRQ
(XEN) [ 22.570198] Failed to inject GSX IRQ

Xen is unable to inject the GSX IRQ. I have enabled the options below for GSX in the configuration file:

dt_passthrough_nodes = [
    "/gsx_opp_table0",
    "/gsx_opp_table1",
    "/gsx_opp_table2",
    "/gsx_opp_table3",
    "/gsx_opp_table4",
    "/gsx_opp_table5",
    "/gsx_opp_table6",
    "/gsx_opp_table7",
]

dt_dev = [
    "/soc/gsx_pv0_domd",
    "/soc/gsx_pv1_domd",
    "/soc/gsx_pv2_domd",
    "/soc/gsx_pv3_domd",
]

irqs = [
    # gsx@fd000000
    151,
]

iomem = [
    # gsx@fd000000
    "fd000,40",
]
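
For reference, the `irqs` and `iomem` values in a config like the one above can be cross-checked against the device tree node. The following is a minimal sketch, assuming 4 KiB pages, the xl.cfg convention that an `iomem` entry is "first page,page count" in hexadecimal, and the GIC convention that SPI n has interrupt ID n + 32. The region size 0x40000 and the SPI number 119 are assumptions chosen to match the values in this config, not values read from the reporter's device tree, and the helper names are hypothetical.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def iomem_entry(base: int, size: int) -> str:
    """Format an xl.cfg iomem entry as "<first page>,<page count>" in hex."""
    first_page = base >> PAGE_SHIFT
    # Round the region size up to a whole number of pages.
    pages = (size + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT
    return f"{first_page:x},{pages:x}"


def spi_to_irq(spi: int) -> int:
    """Convert a GIC SPI number to an interrupt ID (SPIs start at ID 32)."""
    return spi + 32


# Assumed values: GSX register block at 0xfd000000, 0x40000 bytes, SPI 119.
print(iomem_entry(0xFD000000, 0x40000))  # fd000,40
print(spi_to_irq(119))                   # 151
```

If the computed values differ from the config, the passthrough entries and the device tree node for the custom board are probably out of sync.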

Could you please help me resolve this issue?

naveenc-93 commented 1 year ago

I have also added "pvrsrvkm.DriverMode=0" to the domd bootargs.

lorc commented 1 year ago

Hi @naveenc-93, are you using our default config? Also, what machine do you use?

I am asking because it should work out of the box without any changes.

lorc commented 1 year ago

@otyshchenko1, may I ask you to take a quick look? I just don't remember what the correct configuration for GSX is.

naveenc-93 commented 1 year ago

Yes, I am using the same config, but the machine is our custom board, which is based on the H3 Starter Kit.

lorc commented 1 year ago

Ah, I see. Well, I am not sure that the problem is with DomD. Looking at your logs, you have already booted DomU, so maybe Xen can't inject IRQs there? Sadly, it is unclear which domain it fails to inject IRQs into. Are you seeing those messages before the domain with ID=2 is created?

naveenc-93 commented 1 year ago

Yes. It is unclear for which domain Xen is unable to inject the IRQ, but I observe these messages while a graphics application is running in domd.

naveenc-93 commented 1 year ago

I also got this dump in domd, where the error says that interrupts were not received:

[   12.533951] PVR_K:(Error):   522: RGXUpdateHealthStatus: LISR has not received the last 3 interrupts [5760]
[   12.534037] PVR_K:(Error):   522:  Device experienced error 15 [105]
[   12.534069] PVR_K:(Error):   522: DevicesWatchdogThread: Device status not OK!!! [499]
[   12.534107] PVR_K:  522: ------------[ PVR DBG: START (High) ]------------
[   12.534138] PVR_K:  522: OS kernel info: Linux 5.10.41-yocto-standard #1 SMP PREEMPT Wed May 10 06:26:40 UTC 2023 aarch64
[   12.534181] PVR_K:  522: DDK info: Rogue_DDK_Linux rogueddk 1.15@6052913 (release) r8a7795_linux
[   12.534218] PVR_K:  522: Time now: 12534215us
[   12.534242] PVR_K:  522: Services State: OK
[   12.534263] PVR_K:  522: Server Errors: 3
[   12.534292] PVR_K:  522: Connections Device ID:0(128) P800-V800-T800-agl-compositor, P804-V804-T804-launcher, P803-V803-T803-homescreen
[   12.534342] PVR_K:  522: ------[ Driver Info ]------
[   12.534370] PVR_K:  522: Comparison of UM/KM components: MATCHING
[   12.534398] PVR_K:  522: KM Arch: 64 Bit
[   12.534418] PVR_K:  522: UM Connected Clients: 64 Bit
[   12.534444] PVR_K:  522: UM info: 1.15 @  6052913 (release) build options: 0x80000810
[   12.534481] PVR_K:  522: KM info: 1.15 @  6052913 (release) build options: 0x00000810
[   12.534514] PVR_K:  522: Window system: nullws_drm
[   12.534546] PVR_K:  522: ------[ RGX Device ID:0 Start ]------
[   12.534575] PVR_K:  522: ------[ RGX Info ]------
[   12.534613] PVR_K:  522: Device Node (Info): 00000000c86ec4a1 (000000008c71d9bd)
[   12.534649] PVR_K:  522: RGX BVNC: 4.46.6.62 (rogue)
[   12.534675] PVR_K:  522: RGX Device State: Active
[   12.534699] PVR_K:  522: RGX Power State: ON
[   12.534726] PVR_K:  522: FW info: 1.15 @  6052913 (release) build options: 0x80000810
[   12.534764] PVR_K:  522: BIF0 - OK
[   12.534785] PVR_K:  522: BIF1 - OK
[   12.534806] PVR_K:  522: TEXAS_BIF - OK
[   12.534827] PVR_K:  522: TEXAS_BIF - OK
[   12.534849] PVR_K:  522: RGX Virtualisation firmware connection state: UP (Fw=active; OS=active)
[   12.534891] PVR_K:  522: RGX FW State: NOT RESPONDING - Missing interrupts (HWRState 0x00000001: HWR OK;)
[   12.534939] PVR_K:  522: RGX FW Power State: RGXFWIF_POW_IDLE (APM disabled: 0 ok, 0 denied, 0 non-idle, 0 retry, 0 other, 0 total. Latency: 100 ms)
[   12.534996] PVR_K:  522: RGX DVFS: 0 frequency changes. Current frequency: 599.997 MHz (sampled at 7872203376 ns). FW frequency: 600.000 MHz.
[   12.535061] PVR_K:  522: RGX FW OS 0 - State: active; Freelists: Ok; Priority: 0; MTS on;
[   12.535111] Unable to handle kernel paging request at virtual address ffff8000122b502c
[   12.535143] Mem abort info:
[   12.535158]   ESR = 0x96000021
[   12.535179]   EC = 0x25: DABT (current EL), IL = 32 bits
[   12.535204]   SET = 0, FnV = 0
[   12.535223]   EA = 0, S1PTW = 0
[   12.535241] Data abort info:
[   12.535260]   ISV = 0, ISS = 0x00000021
[   12.535281]   CM = 0, WnR = 0
[   12.535301] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000415db000
[   12.535329] [ffff8000122b502c] pgd=00000000bffff003, p4d=00000000bffff003, pud=00000000bfffe003, pmd=000000004332e003, pte=006800004792e713
[   12.535393] Internal error: Oops: 96000021 [#1] PREEMPT SMP
[   12.535419] Modules linked in: xt_MASQUERADE iptable_filter can_raw can iptable_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_tcpudp bridge stp llc cfg80211 rfkill pvrsrvkm(O) crct10dif_ce dw_hdmi_cec smsc95xx usbnet renesas_rpc_if rcar_can can_dev renesas_usb3 ccree authenc libdes vspm_if(O) vsp2(O) vspm(O) uio_pdrv_genirq sllin(O) mmngrbuf(O) mmngr(O) fuse ip_tables x_tables ipv6
[   12.535644] CPU: 2 PID: 522 Comm: pvr_device_wdg Tainted: G        W  O      5.10.41-yocto-standard #1
[   12.535683] Hardware name: XENVM-4.18 (DT)
[   12.535706] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[   12.535832] pc : RGXDumpRGXDebugSummary+0x2a8/0xf60 [pvrsrvkm]
[   12.535918] lr : RGXDumpRGXDebugSummary+0x82c/0xf60 [pvrsrvkm]
[   12.535947] sp : ffff8000230037a0
[   12.535967] x29: ffff8000230037c0 x28: 0000000000000000
[   12.535994] x27: 0000000000000001 x26: ffff800009179838
[   12.536022] x25: ffff800009198d88 x24: 0000000000000000
[   12.536050] x23: ffff8000122a5600 x22: ffff8000122b5000
[   12.536077] x21: ffff000008a9a800 x20: 0000000000000000
[   12.536104] x19: 0000000000000000 x18: ffffffffffffffff
[   12.536132] x17: 00000000000012c0 x16: 00000000deadbeef
[   12.536159] x15: ffff8000a30033e7 x14: 3020534f20574620
[   12.536186] x13: ffff8000119cb6e8 x12: 000000000000064b
[   12.536214] x11: ffff800009199088 x10: ffff8000119cb6e8
[   12.536240] x9 : 0000000000000001 x8 : 00000000ffffefff
[   12.536268] x7 : ffff800011a236e8 x6 : ffff800011a236e8
[   12.536295] x5 : ffff00007fb67930 x4 : 0000000000000000
[   12.536322] x3 : 0000000000000000 x2 : 000000000000508a
[   12.536350] x1 : 0000000000000001 x0 : ffff8000122b5004
[   12.536377] Call trace:
[   12.536453]  RGXDumpRGXDebugSummary+0x2a8/0xf60 [pvrsrvkm]
[   12.536530]  RGXDebugRequestProcess+0x690/0x124c [pvrsrvkm]
[   12.536600]  RGXDebugRequestNotify+0x38/0x44 [pvrsrvkm]
[   12.536671]  PVRSRVDebugRequest+0x28c/0x620 [pvrsrvkm]
[   12.536743]  DevicesWatchdogThread_ForEachVaCb+0x134/0x150 [pvrsrvkm]
[   12.536816]  List_PVRSRV_DEVICE_NODE_ForEach_va+0x74/0xb0 [pvrsrvkm]
[   12.536888]  DevicesWatchdogThread+0x140/0x21c [pvrsrvkm]
[   12.536960]  OSThreadRun+0x24/0x60 [pvrsrvkm]
[   12.536993]  kthread+0x158/0x160
[   12.537018]  ret_from_fork+0x10/0x30
[   12.537042] Code: 8b1b0ac0 2a1b03e1 f94006a2 52800029 (f9401418)
[   12.537074] ---[ end trace ce321bd72b372c53 ]---

I also ran "cat /proc/interrupts" to check the interrupts received by the pvrsrvkm module. It shows 0.
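
The check above can be scripted. This is a minimal sketch that sums the per-CPU counters of every /proc/interrupts line mentioning a given name. The function name `irq_count` is hypothetical, and the sample text is illustrative only (it is not taken from the reporter's board); the name under which the GSX interrupt is registered may differ on a real system.

```python
def irq_count(proc_interrupts_text: str, name: str) -> int:
    """Sum the per-CPU counters of every line whose text mentions `name`."""
    lines = proc_interrupts_text.splitlines()
    if not lines:
        return 0
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    total = 0
    for line in lines[1:]:
        if name not in line:
            continue
        fields = line.split()
        # fields[0] is the "NN:" label, followed by one counter per CPU.
        counters = fields[1:1 + ncpus]
        total += sum(int(c) for c in counters if c.isdigit())
    return total


# Illustrative sample in /proc/interrupts layout (made-up numbers).
SAMPLE = """\
           CPU0       CPU1
 23:       1204        987     GIC-0  27 Level     arch_timer
 45:          0          0     GIC-0 151 Level     fd000000.gsx
"""

print(irq_count(SAMPLE, "gsx"))         # 0
print(irq_count(SAMPLE, "arch_timer"))  # 2191
```

On a target, the same function can be fed the contents of /proc/interrupts; a count that stays at 0 while a graphics application is running matches the "LISR has not received the last 3 interrupts" error in the dump.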

lorc commented 1 year ago

Well, this is an interesting issue.

Could you please confirm that you are seeing the "Added GSX ..." message during DomD creation?

otyshchenko1 commented 1 year ago

Hello @naveenc-93, in addition to what @lorc has said, could you please clarify whether the graphics DDK (both pvrum and pvrkm) you use is built with the per-OSID IRQ counters feature enabled? The "Failed to inject GSX IRQ" message means that a GSX IRQ was received, but Xen couldn't recognize the origin of that IRQ (that is, the domain into which the IRQ needs to be injected).

naveenc-93 commented 1 year ago

Hello @otyshchenko1, we are using the GSX drivers provided by Renesas. I can see that "IRQ_PER_OS" is enabled in "gaszFeaturesNoValuesNames". I also see the values below:

#define RGX_FEATURE_IRQ_PER_OS_POS (14U)

#define RGX_FEATURE_IRQ_PER_OS_BIT_MASK (IMG_UINT64_C(0x0000000000004000))

Is this info correct?

lorc commented 1 year ago

Just to clarify - you need a special build with the virtualization feature enabled. If you have access to the source code, you can use our recipes to build it. But if Renesas provided you with binaries only, you need to check whether they were built with the correct options.

otyshchenko1 commented 1 year ago

Hello @naveenc-93, both pvrkm and pvrum should be built with the following options:

RGX_FW_IRQ_OS_COUNTERS := 1
RGX_IRQ_HYPERV_HANDLER := 1

Could you please re-check that this is the case?

Could you please clarify whether you see "Added GSX d1 (OSID 0)" message during DomD creation or not? Unfortunately it is not clear from the discussion.

Also could you please post the whole system log from the very beginning (Xen + Dom0, DomD dmesg and journalctl)?

naveenc-93 commented 1 year ago

Hello @otyshchenko1, I can see the logs below from Xen:

(XEN) [ 0.242316] Initialized GSX IRQ
(XEN) [ 11.083917] Added GSX d1 (OSID 0)
(XEN) [ 22.622507] Failed to inject GSX IRQ

Please find the full Xen log attached: xen_hyp.log

I also looked for the two flags above in the GSX sources, but I couldn't find their definitions, only references to those variables. Does that mean they aren't enabled?

otyshchenko1 commented 1 year ago

Hello @naveenc-93, thanks for the log and confirmation.

Both pvrkm and pvrum should be built with the following options:

RGX_FW_IRQ_OS_COUNTERS := 1
RGX_IRQ_HYPERV_HANDLER := 1

These options should be set in the Makefile for your machine (build/linux/r8a7795_linux/Makefile). If you don't set them in your Makefile, then yes, they stay in their default state (disabled). The former saves per-OSID IRQ counters in HW registers, and the latter lets the hypervisor read and clear the IRQ status register.

It is also possible to look at the build options.

From your log:
[ 12.534444] PVR_K: 522: UM info: 1.15 @ 6052913 (release) build options: 0x80000810
[ 12.534481] PVR_K: 522: KM info: 1.15 @ 6052913 (release) build options: 0x00000810

From my log:
[ 138.565566] PVR_K: 565: UM info: 1.15 @ 6052913 (release) build options: 0x80004810
[ 138.565595] PVR_K: 565: KM info: 1.15 @ 6052913 (release) build options: 0x00004810

In my case, bit 14 (0x4000), which indicates RGX_FW_IRQ_OS_COUNTERS usage, is set; in your case it is not. This explains why Xen continuously complains "Failed to inject GSX IRQ".
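
The comparison of the two build-option words can be reproduced with a few lines of Python. This is only a sketch of the bit arithmetic: the helper names are hypothetical, and the meaning of bit 14 is taken from the comment above rather than from the DDK headers.

```python
IRQ_OS_COUNTERS_BIT = 14  # per the comment above; 1 << 14 == 0x4000


def differing_bits(a: int, b: int) -> list[int]:
    """Return the bit positions at which two build-option words differ."""
    diff = a ^ b
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


def has_bit(options: int, bit: int) -> bool:
    """Return True if the given bit is set in the build-option word."""
    return bool((options >> bit) & 1)


reporter_km = 0x00000810  # KM build options from the reporter's log
working_km = 0x00004810   # KM build options from the working log

print(differing_bits(reporter_km, working_km))    # [14]
print(has_bit(reporter_km, IRQ_OS_COUNTERS_BIT))  # False
print(has_bit(working_km, IRQ_OS_COUNTERS_BIT))   # True
```

The same check applies to the UM words (0x80000810 and 0x80004810), which also differ only in bit 14.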

Also, I didn't spot the following string in your log (my log contains it):
[ 138.565895] PVR_K: 565: RGX Virtualisation type: Hypervisor-assisted with dynamic Fw heap allocation

As @lorc has already said, you would need a special build with the virtualization feature enabled.

naveenc-93 commented 1 year ago

Hello @otyshchenko1,

Thank you for the detailed information. If I understood correctly, I need to prepare a new build with the above flags enabled. I also figured out that the PVR variable for the number of supported OSes should be set to 8; in my case it is set to 1. If we change or add these variable values in the Makefile and then prepare a new build, that should suffice, right? I will update with my observations after testing these changes.

Thanks and Regards, Naveen C

lorc commented 1 year ago

Hi @naveenc-93,

@otyshchenko1 may correct me, but looks like you are right.

Also, I want to say that we provide means of automatic building of PVR UM and PVR KM:

https://github.com/xen-troops/meta-xt-rcar/tree/master/meta-xt-rcar-proprietary

which is automatically enabled if you are calling moulin with --PREBUILT_DDK=no option.

But you will need to provide URL for your copy of DDK (see https://github.com/xen-troops/meta-xt-rcar/blob/master/meta-xt-rcar-proprietary/recipes-graphics/gles-module/gles-um-compile.bb#L8C2-L8C2)

naveenc-93 commented 1 year ago

Hello @otyshchenko1,

Renesas has provided us with sources only for PVR KM; the EGL libs came as binaries. Can I still perform a fresh build with what is available?

lorc commented 1 year ago

Hello @naveenc-93,

This is unfortunate. You need a specific PVR UM (EGL libs, firmware, tools, etc.) build. Unfortunately, we can't provide binaries to you because of the NDA. You need to get those binaries from your Renesas contact.

naveenc-93 commented 1 year ago

Hello @lorc

Renesas has shared the PVR UM binaries. In the kernel module, I enabled the flags below in the Makefile and built it:

RGX_FW_IRQ_OS_COUNTERS := 1
RGX_IRQ_HYPERV_HANDLER := 1

Now I face a build options mismatch error.

[ 15.025143] PVR_K: 447: (FAIL) RGXDevInitCompatCheck: Mismatch in Firmware and KM driver build options; extra options present in the KM driver: (0x4000). Please check rgx_options.h
[ 15.025472] PVR_K: 447: Connections: No Devices: No active connections
[ 15.025501] PVR_K: 447: -----[ Driver Info ]-----
[ 15.025525] PVR_K: 447: Comparison of UM/KM components: MATCHING
[ 15.025552] PVR_K: 447: KM Arch: 32 Bit
[ 15.025574] PVR_K: 447: UM info: 0.0 @ 0 (debug) build options: 0x00000000
[ 15.025614] PVR_K: 447: KM info: 0.0 @ 0 (debug) build options: 0x00000000

lorc commented 1 year ago

Hi @naveenc-93,

As you can see, there is a mismatch between KM and UM:

[ 15.025143] PVR_K: 447: (FAIL) RGXDevInitCompatCheck: Mismatch in Firmware and KM driver build options; extra options present in the KM driver: (0x4000). Please check rgx_options.h

Looks like you are using files from different packages.

naveenc-93 commented 1 year ago

Hello Team,

I have received updated gfx binaries for both the user module and the kernel module. I have built the image with the new gfx binaries, with virtualization support and the above flags enabled.

After flashing the image, I am unable to detect the DRM device. The weston log always shows:

[17:42:45.471] Output repaint window is 7 ms maximum.
[17:42:45.472] Loading module '/usr/lib/libweston-10/drm-backend.so'
[17:42:45.479] initializing drm backend
[17:42:45.479] Trying logind launcher...
[17:42:45.499] logind: session control granted
[17:42:45.503] no drm device found

I have also attached the full log with the Xen startup and domd logs: full_log.txt

Please help me resolve the issue.

Thanks and Regards, Naveen C

naveenc-93 commented 1 year ago

Hello Team,

Could you please help me in resolving this issue?

I get the two log lines below from two different sets of GSX drivers.

pvr driver codeset 1:
[ 5.064492] [drm] Initialized pvr 1.15.6052913 20170530 for fd000000.gsx on minor 1

pvr driver codeset 2:
[ 4.121890] [drm] Initialized pvr 1.15.6052913 20170530 for fd000000.gsx on minor 0

Codeset 1 works fine and outputs the HMI on the HDMI display, but codeset 2 does not. The configuration looks the same; the only difference observed is the "minor" value. What could be the problem? Does the minor value have an impact?

Thanks and Regards, Naveen C

otyshchenko1 commented 1 year ago

Hello, it is not entirely clear what the difference is between the two sets of gfx drivers (codeset 1 and codeset 2). Could you please clarify? And please provide full system logs for both cases.

The "minor" value shows the DRM minor number for the device nodes in /dev. The pvr driver also acts as a DRM device, and it additionally adds a render node.

So there are two DRM devices in the system: the first is rcar-du for feb00000.display, the second is pvr for fd000000.gsx.

For example, logs from a working environment:
[ 1.211252] [drm] Initialized rcar-du 1.0.0 20130110 for feb00000.display on minor 0
[ 1.211286] [drm] Device feb00000.display probed
[ 1.272942] rcar-du feb00000.display: [drm] fb0: rcar-dudrmfb frame buffer device

...

[ 8.100033] [drm] Initialized pvr 1.15.6052913 20170530 for fd000000.gsx on minor 1

And the corresponding nodes in /dev:

root@salvator-x-domd:~# ls -l /dev/dri/
total 0
drwxr-xr-x 2 root root 100 Nov 28 20:39 by-path
crw-rw---- 1 root video 226, 0 Nov 28 20:39 card0
crw-rw---- 1 root video 226, 1 Nov 28 20:39 card1
crw-rw-rw- 1 root render 226, 128 Nov 28 20:39 renderD128

root@salvator-x-domd:~# ls -l /dev/dri/by-path/platform-feb00000.display-card
lrwxrwxrwx 1 root root 8 Nov 28 20:39 /dev/dri/by-path/platform-feb00000.display-card -> ../card0

root@salvator-x-domd:~# ls -l /dev/dri/by-path/platform-fd000000.gsx-card
lrwxrwxrwx 1 root root 8 Nov 28 20:39 /dev/dri/by-path/platform-fd000000.gsx-card -> ../card1

root@salvator-x-domd:~# ls -l /dev/dri/by-path/platform-fd000000.gsx-render
lrwxrwxrwx 1 root root 13 Nov 28 20:39 /dev/dri/by-path/platform-fd000000.gsx-render -> ../renderD128

The log from codeset 1 (... on minor 1) is correct, while the log from codeset 2 (... on minor 0) suggests that something went wrong and rcar-du (the main DRM device) was not initialized (yet?).
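
The by-path listing above can be collected programmatically to compare the two codesets. This is a minimal sketch: the function name `drm_nodes_by_device` is hypothetical, and it only assumes the /dev/dri/by-path symlink layout shown in the listing.

```python
import os
from pathlib import Path


def drm_nodes_by_device(by_path_dir: str = "/dev/dri/by-path") -> dict[str, str]:
    """Map each by-path entry name to the device node it points at."""
    result: dict[str, str] = {}
    base = Path(by_path_dir)
    if not base.is_dir():
        return result  # no DRM devices registered, or not a Linux target
    for link in sorted(base.iterdir()):
        if link.is_symlink():
            # e.g. "platform-feb00000.display-card" -> "card0"
            result[link.name] = os.path.basename(os.readlink(link))
    return result


if __name__ == "__main__":
    for name, node in drm_nodes_by_device().items():
        print(f"{name} -> {node}")
```

In the working environment shown above, platform-feb00000.display-card maps to card0 and platform-fd000000.gsx-card maps to card1. If the gsx entry maps to card0, or the display entry is missing, rcar-du did not register before pvr.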

naveenc-93 commented 1 year ago

Hello Oleksandr,

Thank you for your reply. Both codesets are built on the Renesas Yocto v5.9.0 BSP.

Codeset 1: the gfx KM and UM binaries built without these two flags enabled:
RGX_FW_IRQ_OS_COUNTERS := 1
RGX_IRQ_HYPERV_HANDLER := 1

Codeset 2: the gfx KM and UM binaries built with the above flags enabled.

Please find the system logs of both scenarios attached: log_codeset1.log, log_codeset2.log

Could you please help us identify why the codeset 2 binaries initialize on minor 0 and not on minor 1? I am using the same Xen Troops recipes for both scenarios.

otyshchenko1 commented 1 year ago

Hello, could you please send the full system logs for both codesets? By "full" I mean the log starting from Xen and including the dom0 and domd logs (plus the journal output from domd). It looks like the rcar-du DRM driver is somehow not initialized in the codeset 2 scenario.

naveenc-93 commented 1 year ago

Hello Oleksandr,

Unfortunately, I couldn't get the full system log for the codeset 1 (working) case, but I have attached the full system log for the codeset 2 (non-working) case. Please check it and help us understand why the DRM driver is failing: log_codeset2_nw_dec1_23.log

naveenc-93 commented 11 months ago

Hello Oleksandr, did you find any issue in the above log? Could you please help?