intel / KVMGT-kernel


Question about gpu context switching error #16

Open terry84 opened 9 years ago

terry84 commented 9 years ago

Hello.

I have a question. Could you tell me what the error message below means? It seems to occur during GPU context switching between virtual machines with the MI_SET_CONTEXT command.

[ 109.802566] [kvmgt] kvmgt_read_hva-33: copy_from_user failed: rc == 4, len == 4
[ 110.353158] vGT error:(ring_wait_for_completion:137) Timeout 500 ms for CMD comletion on ring 0
[ 110.362514] vGT error:(ring_wait_for_completion:138) expected(611), actual(610)
[ 110.370384] vGT error:(vgt_restore_hw_context:1551) change to VM context switch commands unfinished
[ 110.380117] vGT error:(vgt_do_render_context_switch:1714) Fail to restore context
[ 110.388163] vGT info:(dump_regs_on_err:1598) reg=0x2054, val=0xa
[ 110.394650] vGT info:(dump_regs_on_err:1598) reg=0x12054, val=0xa
[ 110.401212] vGT info:(dump_regs_on_err:1598) reg=0x22054, val=0xa
[ 110.407773] vGT info:(dump_regs_on_err:1598) reg=0x1a054, val=0xa
[ 110.414331] vGT info:(dump_regs_on_err:1598) reg=0xa098, val=0x3e80000
[ 110.421343] vGT info:(dump_regs_on_err:1598) reg=0xa09c, val=0x28001e
[ 110.428277] vGT info:(dump_regs_on_err:1598) reg=0xa0a8, val=0x1e848
[ 110.435140] vGT info:(dump_regs_on_err:1598) reg=0xa0ac, val=0x19
[ 110.441690] vGT info:(dump_regs_on_err:1598) reg=0xa0b4, val=0x3e8
[ 110.448333] vGT info:(dump_regs_on_err:1598) reg=0xa0b8, val=0xc350
[ 110.455073] vGT info:(dump_regs_on_err:1598) reg=0xa090, val=0x88040000
[ 110.462209] vGT info:(dump_regs_on_err:1598) reg=0xa094, val=0x0
[ 110.468693] vGT error:(vgt_do_render_context_switch:1779) Ring-0: (3359th checks 204th switch<1->0>)
[ 110.478533] vGT error:(vgt_do_render_context_switch:1780) FAIL on ring-0
[ 110.485755] vGT error:(vgt_do_render_context_switch:1785) cur(1): head(1c2d8), tail(1c2d8), start(7801000)
[ 110.496216] vGT error:(vgt_do_render_contextswitch:1790) dom0(0): head(416660), tail(168f8), start(14b000)
[ 110.506668] VM0 :head(416660), tail(168f8), start(14b000), ctl(1f001), uhptr(0)
[ 110.514752] VM1():head(1c2d8), tail(1c2d8), start(7801000), ctl(1f001), uhptr(0)
[ 110.522907] debug registers,reg maked with <_> may not apply to every ring):
[ 110.530491] ....RING_EIR: 00000000
[ 110.534168] ....RING_EMR: ffffffff
[ 110.537826] ....RING_ESR: 00000000
[ 110.541500] ....00002068: 780c0000
[ 110.545167] ....INSTPS* (parser state): 00000500 :
[ 110.550315] ....ACTHD(active header): 000000b0
[00000090]: 00000262 00000000 00000000 00000000 00000000 04000000 0c000000 0008c10e
[000000b0]: 00000000(*) 04000001 6d800005 00000004 00000000 00000000 00000000 00000000
[000000d0]: 00000000 00000000 10400002 00000000 0f800000 00000263 00000000 00000000
[000000f0]: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[00000110]: 78100004 00000000 80010000 00000000 00000000 00000000 781b0005 00010023
[ 110.668942] vGT error:(vgt_thread:309) Hang in context switch, try to reset device.
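The VM0/VM1 lines print each ring's HEAD, TAIL, START and CTL values. A minimal sketch for decoding them follows, assuming vGT prints the raw register values and that they use the bit layout from the upstream i915 driver headers (the mask names `RING_NR_PAGES`, `RING_VALID`, `HEAD_WRAP_COUNT`, `HEAD_ADDR`, `TAIL_ADDR` come from there, not from this log; `decode_ring` is a hypothetical helper).

```python
import re

# Assumption: bit layouts follow the upstream i915 driver headers.
RING_NR_PAGES = 0x001FF000    # CTL: ring length minus one page
RING_VALID = 0x00000001       # CTL: ring enabled
HEAD_WRAP_COUNT = 0xFFE00000  # HEAD: number of times HEAD wrapped
HEAD_ADDR = 0x001FFFFC        # HEAD: byte offset inside the ring
TAIL_ADDR = 0x001FFFF8        # TAIL: byte offset inside the ring
PAGE_SIZE = 4096

RING_RE = re.compile(
    r"head\(([0-9a-f]+)\), tail\(([0-9a-f]+)\), "
    r"start\(([0-9a-f]+)\), ctl\(([0-9a-f]+)\)"
)


def decode_ring(line):
    """Hypothetical helper: decode one 'VMn: head(..), tail(..), ...' line."""
    m = RING_RE.search(line)
    if m is None:
        raise ValueError("no ring state found in line")
    head, tail, start, ctl = (int(g, 16) for g in m.groups())
    size = (ctl & RING_NR_PAGES) + PAGE_SIZE  # ring length in bytes
    head_off = head & HEAD_ADDR
    tail_off = tail & TAIL_ADDR
    return {
        "start": start,
        "size": size,
        "valid": bool(ctl & RING_VALID),
        "wraps": (head & HEAD_WRAP_COUNT) >> 21,
        "head": head_off,
        "tail": tail_off,
        # Bytes between HEAD and TAIL, i.e. commands not yet fetched.
        "pending": (tail_off - head_off) % size,
    }


if __name__ == "__main__":
    for line in (
        "VM0 :head(416660), tail(168f8), start(14b000), ctl(1f001), uhptr(0)",
        "VM1():head(1c2d8), tail(1c2d8), start(7801000), ctl(1f001), uhptr(0)",
    ):
        r = decode_ring(line)
        print({k: hex(v) if isinstance(v, int) and not isinstance(v, bool) else v
               for k, v in r.items()})
```

Under these assumptions, VM1's ring is drained (HEAD equals TAIL), while VM0 has a 128 KiB ring whose HEAD offset (0x16660, after two wraps) sits 0x298 bytes behind TAIL. Whether that gap matters here depends on vGT internals the log does not show.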

In particular, I want to know what the register values below mean. I can't find any reference for these registers.

[ 110.388163] vGT info:(dump_regs_on_err:1598) reg=0x2054, val=0xa
[ 110.394650] vGT info:(dump_regs_on_err:1598) reg=0x12054, val=0xa
[ 110.401212] vGT info:(dump_regs_on_err:1598) reg=0x22054, val=0xa
[ 110.407773] vGT info:(dump_regs_on_err:1598) reg=0x1a054, val=0xa
[ 110.414331] vGT info:(dump_regs_on_err:1598) reg=0xa098, val=0x3e80000
[ 110.421343] vGT info:(dump_regs_on_err:1598) reg=0xa09c, val=0x28001e
[ 110.428277] vGT info:(dump_regs_on_err:1598) reg=0xa0a8, val=0x1e848
[ 110.435140] vGT info:(dump_regs_on_err:1598) reg=0xa0ac, val=0x19
[ 110.441690] vGT info:(dump_regs_on_err:1598) reg=0xa0b4, val=0x3e8
[ 110.448333] vGT info:(dump_regs_on_err:1598) reg=0xa0b8, val=0xc350
[ 110.455073] vGT info:(dump_regs_on_err:1598) reg=0xa090, val=0x88040000
[ 110.462209] vGT info:(dump_regs_on_err:1598) reg=0xa094, val=0x0
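One way to read this dump is to map the offsets to register names. The sketch below assumes the names and offsets used by the upstream i915 driver's `i915_reg.h` for the Gen6+ RC6 power-management block and the per-ring `RING_MAX_IDLE` register (ring base + 0x54); vGT itself does not print these names, so treat the mapping as an assumption to check against the PRM for your GPU generation. `reg_name`, `rc_control_flags` and `decode_dump` are hypothetical helpers.

```python
import re

# Assumption: offsets/names taken from the upstream i915 driver headers.
RING_BASES = {
    0x02000: "RCS (render)",
    0x12000: "VCS (video)",
    0x1A000: "VECS (video enhancement)",
    0x22000: "BCS (blitter)",
}
RING_MAX_IDLE = 0x54  # per-ring offset of RING_MAX_IDLE

GT_REGS = {
    0xA090: "GEN6_RC_CONTROL",
    0xA094: "GEN6_RC_STATE",
    0xA098: "GEN6_RC1_WAKE_RATE_LIMIT",
    0xA09C: "GEN6_RC6_WAKE_RATE_LIMIT",
    0xA0A8: "GEN6_RC_EVALUATION_INTERVAL",
    0xA0AC: "GEN6_RC_IDLE_HYSTERSIS",  # spelling as in i915_reg.h
    0xA0B4: "GEN6_RC1e_THRESHOLD",
    0xA0B8: "GEN6_RC6_THRESHOLD",
}

# Selected GEN6_RC_CONTROL bits (bit number -> name), same assumption.
RC_CONTROL_BITS = {
    16: "RC6pp_ENABLE",
    17: "RC6p_ENABLE",
    18: "RC6_ENABLE",
    20: "RC1e_ENABLE",
    27: "EI_MODE(1)",
    31: "HW_ENABLE",
}

DUMP_RE = re.compile(r"reg=0x([0-9a-f]+), val=0x([0-9a-f]+)")


def reg_name(reg):
    """Return a symbolic name for an MMIO offset, or 'unknown'."""
    if reg in GT_REGS:
        return GT_REGS[reg]
    base = reg - RING_MAX_IDLE
    if base in RING_BASES:
        return f"RING_MAX_IDLE ({RING_BASES[base]})"
    return f"unknown (0x{reg:x})"


def rc_control_flags(val):
    """List the known GEN6_RC_CONTROL bits set in val, lowest bit first."""
    return [name for bit, name in sorted(RC_CONTROL_BITS.items())
            if val & (1 << bit)]


def decode_dump(text):
    """Turn 'reg=0x..., val=0x...' log entries into (name, value) pairs."""
    return [(reg_name(int(r, 16)), int(v, 16)) for r, v in DUMP_RE.findall(text)]


if __name__ == "__main__":
    log = "reg=0x2054, val=0xa reg=0xa0a8, val=0x1e848 reg=0xa090, val=0x88040000"
    for name, val in decode_dump(log):
        print(f"{name:32} {val:#x} ({val})")
    print(rc_control_flags(0x88040000))
```

If this mapping holds, the dump describes the RC6 power-saving configuration rather than error codes; for example 0x1e848 (125000) and 0xc350 (50000) appear to match the values i915 programs into GEN6_RC_EVALUATION_INTERVAL and GEN6_RC6_THRESHOLD when it enables RC6.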

Thank you in advance :)

l1viathan commented 9 years ago

Did you hit this every time, or just once?

To me, it looks suspicious that the user address access failed. This was possibly caused by swapping; would you please add "-realtime mlock=on" to the QEMU command line and try again?

I can't find any references about these registers

Are you referring to the open-source PRM (Programmer's Reference Manual) for GEN graphics?