intel / KVMGT-kernel

Other
41 stars 20 forks source link

Possible bug (master interrupt disable) #17

Open JaeyongYoo opened 9 years ago

JaeyongYoo commented 9 years ago

I've sometimes got into GPU master interrupt disable (screen freezes while shell is still alive).

While some investigation, I think there is a race condition between vgt_interrupt and guest interrupt handlers. Here is the following:

l1viathan commented 9 years ago

Hi Jaeyong, do you have commit c17829b9e ("vgt host mediation: force a serialized execution of host ISR") applied?

JaeyongYoo commented 9 years ago

Yes it is applied. The race is not between host ISR called from multiple cores. The race is between host ISR and vgt_interrupt.

l1viathan commented 9 years ago

Thanks for the information.

In case of the host ISR, it actually only affects the vreg - and in the emulation path, we have the pdev locked( vgt_lock_dev_flags called by vgt_emulate_write/read), that should ensure the protection between host ISR and vgt ISR, at least in theory :)

Would you tell us that workoad you are running, how many VMs, how many CPUs for those VMs, and I'll try to reproduce the BUG locally.

JaeyongYoo commented 9 years ago

In case of the host ISR, it actually only affects the vreg - and in the emulation path, we have the pdev locked( vgt_lock_dev_flags called by vgt_emulate_write/read), that should ensure the protection between host ISR and vgt ISR, at least in theory :)

In case of DE_IER access, it writes to physical DE_IER register in emulation path. And, even though emulation path is protected by pdev lock, it is only protecting a read/write of a single register. In case of interrupt handler, three accesses (read/write) should be protected as a atomic operation. Here are the detail with some pseudo code.

In vgt-interrupt handler

(1) deier_tmp = READ(DEIER); (2) WRITE(DEIER, deier_tmp & master_interrupt_disable) ... do some work (3) WRITE(DEIER, deier_tmp) <-- here, it expects master interrupt is enabled here.

In host ISR, similar thing:

(4) deier_tmp = READ(DEIER); // even though it is accessing vreg, it eventually accesses preg in emulation path (5) WRITE(DEIER, deier_tmp & master_interrupt_disable) ... do some work (6) WRITE(DEIER, deier_tmp) <-- here, it expects master interrupt is enabled here.

Since host ISR and vgt interrupt is not synchronized, it can be executed in parallel. The problem happens if it got executed in this sequence.

(4) -> (5) -> (1) -> (2) -> (6) -> (3),

With the above sequence, (1, 2) saves the disabled DEIER into vgt's tmp_deier, and restores this disabled tmp_deier at (3). Thus, after (3), the master interrupt becomes disabled and never bring back enabled again.

Would you tell us that workoad you are running, how many VMs, how many CPUs for those VMs, and I'll try to reproduce the BUG locally.

I'm working on my graphics work and not possible to share the workload for now. I'm sorry about this...

l1viathan commented 9 years ago

@JaeyongYoo : Thanks for the analysis.

So it seems there was a race condition, do you see this issue each time or randomly? How about simply enabling the master interrupt before returning from vgt_interrupt?