shinpei0208 / gdev

First-Class GPU Resource Management: Device Drivers, Runtimes, and CUDA Compilers for Nouveau.
http://www.pdsl.jp/
MIT License
344 stars, 68 forks

How to run multiple benchmarks simultaneously using gdev? #28

Open: ilios86 opened this issue 10 years ago

ilios86 commented 10 years ago

First of all, I'm sorry if this is not the appropriate place for my question. (Please let me know if there is a better way to ask related questions.)

I have installed Linux v3.3 on Ubuntu 12.04 LTS, along with nouveau-3.3.0 (bundled in the Gdev source), the Gdev kernel module, and the Gdev CUDA driver API, and I checked out gdev-bench. I read the ATC '12 paper on Gdev and became interested in how Gdev schedules multiple applications on a single GPU device (and how it isolates virtual GPUs from each other).

So I launched two different benchmarks (heartwall and lud) one right after the other, to see what happens when multiple benchmarks run simultaneously on a single GPU device. However, neither benchmark terminated, and dmesg showed the following log:


Jul 30 23:34:04 iliosserv2 kernel: [ 485.132334] [gdev] Opened gdev0
Jul 30 23:34:06 iliosserv2 kernel: [ 487.460725] sched: RT throttling activated
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348046] INFO: task gschedm0:2744 blocked for more than 120 seconds.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348051] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348054] gschedm0 D 0000000000000000 0 2744 2 0x00000000
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348060] ffff880067051ea0 0000000000000046 ffff88006adb4410 ffffffffa032a930
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348067] ffff88006adb4410 ffff880067051fd8 ffff880067051fd8 ffff880067051fd8
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348072] ffff88007b0d2d60 ffff88006adb4410 ffff880067051ec0 ffff8800705a2000
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348078] Call Trace:
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348090] [] ? gdev_sched_create_scheduler+0x1a0/0x1a0 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348097] [] ? gdev_sched_create_scheduler+0x1a0/0x1a0 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348105] [] schedule+0x3f/0x60
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348111] [] gdev_sched_mem_thread+0x5e/0x90 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348117] [] kthread+0x93/0xa0
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348122] [] kernel_thread_helper+0x4/0x10
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348127] [] ? kthread_freezable_should_stop+0x70/0x70
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348132] [] ? gs_change+0xb/0xb
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348135] INFO: task gschedc1:2747 blocked for more than 120 seconds.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348138] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348141] gschedc1 D 0000000000000000 0 2747 2 0x00000000
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348146] ffff88006851dea0 0000000000000046 ffff88007bfe0000 ffffffffa032a9c0
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348151] ffff88007bfe0000 ffff88006851dfd8 ffff88006851dfd8 ffff88006851dfd8
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348156] ffff88007b0d2d60 ffff88007bfe0000 ffff88006851dec0 ffff8800705a2198
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348162] Call Trace:
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348168] [] ? gdev_sched_mem_thread+0x90/0x90 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348174] [] ? gdev_sched_mem_thread+0x90/0x90 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348178] [] schedule+0x3f/0x60
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348184] [] gdev_sched_com_thread+0x5e/0x90 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348188] [] kthread+0x93/0xa0
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348192] [] kernel_thread_helper+0x4/0x10
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348197] [] ? kthread_freezable_should_stop+0x70/0x70
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348201] [] ? gs_change+0xb/0xb
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348204] INFO: task gschedm1:2748 blocked for more than 120 seconds.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348207] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348210] gschedm1 D 0000000000000000 0 2748 2 0x00000000
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348214] ffff880067e13ea0 0000000000000046 ffff880071362d60 ffffffffa032a930
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348220] ffff880071362d60 ffff880067e13fd8 ffff880067e13fd8 ffff880067e13fd8
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348225] ffff88007b0d2d60 ffff880071362d60 ffff880067e13ec0 ffff8800705a2198
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348231] Call Trace:
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348236] [] ? gdev_sched_create_scheduler+0x1a0/0x1a0 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348243] [] ? gdev_sched_create_scheduler+0x1a0/0x1a0 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348247] [] schedule+0x3f/0x60
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348253] [] gdev_sched_mem_thread+0x5e/0x90 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348257] [] kthread+0x93/0xa0
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348261] [] kernel_thread_helper+0x4/0x10
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348266] [] ? kthread_freezable_should_stop+0x70/0x70
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348270] [] ? gs_change+0xb/0xb
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348273] INFO: task gschedc2:2751 blocked for more than 120 seconds.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348276] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348279] gschedc2 D 0000000000000000 0 2751 2 0x00000000
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348283] ffff880068413ea0 0000000000000046 ffff88007aecc410 ffffffffa032a9c0
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348289] ffff88007aecc410 ffff880068413fd8 ffff880068413fd8 ffff880068413fd8
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348294] ffff88007b0d2d60 ffff88007aecc410 ffff880068413ec0 ffff8800705a2330
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348299] Call Trace:
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348305] [] ? __gdev_sched_mem_thread+0x90/0x90 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348311] [] ? gdev_sched_mem_thread+0x90/0x90 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348315] [] schedule+0x3f/0x60
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348321] [] gdev_sched_com_thread+0x5e/0x90 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348325] [] kthread+0x93/0xa0
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348329] [] kernel_thread_helper+0x4/0x10
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348334] [] ? kthread_freezable_should_stop+0x70/0x70
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348338] [] ? gs_change+0xb/0xb
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348341] INFO: task gschedm2:2752 blocked for more than 120 seconds.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348344] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348346] gschedm2 D 0000000000000000 0 2752 2 0x00000000
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348351] ffff880067013ea0 0000000000000046 ffff88007d3d4410 ffffffffa032a930
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348357] ffff88007d3d4410 ffff880067013fd8 ffff880067013fd8 ffff880067013fd8
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348362] ffff88007b0d2d60 ffff88007d3d4410 ffff880067013ec0 ffff8800705a2330
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348367] Call Trace:
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348373] [] ? gdev_sched_create_scheduler+0x1a0/0x1a0 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348379] [] ? gdev_sched_create_scheduler+0x1a0/0x1a0 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348384] [] schedule+0x3f/0x60
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348390] [] gdev_sched_mem_thread+0x5e/0x90 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348394] [] kthread+0x93/0xa0
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348398] [] kernel_thread_helper+0x4/0x10
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348402] [] ? kthread_freezable_should_stop+0x70/0x70
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348407] [] ? gs_change+0xb/0xb
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348410] INFO: task gschedc3:2755 blocked for more than 120 seconds.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348412] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348415] gschedc3 D 0000000000000000 0 2755 2 0x00000000
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348420] ffff8800684e3ea0 0000000000000046 ffff88006ad0c410 ffffffffa032a9c0
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348425] ffff88006ad0c410 ffff8800684e3fd8 ffff8800684e3fd8 ffff8800684e3fd8
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348430] ffff88007b0d2d60 ffff88006ad0c410 ffff8800684e3ec0 ffff8800705a24c8
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348436] Call Trace:
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348441] [] ? gdev_sched_mem_thread+0x90/0x90 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348447] [] ? __gdev_sched_mem_thread+0x90/0x90 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348452] [] schedule+0x3f/0x60
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348457] [] gdev_sched_com_thread+0x5e/0x90 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348461] [] kthread+0x93/0xa0
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348466] [] kernel_thread_helper+0x4/0x10
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348470] [] ? kthread_freezable_should_stop+0x70/0x70
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348474] [] ? gs_change+0xb/0xb
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348477] INFO: task gschedm3:2756 blocked for more than 120 seconds.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348480] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348482] gschedm3 D 0000000000000000 0 2756 2 0x00000000
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348487] ffff88006700bea0 0000000000000046 ffff88006ad0dac0 ffffffffa032a930
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348493] ffff88006ad0dac0 ffff88006700bfd8 ffff88006700bfd8 ffff88006700bfd8
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348498] ffff88007b0d2d60 ffff88006ad0dac0 ffff88006700bec0 ffff8800705a24c8
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348504] Call Trace:
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348509] [] ? gdev_sched_create_scheduler+0x1a0/0x1a0 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348516] [] ? gdev_sched_create_scheduler+0x1a0/0x1a0 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348520] [] schedule+0x3f/0x60
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348526] [] gdev_sched_mem_thread+0x5e/0x90 [gdev]
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348530] [] kthread+0x93/0xa0
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348534] [] kernel_thread_helper+0x4/0x10
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348539] [] ? kthread_freezable_should_stop+0x70/0x70
Jul 30 23:35:59 iliosserv2 kernel: [ 600.348543] [] ? gs_change+0xb/0xb
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348046] INFO: task gschedc0:2743 blocked for more than 120 seconds.
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348051] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348055] gschedc0 D 0000000000000000 0 2743 2 0x00000000
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348061] ffff880067005ea0 0000000000000046 ffff880067005eb0 ffffffffa0331258
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348067] ffff88006adb16b0 ffff880067005fd8 ffff880067005fd8 ffff880067005fd8
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348073] ffff88007c848000 ffff88006adb16b0 ffffc90000000000 ffff8800705a2000
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348078] Call Trace:
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348092] [] ? gdev_select_next_compute+0x198/0x4a0 [gdev]
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348099] [] ? __gdev_sched_mem_thread+0x90/0x90 [gdev]
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348106] [] schedule+0x3f/0x60
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348113] [] gdev_sched_com_thread+0x5e/0x90 [gdev]
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348118] [] kthread+0x93/0xa0
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348124] [] kernel_thread_helper+0x4/0x10
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348128] [] ? kthread_freezable_should_stop+0x70/0x70
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348133] [] ? gs_change+0xb/0xb
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348136] INFO: task gschedm0:2744 blocked for more than 120 seconds.
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348139] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348142] gschedm0 D 0000000000000000 0 2744 2 0x00000000
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348147] ffff880067051ea0 0000000000000046 ffff88006adb4410 ffffffffa032a930
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348152] ffff88006adb4410 ffff880067051fd8 ffff880067051fd8 ffff880067051fd8
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348158] ffff88007b0d2d60 ffff88006adb4410 ffff880067051ec0 ffff8800705a2000
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348163] Call Trace:
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348169] [] ? gdev_sched_create_scheduler+0x1a0/0x1a0 [gdev]
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348175] [] ? gdev_sched_create_scheduler+0x1a0/0x1a0 [gdev]
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348180] [] schedule+0x3f/0x60
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348185] [] gdev_sched_mem_thread+0x5e/0x90 [gdev]
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348189] [] kthread+0x93/0xa0
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348194] [] kernel_thread_helper+0x4/0x10
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348198] [] ? kthread_freezable_should_stop+0x70/0x70
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348203] [] ? gs_change+0xb/0xb
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348206] INFO: task gschedc1:2747 blocked for more than 120 seconds.
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348211] gschedc1 D 0000000000000000 0 2747 2 0x00000000
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348216] ffff88006851dea0 0000000000000046 ffff88007bfe0000 ffffffffa032a9c0
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348222] ffff88007bfe0000 ffff88006851dfd8 ffff88006851dfd8 ffff88006851dfd8
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348227] ffff88007b0d2d60 ffff88007bfe0000 ffff88006851dec0 ffff8800705a2198
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348232] Call Trace:
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348238] [] ? gdev_sched_mem_thread+0x90/0x90 [gdev]
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348244] [] ? __gdev_sched_mem_thread+0x90/0x90 [gdev]
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348248] [] schedule+0x3f/0x60
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348254] [] __gdev_sched_com_thread+0x5e/0x90 [gdev]
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348258] [] kthread+0x93/0xa0
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348262] [] kernel_thread_helper+0x4/0x10
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348267] [] ? kthread_freezable_should_stop+0x70/0x70
Jul 30 23:37:59 iliosserv2 kernel: [ 720.348271] [] ? gs_change+0xb/0xb
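A log like this is easier to read when reduced to the list of tasks the kernel's hung-task detector reported. Every report here names a Gdev scheduler kthread in state D: the compute-side threads (gschedc0 to gschedc3, whose traces end in gdev_sched_com_thread) and the memory-side threads (gschedm0 to gschedm3, in gdev_sched_mem_thread). The helper below is a minimal sketch for extracting that list from a saved dmesg dump; the function names are hypothetical, and the regex assumes the `INFO: task NAME:PID blocked for more than N seconds` wording shown above.

```python
import re
from collections import Counter

# Matches hung-task detector reports such as
# "INFO: task gschedm0:2744 blocked for more than 120 seconds."
HUNG_TASK = re.compile(r"INFO: task ([^\s:]+):(\d+) blocked for more than (\d+) seconds")


def blocked_tasks(log_text):
    """Return unique (task_name, pid) pairs in order of first report."""
    seen = []
    for name, pid, _secs in HUNG_TASK.findall(log_text):
        entry = (name, int(pid))
        if entry not in seen:
            seen.append(entry)
    return seen


def count_reports(log_text):
    """Count how many times each task name was reported as blocked."""
    return Counter(name for name, _pid, _secs in HUNG_TASK.findall(log_text))
```

Feeding it the output of `dmesg` (saved to a file) would show whether only the scheduler threads are stuck, or whether the benchmark processes themselves also appear.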


What should I do to launch multiple benchmarks at the same time with Gdev, and to understand how Gdev schedules and isolates multiple contexts? Or did I misunderstand the Gdev paper?

My GPU is a GeForce GTX 480.

Any help would be greatly appreciated. Thank you!
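To make the experiment above repeatable, both benchmarks can be started at the same instant (rather than typed one after the other) with a deadline, so that a hang is detected instead of waited on forever. This is a minimal sketch; `run_concurrently` is a hypothetical helper, and the benchmark command lines are placeholders to fill in from gdev-bench. Note that a process blocked in uninterruptible sleep (state D, as the scheduler threads in the log are) may not react to SIGKILL until it wakes.

```python
import subprocess
import time


def run_concurrently(commands, timeout_s):
    """Start every command at once and wait up to timeout_s seconds in total.

    Returns (return_codes, elapsed_seconds). A return code of None means the
    process was still running at the deadline and a kill was attempted.
    """
    start = time.monotonic()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    codes = []
    for p in procs:
        remaining = max(0.0, timeout_s - (time.monotonic() - start))
        try:
            codes.append(p.wait(timeout=remaining))
        except subprocess.TimeoutExpired:
            p.kill()  # SIGKILL; may not take effect while the task is in state D
            try:
                p.wait(timeout=1)  # reap the process if the kill went through
            except subprocess.TimeoutExpired:
                pass  # still stuck in the kernel; leave it for inspection
            codes.append(None)
    return codes, time.monotonic() - start
```

For example, `run_concurrently([["./heartwall"], ["./lud"]], timeout_s=600)`, where the paths and any arguments are placeholders for the actual gdev-bench invocations.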

ilios86 commented 10 years ago

When I disabled the scheduler by passing the -DGDEV_SCHED_DISABLED flag while compiling the Gdev kernel module, the errors above disappeared, and the two simultaneously launched benchmarks both terminated normally at the same time.

However, in that case, as far as I can tell, there is no scheduler controlling the virtual GPUs. How, then, do the two benchmarks run simultaneously? Do they just execute sequentially internally?
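One rough, external way to probe the "sequential internally?" question is to compare wall-clock times: run each benchmark alone (times t_a and t_b), then both together (t_both). If execution were fully serialized, t_both would approach t_a + t_b; with perfect overlap it would approach max(t_a, t_b). The function below is a hypothetical heuristic, not part of Gdev, and it cannot by itself distinguish GPU-level serialization from overlap of the benchmarks' host-side work (data setup, transfers), so an intermediate value is ambiguous.

```python
def overlap_ratio(t_a, t_b, t_both):
    """Estimate how much two runs overlapped, from wall-clock times.

    t_a, t_b: runtimes of each benchmark run alone.
    t_both:   wall-clock time to finish both when launched together.
    Returns 0.0 for no overlap (t_both == t_a + t_b, effectively sequential)
    up to 1.0 for full overlap (t_both == max(t_a, t_b)), clamped to [0, 1].
    """
    serial = t_a + t_b
    ideal = max(t_a, t_b)
    if serial == ideal:  # one run took no time, so overlap is meaningless
        return 1.0
    ratio = (serial - t_both) / (serial - ideal)
    return min(1.0, max(0.0, ratio))
```

Timings can come from `time` in the shell or any wall-clock timer; averaging several runs reduces noise.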