kvark / blade

Sharp and simple graphics library
MIT License

Don't block use of Intel on Nvidia hybrid systems. #93

Closed flukejones closed 8 months ago

flukejones commented 8 months ago

The recent commit https://github.com/kvark/blade/commit/30c4fa41f1476e932b9c1f104665b0a2904f94c4 makes things like Zed run on the dGPU full time. This is not an acceptable solution for https://github.com/kvark/blade/issues/88, as it causes excessive battery drain, heat, etc.

In that issue I reference https://github.com/gfx-rs/wgpu/pull/4110 because it appears to be a very similar use case.

My own system is currently:

Operating System: Fedora Linux 40
Kernel Version: 6.8.0-rc7+ (64-bit)
Graphics Platform: Wayland
Processors: 32 × Intel® Core™ i9-14900HX
Memory: 62.4 GiB of RAM
Graphics Processor: Mesa Intel® Graphics
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: ROG Strix SCAR 16 G634JYR_G634JYR_000045397
System Version: 1.0

Installed mesa version is: 24.0.0

Installed nvidia version: 550.54.14

I also have older laptops I can test on. I have also tested on Fedora 39, which used 6.6.x and 6.7.x kernels, and it worked fine. The desktop is irrelevant here: I've tested on COSMIC, GNOME, and KDE. What is of note, however, is that I do not use Xorg sessions and haven't for years.

The proper and expected solution is to find the exact cause of https://github.com/kvark/blade/issues/88 and either fix that, or work around that one specific case. Blanket blocking of all Intel/Nvidia hybrid systems handicaps everyone, affected or not.

Logs

Output from example using Intel:

Setting VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/intel_icd.x86_64.json wasn't actually required here.

❯ VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/intel_icd.x86_64.json RUST_LOG=debug cargo run --example particle
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/examples/particle`
[2024-03-11T04:35:49Z DEBUG egui_winit::clipboard] Initializing arboard clipboard…
[2024-03-11T04:35:49Z DEBUG egui_winit::clipboard] Initializing smithay clipboard…
[2024-03-11T04:35:49Z WARN  blade_graphics::hal::init] Requested layer is not found: "VK_LAYER_KHRONOS_validation"
[2024-03-11T04:35:49Z INFO  blade_graphics::hal::init] Adapter "Intel(R) Graphics (RPL-S)"
[2024-03-11T04:35:49Z INFO  blade_graphics::hal::init] No ray tracing extensions are supported
[2024-03-11T04:35:49Z DEBUG blade_graphics::hal::init] Adapter AdapterCapabilities {
        api_version: 4206866,
        properties: PhysicalDeviceProperties {
            api_version: 4206866,
            driver_version: 100663296,
            vendor_id: 32902,
            device_id: 42888,
            device_type: INTEGRATED_GPU,
            device_name: "Intel(R) Graphics (RPL-S)",
            pipeline_cache_uuid: [
                30,
                116,
                185,
                68,
                191,
                206,
                235,
                242,
                182,
                25,
                50,
                41,
                188,
                127,
                83,
                255,
            ],
            limits: PhysicalDeviceLimits {
                max_image_dimension1_d: 16384,
                max_image_dimension2_d: 16384,
                max_image_dimension3_d: 2048,
                max_image_dimension_cube: 16384,
                max_image_array_layers: 2048,
                max_texel_buffer_elements: 134217728,
                max_uniform_buffer_range: 1073741824,
                max_storage_buffer_range: 4294967295,
                max_push_constants_size: 128,
                max_memory_allocation_count: 4294967295,
                max_sampler_allocation_count: 65536,
                buffer_image_granularity: 1,
                sparse_address_space_size: 17587891077120,
                max_bound_descriptor_sets: 8,
                max_per_stage_descriptor_samplers: 65535,
                max_per_stage_descriptor_uniform_buffers: 64,
                max_per_stage_descriptor_storage_buffers: 65535,
                max_per_stage_descriptor_sampled_images: 65535,
                max_per_stage_descriptor_storage_images: 65535,
                max_per_stage_descriptor_input_attachments: 64,
                max_per_stage_resources: 4294967295,
                max_descriptor_set_samplers: 393210,
                max_descriptor_set_uniform_buffers: 384,
                max_descriptor_set_uniform_buffers_dynamic: 8,
                max_descriptor_set_storage_buffers: 393210,
                max_descriptor_set_storage_buffers_dynamic: 8,
                max_descriptor_set_sampled_images: 393210,
                max_descriptor_set_storage_images: 393210,
                max_descriptor_set_input_attachments: 256,
                max_vertex_input_attributes: 29,
                max_vertex_input_bindings: 31,
                max_vertex_input_attribute_offset: 2047,
                max_vertex_input_binding_stride: 4095,
                max_vertex_output_components: 128,
                max_tessellation_generation_level: 64,
                max_tessellation_patch_size: 32,
                max_tessellation_control_per_vertex_input_components: 128,
                max_tessellation_control_per_vertex_output_components: 128,
                max_tessellation_control_per_patch_output_components: 128,
                max_tessellation_control_total_output_components: 2048,
                max_tessellation_evaluation_input_components: 128,
                max_tessellation_evaluation_output_components: 128,
                max_geometry_shader_invocations: 32,
                max_geometry_input_components: 128,
                max_geometry_output_components: 128,
                max_geometry_output_vertices: 256,
                max_geometry_total_output_components: 1024,
                max_fragment_input_components: 116,
                max_fragment_output_attachments: 8,
                max_fragment_dual_src_attachments: 1,
                max_fragment_combined_output_resources: 131078,
                max_compute_shared_memory_size: 65536,
                max_compute_work_group_count: [
                    65535,
                    65535,
                    65535,
                ],
                max_compute_work_group_invocations: 1024,
                max_compute_work_group_size: [
                    1024,
                    1024,
                    1024,
                ],
                sub_pixel_precision_bits: 8,
                sub_texel_precision_bits: 8,
                mipmap_precision_bits: 8,
                max_draw_indexed_index_value: 4294967295,
                max_draw_indirect_count: 4294967295,
                max_sampler_lod_bias: 16.0,
                max_sampler_anisotropy: 16.0,
                max_viewports: 16,
                max_viewport_dimensions: [
                    16384,
                    16384,
                ],
                viewport_bounds_range: [
                    -32768.0,
                    32767.0,
                ],
                viewport_sub_pixel_bits: 13,
                min_memory_map_alignment: 4096,
                min_texel_buffer_offset_alignment: 16,
                min_uniform_buffer_offset_alignment: 64,
                min_storage_buffer_offset_alignment: 4,
                min_texel_offset: -8,
                max_texel_offset: 7,
                min_texel_gather_offset: -32,
                max_texel_gather_offset: 31,
                min_interpolation_offset: -0.5,
                max_interpolation_offset: 0.4375,
                sub_pixel_interpolation_offset_bits: 4,
                max_framebuffer_width: 16384,
                max_framebuffer_height: 16384,
                max_framebuffer_layers: 2048,
                framebuffer_color_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                framebuffer_depth_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                framebuffer_stencil_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                framebuffer_no_attachments_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                max_color_attachments: 8,
                sampled_image_color_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                sampled_image_integer_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                sampled_image_depth_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                sampled_image_stencil_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                storage_image_sample_counts: TYPE_1,
                max_sample_mask_words: 1,
                timestamp_compute_and_graphics: 1,
                timestamp_period: 52.083332,
                max_clip_distances: 8,
                max_cull_distances: 8,
                max_combined_clip_and_cull_distances: 8,
                discrete_queue_priorities: 2,
                point_size_range: [
                    0.125,
                    255.875,
                ],
                line_width_range: [
                    0.0,
                    8.0,
                ],
                point_size_granularity: 0.125,
                line_width_granularity: 0.0078125,
                strict_lines: 0,
                standard_sample_locations: 1,
                optimal_buffer_copy_offset_alignment: 128,
                optimal_buffer_copy_row_pitch_alignment: 128,
                non_coherent_atom_size: 64,
            },
            sparse_properties: PhysicalDeviceSparseProperties {
                residency_standard2_d_block_shape: 1,
                residency_standard2_d_multisample_block_shape: 0,
                residency_standard3_d_block_shape: 1,
                residency_aligned_mip_size: 0,
                residency_non_resident_strict: 1,
            },
        },
        queue_family_index: 0,
        layered: false,
        ray_tracing: false,
        buffer_marker: true,
        shader_info: false,
    }

Output from example using Nvidia:

❯ RUST_LOG=debug switcherooctl launch cargo run --example particle
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/examples/particle`
[2024-03-11T04:37:58Z DEBUG egui_winit::clipboard] Initializing arboard clipboard…
[2024-03-11T04:37:58Z DEBUG egui_winit::clipboard] Initializing smithay clipboard…
[2024-03-11T04:37:58Z WARN  blade_graphics::hal::init] Requested layer is not found: "VK_LAYER_KHRONOS_validation"
DRM kernel driver 'nvidia-drm' in use. NVK requires nouveau.
TU: error: ../src/freedreno/vulkan/tu_knl.cc:232: device /dev/dri/renderD128 (i915) is not compatible with turnip (VK_ERROR_INCOMPATIBLE_DRIVER)
TU: error: ../src/freedreno/vulkan/tu_knl.cc:232: device /dev/dri/renderD129 (nvidia-drm) is not compatible with turnip (VK_ERROR_INCOMPATIBLE_DRIVER)
[2024-03-11T04:37:58Z INFO  blade_graphics::hal::init] Adapter "NVIDIA GeForce RTX 4090 Laptop GPU"
[2024-03-11T04:37:58Z INFO  blade_graphics::hal::init] Ray tracing is supported
[2024-03-11T04:37:58Z DEBUG blade_graphics::hal::init] Ray tracing properties: PhysicalDeviceAccelerationStructurePropertiesKHR {
        s_type: PHYSICAL_DEVICE_ACCELERATION_STRUCTURE_PROPERTIES_KHR,
        p_next: 0x00007ffc7dbb8d40,
        max_geometry_count: 16777215,
        max_instance_count: 16777215,
        max_primitive_count: 536870911,
        max_per_stage_descriptor_acceleration_structures: 1048576,
        max_per_stage_descriptor_update_after_bind_acceleration_structures: 1048576,
        max_descriptor_set_acceleration_structures: 1048576,
        max_descriptor_set_update_after_bind_acceleration_structures: 1048576,
        min_acceleration_structure_scratch_offset_alignment: 128,
    }
[2024-03-11T04:37:58Z DEBUG blade_graphics::hal::init] Adapter AdapterCapabilities {
        api_version: 4206867,
        properties: PhysicalDeviceProperties {
            api_version: 4206869,
            driver_version: 2307752832,
            vendor_id: 4318,
            device_id: 10071,
            device_type: DISCRETE_GPU,
            device_name: "NVIDIA GeForce RTX 4090 Laptop GPU",
            pipeline_cache_uuid: [
                106,
                157,
                243,
                178,
                252,
                57,
                140,
                51,
                193,
                158,
                239,
                82,
                244,
                219,
                236,
                237,
            ],
            limits: PhysicalDeviceLimits {
                max_image_dimension1_d: 32768,
                max_image_dimension2_d: 32768,
                max_image_dimension3_d: 16384,
                max_image_dimension_cube: 32768,
                max_image_array_layers: 2048,
                max_texel_buffer_elements: 134217728,
                max_uniform_buffer_range: 65536,
                max_storage_buffer_range: 4294967295,
                max_push_constants_size: 256,
                max_memory_allocation_count: 4294967295,
                max_sampler_allocation_count: 4000,
                buffer_image_granularity: 1024,
                sparse_address_space_size: 1099511627775,
                max_bound_descriptor_sets: 32,
                max_per_stage_descriptor_samplers: 1048576,
                max_per_stage_descriptor_uniform_buffers: 1048576,
                max_per_stage_descriptor_storage_buffers: 1048576,
                max_per_stage_descriptor_sampled_images: 1048576,
                max_per_stage_descriptor_storage_images: 1048576,
                max_per_stage_descriptor_input_attachments: 1048576,
                max_per_stage_resources: 4294967295,
                max_descriptor_set_samplers: 1048576,
                max_descriptor_set_uniform_buffers: 1048576,
                max_descriptor_set_uniform_buffers_dynamic: 15,
                max_descriptor_set_storage_buffers: 1048576,
                max_descriptor_set_storage_buffers_dynamic: 16,
                max_descriptor_set_sampled_images: 1048576,
                max_descriptor_set_storage_images: 1048576,
                max_descriptor_set_input_attachments: 1048576,
                max_vertex_input_attributes: 32,
                max_vertex_input_bindings: 32,
                max_vertex_input_attribute_offset: 2047,
                max_vertex_input_binding_stride: 2048,
                max_vertex_output_components: 128,
                max_tessellation_generation_level: 64,
                max_tessellation_patch_size: 32,
                max_tessellation_control_per_vertex_input_components: 128,
                max_tessellation_control_per_vertex_output_components: 128,
                max_tessellation_control_per_patch_output_components: 120,
                max_tessellation_control_total_output_components: 4216,
                max_tessellation_evaluation_input_components: 128,
                max_tessellation_evaluation_output_components: 128,
                max_geometry_shader_invocations: 32,
                max_geometry_input_components: 128,
                max_geometry_output_components: 128,
                max_geometry_output_vertices: 1024,
                max_geometry_total_output_components: 1024,
                max_fragment_input_components: 128,
                max_fragment_output_attachments: 8,
                max_fragment_dual_src_attachments: 1,
                max_fragment_combined_output_resources: 4294967295,
                max_compute_shared_memory_size: 49152,
                max_compute_work_group_count: [
                    2147483647,
                    65535,
                    65535,
                ],
                max_compute_work_group_invocations: 1024,
                max_compute_work_group_size: [
                    1024,
                    1024,
                    64,
                ],
                sub_pixel_precision_bits: 8,
                sub_texel_precision_bits: 8,
                mipmap_precision_bits: 8,
                max_draw_indexed_index_value: 4294967295,
                max_draw_indirect_count: 4294967295,
                max_sampler_lod_bias: 15.0,
                max_sampler_anisotropy: 16.0,
                max_viewports: 16,
                max_viewport_dimensions: [
                    32768,
                    32768,
                ],
                viewport_bounds_range: [
                    -65536.0,
                    65536.0,
                ],
                viewport_sub_pixel_bits: 8,
                min_memory_map_alignment: 64,
                min_texel_buffer_offset_alignment: 16,
                min_uniform_buffer_offset_alignment: 64,
                min_storage_buffer_offset_alignment: 16,
                min_texel_offset: -8,
                max_texel_offset: 7,
                min_texel_gather_offset: -32,
                max_texel_gather_offset: 31,
                min_interpolation_offset: -0.5,
                max_interpolation_offset: 0.4375,
                sub_pixel_interpolation_offset_bits: 4,
                max_framebuffer_width: 32768,
                max_framebuffer_height: 32768,
                max_framebuffer_layers: 2048,
                framebuffer_color_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8,
                framebuffer_depth_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8,
                framebuffer_stencil_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                framebuffer_no_attachments_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                max_color_attachments: 8,
                sampled_image_color_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8,
                sampled_image_integer_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8,
                sampled_image_depth_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8,
                sampled_image_stencil_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                storage_image_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8,
                max_sample_mask_words: 1,
                timestamp_compute_and_graphics: 1,
                timestamp_period: 1.0,
                max_clip_distances: 8,
                max_cull_distances: 8,
                max_combined_clip_and_cull_distances: 8,
                discrete_queue_priorities: 2,
                point_size_range: [
                    1.0,
                    2047.9375,
                ],
                line_width_range: [
                    1.0,
                    64.0,
                ],
                point_size_granularity: 0.0625,
                line_width_granularity: 0.0625,
                strict_lines: 1,
                standard_sample_locations: 1,
                optimal_buffer_copy_offset_alignment: 1,
                optimal_buffer_copy_row_pitch_alignment: 1,
                non_coherent_atom_size: 64,
            },
            sparse_properties: PhysicalDeviceSparseProperties {
                residency_standard2_d_block_shape: 1,
                residency_standard2_d_multisample_block_shape: 1,
                residency_standard3_d_block_shape: 1,
                residency_aligned_mip_size: 0,
                residency_non_resident_strict: 1,
            },
        },
        queue_family_index: 0,
        layered: false,
        ray_tracing: true,
        buffer_marker: true,
        shader_info: false,
    }

In all cases the example presented fine.
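For anyone reading the raw numbers in the logs above, here is a minimal sketch of how the packed values decode. It assumes the standard Vulkan version layout (3-bit variant, 7-bit major, 10-bit minor, 12-bit patch) and the usual PCI vendor IDs. The function names `decode_vk_version` and `vendor_name` are made up for illustration and are not part of blade.

```rust
/// Split a packed Vulkan version number into (major, minor, patch).
/// Layout: 3-bit variant, 7-bit major, 10-bit minor, 12-bit patch.
fn decode_vk_version(v: u32) -> (u32, u32, u32) {
    ((v >> 22) & 0x7f, (v >> 12) & 0x3ff, v & 0xfff)
}

/// Map a PCI vendor ID to a name. Only the two vendors seen in the logs
/// are covered; anything else is reported as "unknown".
fn vendor_name(id: u32) -> &'static str {
    match id {
        0x8086 => "Intel",
        0x10de => "NVIDIA",
        _ => "unknown",
    }
}

fn main() {
    // (vendor_id, properties.api_version) pairs copied from the two logs.
    for (vendor_id, api_version) in [(32902u32, 4206866u32), (4318, 4206869)] {
        let (major, minor, patch) = decode_vk_version(api_version);
        println!(
            "{} (0x{:04x}): Vulkan {}.{}.{}",
            vendor_name(vendor_id),
            vendor_id,
            major,
            minor,
            patch
        );
    }
}
```

So vendor_id 32902 is Intel reporting Vulkan 1.3.274, and vendor_id 4318 is Nvidia reporting Vulkan 1.3.277.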

flukejones commented 8 months ago

As noted on the linked issue, it looks like the root cause of the other users' problems is that they are using Xorg with a config that makes Xorg run on the Nvidia dGPU by default. This is a special case that will be phased out very soon. I'm surprised it isn't already, but then I guess "Ubuntu".

My own gpu management tool removed that hack a long time ago.

kvark commented 8 months ago

I don't have a system at hand that would be subject to this problem, and so it's very hard to investigate and find a minimal workaround. Any ideas on how exactly to detect the affected platform are appreciated!

flukejones commented 8 months ago

I would prefer the blocking code be removed because it was added for an edge case that is rarely used and will be even more uncommon with the coming distro releases.

We could perhaps at least restrict it to Xorg only. That would need an environment check at minimum.

flukejones commented 8 months ago

It would be safe to check these two environment variables:

echo $XDG_SESSION_TYPE
x11

echo $XDG_SESSION_TYPE
wayland

echo $DESKTOP_SESSION
gnome-xorg

echo $DESKTOP_SESSION
gnome-wayland

I think if xorg-nvidia is used then glxinfo -B | grep Device will return the Nvidia card name. That could be a secondary check, so that folks using Xorg normally are not blocked.
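A rough sketch of what the environment check could look like in Rust, using only the standard library. This is an illustration and not blade's actual code: the function names `looks_like_x11` and `is_x11_session` are hypothetical. It assumes XDG_SESSION_TYPE is the more reliable signal, and that DESKTOP_SESSION values containing "xorg" (as in gnome-xorg above) indicate X11, which may not hold on every distro.

```rust
use std::env;

/// Pure helper so the decision can be tested without touching the
/// process environment. XDG_SESSION_TYPE wins when it is set;
/// DESKTOP_SESSION is only consulted as a fallback.
fn looks_like_x11(session_type: Option<&str>, desktop_session: Option<&str>) -> bool {
    match session_type {
        Some(t) => t.eq_ignore_ascii_case("x11"),
        // Assumption: X11 desktop sessions are named like "gnome-xorg".
        None => desktop_session.map_or(false, |d| d.to_ascii_lowercase().contains("xorg")),
    }
}

/// Read the two variables from the environment and apply the helper.
fn is_x11_session() -> bool {
    let session_type = env::var("XDG_SESSION_TYPE").ok();
    let desktop_session = env::var("DESKTOP_SESSION").ok();
    looks_like_x11(session_type.as_deref(), desktop_session.as_deref())
}

fn main() {
    if is_x11_session() {
        println!("X11 session: the Intel-on-hybrid workaround may still apply");
    } else {
        println!("not an X11 session: no reason to block the Intel adapter");
    }
}
```

With a check like this, the existing block would only trigger under Xorg, and Wayland sessions would be free to pick the Intel adapter. The glxinfo check would still be needed to tell an Nvidia-primary Xorg config apart from a normal one.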