Feature request: Option for device enumeration based on OpenCL device order

UselessGuru commented 5 years ago

Currently nanominer sorts devices by PCI bus ID. Unfortunately, the PCI bus ID is not exposed by the core OpenCL API, which breaks proper integration with MultiPoolMiner-based software (e.g. MultiPoolMiner, NemosMiner, etc.). Could you please add an option (e.g. --deviceOrder OpenCL) that makes nanominer sort devices by OpenCL device ID?

OpenCL device order in my case is:

GPU#00 GeForce GTX 1080 Ti
GPU#01 GeForce GTX 1060 6GB
GPU#02 Radeon RX 580 Series (Ellesmere)
GPU#03 AMD Radeon (TM) RX 560 (Baffin)

Nanominer now:

nanominer -d
2019-06-10 07:36:23: CUDA driver version is 10.2, runtime version is 10.0
Detected 4 devices:
GPU 0  PCI 01:00.0  11264 MB  GeForce GTX 1080 Ti
GPU 1  PCI 04:00.0  6144 MB  GeForce GTX 1060 6GB
GPU 2  PCI 05:00.0  2048 MB  AMD Radeon (TM) RX 560 (Baffin)
GPU 3  PCI 09:00.0  8192 MB  Radeon RX 580 Series (Ellesmere)

Preferred result:

nanominer --deviceOrder OpenCL -d
2019-06-10 07:36:23: CUDA driver version is 10.2, runtime version is 10.0
Detected 4 devices:
GPU 0  PCI 01:00.0  11264 MB  GeForce GTX 1080 Ti
GPU 1  PCI 04:00.0  6144 MB  GeForce GTX 1060 6GB
GPU 2  PCI 09:00.0  8192 MB  Radeon RX 580 Series (Ellesmere)
GPU 3  PCI 05:00.0  2048 MB  AMD Radeon (TM) RX 560 (Baffin)

Grumpy-Dwarf commented 5 years ago

Different programs order devices differently. Competing miners used to list AMD cards first, then Nvidia cards. We used to list Nvidia devices (enumerated by CUDA) first, then AMD devices. In the end, both we and our competitors concluded that there is no good order for mixed AMD and Nvidia GPUs other than the PCI address.

First of all, we use the OpenCL API for AMD and CUDA for Nvidia, so we would need to match the OpenCL order of the Nvidia devices with their CUDA order. That is tricky, since the devices are usually similar. To match devices we would need either the PCI address or the GPU UUID (I don't know whether that is actually possible in OpenCL) from both CUDA and OpenCL.
There is no standard way to get the PCI bus address of a device in OpenCL, because it is a generic API for all types of GPUs, not only those connected over PCI. Both AMD and Nvidia do, however, provide extensions for getting the bus address through OpenCL; see https://anteru.net/blog/2014/associating-opencl-device-ids-with-gpus/
Platform order is tricky. It depends on which driver was installed last and on how the application accesses OpenCL. A statically linked OpenCL ICD loader and the system-wide OpenCL.dll/OpenCL.so can give different platform orders: the former depends on what is in the registry, and the latter depends on which OpenCL.dll/OpenCL.so you have, whether it comes from the Nvidia driver, the AMD driver, Intel, the standard ICD, or something else.
It was painful for me to learn, but there is no such thing as a device order within a single platform. There are OpenCL platforms (hello, Intel!) that can return a platform's devices in any order, even in different orders within the same program. At first I thought it was a bug I should report, but then I looked through the standard and did not find anything about order. We can get the number of devices, and we can get a list of devices of a given length. The order is not required to be the same on each clGetDeviceIDs call. So offering OpenCL order on top of PCI order, even as an option, looks error-prone to me.
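
The matching step described above can be sketched in a few lines of Python. This is only an illustration, not nanominer's code: the names `parse_pci` and `match_by_pci` and the CUDA indices are assumptions, and the PCI addresses are the ones from the `nanominer -d` output earlier in this thread. It assumes each API has already reported a PCI address for every device, which on OpenCL requires the vendor extensions mentioned above.

```python
import re
from typing import NamedTuple


class PciAddress(NamedTuple):
    bus: int
    device: int
    function: int


def parse_pci(text: str) -> PciAddress:
    """Parse a 'BB:DD.F' string (hex bus and device) into a sortable tuple."""
    m = re.fullmatch(r"([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])", text.strip())
    if m is None:
        raise ValueError(f"not a PCI address: {text!r}")
    return PciAddress(int(m.group(1), 16), int(m.group(2), 16), int(m.group(3)))


# Hypothetical per-API views: (index within that API, PCI address).
# The CUDA indices are assumed; the OpenCL list follows the order in this thread.
cuda_devices = [(0, "01:00.0"), (1, "04:00.0")]
opencl_devices = [(0, "01:00.0"), (1, "04:00.0"), (2, "09:00.0"), (3, "05:00.0")]


def match_by_pci(cuda, opencl):
    """Return {PciAddress: (cuda_index or None, opencl_index)}, sorted by address."""
    cuda_by_addr = {parse_pci(addr): idx for idx, addr in cuda}
    merged = {}
    for idx, addr in opencl:
        key = parse_pci(addr)
        merged[key] = (cuda_by_addr.get(key), idx)
    return dict(sorted(merged.items()))


matches = match_by_pci(cuda_devices, opencl_devices)
for addr, (cuda_idx, cl_idx) in matches.items():
    print(f"{addr.bus:02x}:{addr.device:02x}.{addr.function}  cuda={cuda_idx}  opencl={cl_idx}")
```

Because the PCI address is the join key, the result does not depend on the order in which either API happens to return its devices.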

However, I see that there is a technical problem that makes proper integration impossible. Can I help implement PCI order in MultiPoolMiner?

UselessGuru commented 5 years ago

@Grumpy-Dwarf Thank you for your information.

Can I help implement PCI order in MultiPoolMiner?

Yes sure!

MultiPoolMiner relies entirely on the device order returned by OpenCL. In most cases that matches the PCI device order, but not always (on my computer it does not).

If you know of a clever way to map the PCI ID to the OpenCL ID, that is all that's needed. Currently I have implemented an optional 'manual' mapping that re-maps the PCI IDs to OpenCL IDs:

  "DevicePciOrderMapping": {
    "GPU#00": "0",
    "GPU#01": "1",
    "GPU#02": "3",
    "GPU#03": "2"
  },
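
For illustration, a mapping like the one above could be generated rather than written by hand once the PCI address of each OpenCL device is known. The Python sketch below is a hypothetical helper, not MultiPoolMiner code: `pci_sort_key` and `build_pci_order_mapping` are made-up names, and the address list assumes the OpenCL devices were matched by hand to the `nanominer -d` output shown earlier.

```python
import json

# PCI addresses in OpenCL order (GPU#00..GPU#03), matched by hand to the
# `nanominer -d` output in this thread. This pairing is an assumption.
opencl_order_pci = ["01:00.0", "04:00.0", "09:00.0", "05:00.0"]


def pci_sort_key(address: str) -> tuple[int, int, int]:
    """Turn 'BB:DD.F' into (bus, device, function) integers for sorting."""
    bus, rest = address.split(":")
    device, function = rest.split(".")
    return int(bus, 16), int(device, 16), int(function, 16)


def build_pci_order_mapping(addresses: list[str]) -> dict[str, str]:
    """Map each OpenCL device name to its position in PCI address order."""
    # OpenCL indices, sorted by the PCI address of the device they refer to.
    by_pci = sorted(range(len(addresses)), key=lambda i: pci_sort_key(addresses[i]))
    pci_rank = {opencl_idx: rank for rank, opencl_idx in enumerate(by_pci)}
    return {f"GPU#{i:02d}": str(pci_rank[i]) for i in range(len(addresses))}


mapping = build_pci_order_mapping(opencl_order_pci)
print(json.dumps({"DevicePciOrderMapping": mapping}, indent=2))
```

With the four addresses from this thread, the helper reproduces the manual mapping above: GPU#02 maps to "3" and GPU#03 maps to "2".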

Or maybe you know of a way to re-order the device list returned by OpenCL.

Platforms order is tricky.

MPM can deal with this. This is what MPM currently uses (mostly retrieved through OpenCL, except for the PCIBus_* entries, which I add using the DevicePciOrderMapping information):

[
  {
    "Index": 0,
    "PlatformId": 0,
    "PlatformId_Index": 0,
    "Type_PlatformId_Index": 0,
    "Vendor": "NVIDIA Corporation",
    "Vendor_ShortName": "NVIDIA",
    "Vendor_Index": 0,
    "Type_Vendor_Index": 0,
    "Type": "Gpu",
    "Type_Index": 0,
    "OpenCL": {
      "AddressBits": 64,
      "Available": true,
      "CompilerAvailable": true,
      "DoubleFpConfig": "FpDenorm, FpInfNan, FpRoundToNearest, FpRoundToZero, FpRoundToInf, FpFma",
      "EndianLittle": true,
      "ErrorCorrectionSupport": false,
      "ExecCapabilities": "ExecKernel",
      "Extensions": "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer",
      "GlobalMemCacheSize": 458752,
      "GlobalMemCacheType": 2,
      "GlobalMemCachelineSize": 128,
      "GlobalMemSize": 11811160064,
      "ImageSupport": true,
      "LocalMemSize": 49152,
      "LocalMemType": 1,
      "MaxClockFrequency": 1607,
      "MaxComputeUnits": 28,
      "MaxConstantArgs": 9,
      "MaxConstantBufferSize": null,
      "MaxMemAllocSize": 2952790016,
      "MaxParameterSize": 4352,
      "MaxReadImageArgs": 256,
      "MaxSamplers": 32,
      "MaxWorkGroupSize": 1024,
      "MaxWorkItemDimensions": 3,
      "MaxWorkItemSizes": "1024 1024 64",
      "MaxWriteImageArgs": 16,
      "MemBaseAddrAlign": 4096,
      "MinDataTypeAlignSize": 128,
      "Name": "GeForce GTX 1080 Ti",
      "ClVersion": "OpenCL C 1.2 ",
      "Platform": "@{Profile=FULL_PROFILE; Version=OpenCL 1.2 CUDA 10.2.120; Name=NVIDIA CUDA; Vendor=NVIDIA Corporation; Extensions=System.Object[]}",
      "PreferredVectorWidthChar": 1,
      "PreferredVectorWidthShort": 1,
      "PreferredVectorWidthInt": 1,
      "PreferredVectorWidthLong": 1,
      "PreferredVectorWidthFloat": 1,
      "PreferredVectorWidthDouble": 1,
      "Profile": "FULL_PROFILE",
      "QueueProperties": "OutOfOrderExecModeEnable, ProfilingEnable",
      "SingleFpConfig": "FpDenorm, FpInfNan, FpRoundToNearest, FpRoundToZero, FpRoundToInf, FpFma, FpCorrectlyRoundedDivideSqrt",
      "Type": "Gpu",
      "Vendor": "NVIDIA Corporation",
      "VendorId": 4318,
      "Version": "OpenCL 1.2 CUDA",
      "DriverVersion": "430.86"
    },
    "Model": "GeForce GTX 1080 Ti",
    "Model_Norm": "GTX1080Ti",
    "PCIBus_Index": 0,
    "PCIBus_Type_Index": 0,
    "PCIBus_Type_PlatformId_Index": 0,
    "PCIBus_Type_Vendor_Index": 0,
    "PCIBus_Vendor_Index": 0,
    "Name": "GPU#00",
    "Status": "Running (NVIDIA-CryptoDredge_v0.20.1 [Argon2UIS])"
  },
  {
    "Index": 1,
    "PlatformId": 0,
    "PlatformId_Index": 1,
    "Type_PlatformId_Index": 1,
    "Vendor": "NVIDIA Corporation",
    "Vendor_ShortName": "NVIDIA",
    "Vendor_Index": 1,
    "Type_Vendor_Index": 1,
    "Type": "Gpu",
    "Type_Index": 1,
    "OpenCL": {
      "AddressBits": 64,
      "Available": true,
      "CompilerAvailable": true,
      "DoubleFpConfig": "FpDenorm, FpInfNan, FpRoundToNearest, FpRoundToZero, FpRoundToInf, FpFma",
      "EndianLittle": true,
      "ErrorCorrectionSupport": false,
      "ExecCapabilities": "ExecKernel",
      "Extensions": "cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer",
      "GlobalMemCacheSize": 163840,
      "GlobalMemCacheType": 2,
      "GlobalMemCachelineSize": 128,
      "GlobalMemSize": 6442450944,
      "ImageSupport": true,
      "LocalMemSize": 49152,
      "LocalMemType": 1,
      "MaxClockFrequency": 1784,
      "MaxComputeUnits": 10,
      "MaxConstantArgs": 9,
      "MaxConstantBufferSize": null,
      "MaxMemAllocSize": 1610612736,
      "MaxParameterSize": 4352,
      "MaxReadImageArgs": 256,
      "MaxSamplers": 32,
      "MaxWorkGroupSize": 1024,
      "MaxWorkItemDimensions": 3,
      "MaxWorkItemSizes": "1024 1024 64",
      "MaxWriteImageArgs": 16,
      "MemBaseAddrAlign": 4096,
      "MinDataTypeAlignSize": 128,
      "Name": "GeForce GTX 1060 6GB",
      "ClVersion": "OpenCL C 1.2 ",
      "Platform": "@{Profile=FULL_PROFILE; Version=OpenCL 1.2 CUDA 10.2.120; Name=NVIDIA CUDA; Vendor=NVIDIA Corporation; Extensions=System.Object[]}",
      "PreferredVectorWidthChar": 1,
      "PreferredVectorWidthShort": 1,
      "PreferredVectorWidthInt": 1,
      "PreferredVectorWidthLong": 1,
      "PreferredVectorWidthFloat": 1,
      "PreferredVectorWidthDouble": 1,
      "Profile": "FULL_PROFILE",
      "QueueProperties": "OutOfOrderExecModeEnable, ProfilingEnable",
      "SingleFpConfig": "FpDenorm, FpInfNan, FpRoundToNearest, FpRoundToZero, FpRoundToInf, FpFma, FpCorrectlyRoundedDivideSqrt",
      "Type": "Gpu",
      "Vendor": "NVIDIA Corporation",
      "VendorId": 4318,
      "Version": "OpenCL 1.2 CUDA",
      "DriverVersion": "430.86"
    },
    "Model": "GeForce GTX 1060 6GB",
    "Model_Norm": "GTX10606GB",
    "PCIBus_Index": 1,
    "PCIBus_Type_Index": 1,
    "PCIBus_Type_PlatformId_Index": 1,
    "PCIBus_Type_Vendor_Index": 1,
    "PCIBus_Vendor_Index": 1,
    "Name": "GPU#01",
    "Status": "Running (NVIDIA-CryptoDredge_v0.20.1 [Argon2UIS])"
  },
  {
    "Index": 2,
    "PlatformId": 1,
    "PlatformId_Index": 0,
    "Type_PlatformId_Index": 0,
    "Vendor": "Advanced Micro Devices, Inc.",
    "Vendor_ShortName": "AMD",
    "Vendor_Index": 0,
    "Type_Vendor_Index": 0,
    "Type": "Gpu",
    "Type_Index": 2,
    "OpenCL": {
      "AddressBits": 64,
      "Available": true,
      "CompilerAvailable": true,
      "DoubleFpConfig": "FpDenorm, FpInfNan, FpRoundToNearest, FpRoundToZero, FpRoundToInf, FpFma",
      "EndianLittle": true,
      "ErrorCorrectionSupport": false,
      "ExecCapabilities": "ExecKernel",
      "Extensions": "cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_planar_yuv",
      "GlobalMemCacheSize": 16384,
      "GlobalMemCacheType": 2,
      "GlobalMemCachelineSize": 64,
      "GlobalMemSize": 8589934592,
      "ImageSupport": true,
      "LocalMemSize": 32768,
      "LocalMemType": 1,
      "MaxClockFrequency": 1325,
      "MaxComputeUnits": 36,
      "MaxConstantArgs": 8,
      "MaxConstantBufferSize": null,
      "MaxMemAllocSize": 4244635648,
      "MaxParameterSize": 1024,
      "MaxReadImageArgs": 128,
      "MaxSamplers": 16,
      "MaxWorkGroupSize": 1024,
      "MaxWorkItemDimensions": 3,
      "MaxWorkItemSizes": "1024 1024 1024",
      "MaxWriteImageArgs": 64,
      "MemBaseAddrAlign": 2048,
      "MinDataTypeAlignSize": 128,
      "Name": "Ellesmere",
      "ClVersion": "OpenCL C 2.0 ",
      "Platform": "@{Profile=FULL_PROFILE; Version=OpenCL 2.1 AMD-APP (2841.19); Name=AMD Accelerated Parallel Processing; Vendor=Advanced Micro Devices, Inc.; Extensions=System.Object[]}",
      "PreferredVectorWidthChar": 4,
      "PreferredVectorWidthShort": 2,
      "PreferredVectorWidthInt": 1,
      "PreferredVectorWidthLong": 1,
      "PreferredVectorWidthFloat": 1,
      "PreferredVectorWidthDouble": 1,
      "Profile": "FULL_PROFILE",
      "QueueProperties": "ProfilingEnable",
      "SingleFpConfig": "FpInfNan, FpRoundToNearest, FpRoundToZero, FpRoundToInf, FpFma, FpCorrectlyRoundedDivideSqrt",
      "Type": "Gpu",
      "Vendor": "Advanced Micro Devices, Inc.",
      "VendorId": 4098,
      "Version": "OpenCL 2.0 AMD-APP (2841.19)",
      "DriverVersion": "2841.19"
    },
    "Model": "Ellesmere8GB",
    "Model_Norm": "Ellesmere8GB",
    "PCIBus_Index": 3,
    "PCIBus_Type_Index": 3,
    "PCIBus_Type_PlatformId_Index": 1,
    "PCIBus_Type_Vendor_Index": 1,
    "PCIBus_Vendor_Index": 1,
    "Name": "GPU#02",
    "Status": "Running (AMD_NVIDIA-ClaymoreEthash_v14.7 [Ethash])"
  },
  {
    "Index": 3,
    "PlatformId": 1,
    "PlatformId_Index": 1,
    "Type_PlatformId_Index": 1,
    "Vendor": "Advanced Micro Devices, Inc.",
    "Vendor_ShortName": "AMD",
    "Vendor_Index": 1,
    "Type_Vendor_Index": 1,
    "Type": "Gpu",
    "Type_Index": 3,
    "OpenCL": {
      "AddressBits": 64,
      "Available": true,
      "CompilerAvailable": true,
      "DoubleFpConfig": "FpDenorm, FpInfNan, FpRoundToNearest, FpRoundToZero, FpRoundToInf, FpFma",
      "EndianLittle": true,
      "ErrorCorrectionSupport": false,
      "ExecCapabilities": "ExecKernel",
      "Extensions": "cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_planar_yuv",
      "GlobalMemCacheSize": 16384,
      "GlobalMemCacheType": 2,
      "GlobalMemCachelineSize": 64,
      "GlobalMemSize": 2147483648,
      "ImageSupport": true,
      "LocalMemSize": 32768,
      "LocalMemType": 1,
      "MaxClockFrequency": 1300,
      "MaxComputeUnits": 16,
      "MaxConstantArgs": 8,
      "MaxConstantBufferSize": null,
      "MaxMemAllocSize": 1879048192,
      "MaxParameterSize": 1024,
      "MaxReadImageArgs": 128,
      "MaxSamplers": 16,
      "MaxWorkGroupSize": 1024,
      "MaxWorkItemDimensions": 3,
      "MaxWorkItemSizes": "1024 1024 1024",
      "MaxWriteImageArgs": 64,
      "MemBaseAddrAlign": 2048,
      "MinDataTypeAlignSize": 128,
      "Name": "Baffin",
      "ClVersion": "OpenCL C 2.0 ",
      "Platform": "@{Profile=FULL_PROFILE; Version=OpenCL 2.1 AMD-APP (2841.19); Name=AMD Accelerated Parallel Processing; Vendor=Advanced Micro Devices, Inc.; Extensions=System.Object[]}",
      "PreferredVectorWidthChar": 4,
      "PreferredVectorWidthShort": 2,
      "PreferredVectorWidthInt": 1,
      "PreferredVectorWidthLong": 1,
      "PreferredVectorWidthFloat": 1,
      "PreferredVectorWidthDouble": 1,
      "Profile": "FULL_PROFILE",
      "QueueProperties": "ProfilingEnable",
      "SingleFpConfig": "FpInfNan, FpRoundToNearest, FpRoundToZero, FpRoundToInf, FpFma, FpCorrectlyRoundedDivideSqrt",
      "Type": "Gpu",
      "Vendor": "Advanced Micro Devices, Inc.",
      "VendorId": 4098,
      "Version": "OpenCL 2.0 AMD-APP (2841.19)",
      "DriverVersion": "2841.19"
    },
    "Model": "Baffin2GB",
    "Model_Norm": "Baffin2GB",
    "PCIBus_Index": 2,
    "PCIBus_Type_Index": 2,
    "PCIBus_Type_PlatformId_Index": 0,
    "PCIBus_Type_Vendor_Index": 0,
    "PCIBus_Vendor_Index": 0,
    "Name": "GPU#03",
    "Status": "Disabled"
  },
  {
    "Index": 4,
    "Vendor": "GenuineIntel",
    "Vendor_ShortName": "INTEL",
    "Type_Vendor_Index": 0,
    "Type": "Cpu",
    "Type_Index": 0,
    "CIM": {
      "CimClass": "@{CimSuperClassName=CIM_Processor; CimSuperClass=; CimClassProperties=System.Object[]; CimClassQualifiers=System.Object[]; CimClassMethods=System.Object[]; CimSystemProperties=}",
      "CimInstanceProperties": "                                                        ",
      "CimSystemProperties": "@{Namespace=root/cimv2; ServerName=BLACKBOX; ClassName=Win32_Processor; Path=}",
      "Caption": "Intel64 Family 6 Model 158 Stepping 10",
      "Description": "Intel64 Family 6 Model 158 Stepping 10",
      "InstallDate": null,
      "Name": "Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz",
      "Status": "OK",
      "Availability": 3,
      "ConfigManagerErrorCode": null,
      "ConfigManagerUserConfig": null,
      "CreationClassName": "Win32_Processor",
      "DeviceID": "CPU0",
      "ErrorCleared": null,
      "ErrorDescription": null,
      "LastErrorCode": null,
      "PNPDeviceID": null,
      "PowerManagementCapabilities": null,
      "PowerManagementSupported": false,
      "StatusInfo": 3,
      "SystemCreationClassName": "Win32_ComputerSystem",
      "SystemName": "BLACKBOX",
      "AddressWidth": 64,
      "CurrentClockSpeed": 3600,
      "DataWidth": 64,
      "Family": 205,
      "LoadPercentage": null,
      "MaxClockSpeed": 3600,
      "OtherFamilyDescription": null,
      "Role": "CPU",
      "Stepping": null,
      "UniqueId": null,
      "UpgradeMethod": 50,
      "Architecture": 9,
      "AssetTag": "To Be Filled By O.E.M.",
      "Characteristics": 236,
      "CpuStatus": 1,
      "CurrentVoltage": 12,
      "ExtClock": 100,
      "L2CacheSize": 1536,
      "L2CacheSpeed": null,
      "L3CacheSize": 9216,
      "L3CacheSpeed": 0,
      "Level": 6,
      "Manufacturer": "GenuineIntel",
      "NumberOfCores": 6,
      "NumberOfEnabledCore": 6,
      "NumberOfLogicalProcessors": 6,
      "PartNumber": "To Be Filled By O.E.M.",
      "ProcessorId": "BFEBFBFF000906EA",
      "ProcessorType": 3,
      "Revision": null,
      "SecondLevelAddressTranslationExtensions": false,
      "SerialNumber": "To Be Filled By O.E.M.",
      "SocketDesignation": "U3E1",
      "ThreadCount": 6,
      "Version": "",
      "VirtualizationFirmwareEnabled": false,
      "VMMonitorModeExtensions": false,
      "VoltageCaps": null,
      "PSComputerName": null
    },
    "Model": "Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz",
    "Model_Norm": "GenuineIntel6CoreCPU",
    "CpuFeatures": [
      "ABM",
      "ADX",
      "AES",
      "AVX",
      "AVX2",
      "BMI1",
      "BMI2",
      "FMA3",
      "MMX",
      "MPX",
      "RDRAND",
      "SSE",
      "SSE2",
      "SSE3",
      "SSE41",
      "SSE42",
      "SSSE3",
      "x64"
    ],
    "Name": "CPU#00",
    "Status": "Disabled"
  }
]
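
The PCIBus_* entries in the dump above appear to be ranks by PCI order within groups of devices that share a type, vendor, or platform. The Python sketch below shows one way such fields could be derived from the manual mapping. It is an interpretation of the dump, not MultiPoolMiner's actual implementation; the function name `add_pci_indices` and the grouping rules are assumptions, and only the fields needed for the calculation are copied from the dump.

```python
from collections import defaultdict

# Reduced device records (GPU entries only), copied from the dump above.
devices = [
    {"Name": "GPU#00", "Type": "Gpu", "Vendor_ShortName": "NVIDIA", "PlatformId": 0},
    {"Name": "GPU#01", "Type": "Gpu", "Vendor_ShortName": "NVIDIA", "PlatformId": 0},
    {"Name": "GPU#02", "Type": "Gpu", "Vendor_ShortName": "AMD", "PlatformId": 1},
    {"Name": "GPU#03", "Type": "Gpu", "Vendor_ShortName": "AMD", "PlatformId": 1},
]

# The manual mapping from this thread: device name -> position in PCI order.
pci_order_mapping = {"GPU#00": "0", "GPU#01": "1", "GPU#02": "3", "GPU#03": "2"}

# Assumed grouping rule for each derived field.
GROUPINGS = {
    "PCIBus_Type_Index": ("Type",),
    "PCIBus_Vendor_Index": ("Vendor_ShortName",),
    "PCIBus_Type_Vendor_Index": ("Type", "Vendor_ShortName"),
    "PCIBus_Type_PlatformId_Index": ("Type", "PlatformId"),
}


def add_pci_indices(devices, mapping):
    """Return copies of the records with PCIBus_* rank fields added."""
    result = [dict(d, PCIBus_Index=int(mapping[d["Name"]])) for d in devices]
    for field, keys in GROUPINGS.items():
        counters = defaultdict(int)
        # Walk the devices in PCI order and number them within each group.
        for d in sorted(result, key=lambda rec: rec["PCIBus_Index"]):
            group = tuple(d[k] for k in keys)
            d[field] = counters[group]
            counters[group] += 1
    return result


indexed = add_pci_indices(devices, pci_order_mapping)
for d in indexed:
    print(d["Name"], d["PCIBus_Index"], d["PCIBus_Vendor_Index"])
```

For the four GPUs in this thread, these rules give the same PCIBus_* values as the dump, for example PCIBus_Index 3 and PCIBus_Vendor_Index 1 for GPU#02.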

nanopool / nanominer

Feature request: Option for device enumeration based on OpenCL device order #30