xmrig / xmrig-cuda

NVIDIA CUDA plugin for XMRig miner
GNU General Public License v3.0

How to maximize GPU usage while using xmrig-cuda? #117

Closed: seyeeet closed this issue 2 years ago

seyeeet commented 2 years ago

So I am able to run my mining script and it does use the GPU. However, only 6399 MB out of 12188 MB of VRAM is in use. How can I use more GPU memory and increase the hash rate, so that all of the card's capacity is put to work? Here is my current CUDA setting:

    "cuda": {
        "enabled": true,
        "loader": null,
        "nvml": true,
        "astrobwt": [
            {
                "index": 0,
                "threads": 32,
                "blocks": 37,
                "bfactor": 0,
                "bsleep": 0,
                "affinity": -1,
                "dataset_host": false
            }
        ],
        "cn": [
            {
                "index": 0,
                "threads": 62,
                "blocks": 90,
                "bfactor": 0,
                "bsleep": 0,
                "affinity": -1,
                "dataset_host": false
            }
        ],
        "cn-heavy": [
            {
                "index": 0,
                "threads": 30,
                "blocks": 90,
                "bfactor": 0,
                "bsleep": 0,
                "affinity": -1,
                "dataset_host": false
            }
        ],
        "cn-lite": [
            {
                "index": 0,
                "threads": 124,
                "blocks": 90,
                "bfactor": 0,
                "bsleep": 0,
                "affinity": -1,
                "dataset_host": false
            }
        ],
        "cn-pico": [
            {
                "index": 0,
                "threads": 128,
                "blocks": 90,
                "bfactor": 0,
                "bsleep": 0,
                "affinity": -1,
                "dataset_host": false
            }
        ],
        "cn/2": [
            {
                "index": 0,
                "threads": 62,
                "blocks": 90,
                "bfactor": 0,
                "bsleep": 0,
                "affinity": -1,
                "dataset_host": false
            }
        ],
        "cn/upx2": [
            {
                "index": 0,
                "threads": 128,
                "blocks": 90,
                "bfactor": 0,
                "bsleep": 0,
                "affinity": -1,
                "dataset_host": false
            }
        ],
        "kawpow": [
            {
                "index": 0,
                "threads": 256,
                "blocks": 61440,
                "bfactor": 0,
                "bsleep": 0,
                "affinity": -1,
                "dataset_host": false
            }
        ],
        "rx": [
            {
                "index": 0,
                "threads": 32,
                "blocks": 60,
                "bfactor": 0,
                "bsleep": 0,
                "affinity": -1,
                "dataset_host": false
            }
        ],
        "rx/arq": [
            {
                "index": 0,
                "threads": 32,
                "blocks": 60,
                "bfactor": 0,
                "bsleep": 0,
                "affinity": -1,
                "dataset_host": false
            }
        ],
        "rx/keva": [
            {
                "index": 0,
                "threads": 32,
                "blocks": 60,
                "bfactor": 0,
                "bsleep": 0,
                "affinity": -1,
                "dataset_host": false
            }
        ],
        "rx/wow": [
            {
                "index": 0,
                "threads": 32,
                "blocks": 60,
                "bfactor": 0,
                "bsleep": 0,
                "affinity": -1,
                "dataset_host": false
            }
        ],
        "cn-lite/0": false,
        "cn/0": false
    },
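If the algorithm actually being mined is RandomX (rx), that ~6.4 GB figure is roughly what the launch config above implies: the RandomX dataset is about 2080 MiB (kept in VRAM because dataset_host is false) and each in-flight hash needs a 2 MiB scratchpad. A back-of-the-envelope sketch in Python (the helper function is illustrative, not part of xmrig):

    # Rough VRAM estimate for the "rx" launch config above.
    # Sizes come from the RandomX spec; the overhead figure is a guess.
    DATASET_MIB = 2080   # RandomX dataset, resident in VRAM when dataset_host = false
    SCRATCHPAD_MIB = 2   # per-hash scratchpad (rx/0)

    def estimate_vram_mib(threads: int, blocks: int) -> int:
        in_flight_hashes = threads * blocks          # concurrent hashes on the GPU
        return DATASET_MIB + in_flight_hashes * SCRATCHPAD_MIB

    # "rx": 32 threads x 60 blocks = 1920 hashes in flight
    print(estimate_vram_mib(32, 60))  # 5920 MiB; a few hundred MB of CUDA
                                      # context overhead lands near the 6399 MB observed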
Spudz76 commented 2 years ago

Depends on algo.

There are no gains from using more VRAM if the GPU is already out of bandwidth, compute units, or some other resource; one of those always runs out before the VRAM does. So whatever amount the autoconfig picked is correct and essentially the fastest, aside from clocking, which might make some algorithms go faster (some algorithms like more GPU core clock, some like more memory clock, depending on where the bottleneck is).
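One way to check which resource is actually saturated is to poll NVML while the miner runs (the same library behind the "nvml": true setting in the config above). A minimal sketch, assuming the pynvml Python bindings are installed (pip install nvidia-ml-py):

    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # "index": 0 in the config above

    for _ in range(10):
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        # util.memory is the % of time the memory controller was busy, NOT the
        # % of VRAM in use -- near 100% here means bandwidth is the bottleneck.
        print(f"kernels busy {util.gpu}% | mem controller busy {util.memory}% | "
              f"VRAM {mem.used / 2**20:.0f}/{mem.total / 2**20:.0f} MiB")
        time.sleep(1)

    pynvml.nvmlShutdown()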

seyeeet commented 2 years ago

What do you mean by "the GPU is already out of bandwidth"?

Spudz76 commented 2 years ago

Memory bandwidth generally runs out well before memory capacity does, especially on NVIDIA cards.

Think of it as 12 GB sitting way out in a field, with only a two-lane road leading to it. Using all 12 GB would just cause a traffic jam of trucks, so instead you use the amount of space that matches the number of trucks that can run back and forth on the road most efficiently (without waiting in traffic). That works out to whatever the autoconfig selected: just over 6 GB.

Or it might run out of caches before it runs out of memory. Running outside the cache limits would slow things down.

Or it might run out of actual GPU shaders. Oversubscribing tasks to the compute units just slows things down, too.
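The limits above can be folded into a toy roofline-style model: throughput rises with the number of in-flight hashes only until the memory bus saturates, after which extra VRAM use buys nothing. All numbers below are made up for illustration; they are not measurements of any real GPU or algorithm:

    # Toy roofline model: why more in-flight hashes (more VRAM) stops helping.
    BANDWIDTH_BYTES_PER_S = 360e9   # the "two-lane road" (illustrative)
    BYTES_PER_HASH = 4 * 2**20      # traffic each hash pushes through VRAM (illustrative)

    def hashrate(in_flight: int, hash_time_s: float = 0.02) -> float:
        compute_bound = in_flight / hash_time_s                   # H/s if memory were free
        bandwidth_bound = BANDWIDTH_BYTES_PER_S / BYTES_PER_HASH  # ceiling from the road
        return min(compute_bound, bandwidth_bound)

    for n in (480, 960, 1920, 3840, 7680):      # doubling VRAM use each step
        print(f"{n:5d} hashes in flight -> {hashrate(n):8,.0f} H/s")
    # Output goes flat past ~1700 in-flight hashes: more trucks, same road.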

Your assumption that using more memory equals more hashrate is flawed.