sgminer-dev / sgminer

Scrypt GPU miner
GNU General Public License v3.0

GPU selection with "-d" broken #129

Open mphardy opened 10 years ago

mphardy commented 10 years ago

I have six GPUs on a single machine and wanted to run one instance of sgminer for GPUs 0-3 and another for GPUs 4 and 5, so that I could experiment with configurations without affecting all six. Starting the first instance with -d 0,1,2,3 still left sgminer handling GPUs 4 and 5, but with bad config values, so they show up in the OFF state:

This is with the nfactor branch

sgminer --nfactor=11 -d 0,1,2,3 -c sgminer.conf
[13:37:21] Maximum buffer memory device 4 supports says 2936012800
[13:37:21] Your scrypt settings come to -1083703296
[13:37:21] Maximum buffer memory device 5 supports says 2936012800
[13:37:21] Your scrypt settings come to -1083703296
sgminer 4.1.0-125-g11cf-dirty - Started: [2014-03-02 13:37:22]
--------------------------------------------------------------------------------
(5s):1.529M (avg):1.565Mh/s | A:734  R:140  HW:0  WU:1382.8/m
ST: 1  SS: 0  NB: 1  LW: 104  GF: 0  RF: 0
Connected to Pool 0 (stratum) diff 27 as user xxx
Block: f5590c05...  Diff:10.1M  Started: [13:37:21]  Best share: 7.77K
--------------------------------------------------------------------------------
[P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
 GPU 0:  84.0C 3063RPM | 366.0K/378.4Kh/s | R: 13.7% HW:0 WU: 356.9/m I:19
 GPU 1:  81.0C 2847RPM | 382.7K/430.6Kh/s | R: 14.9% HW:0 WU: 397.2/m I:19
 GPU 2:  77.0C 3085RPM | 421.5K/469.8Kh/s | R:  0.0% HW:0 WU: 161.3/m I:19
 GPU 3:  80.0C 2987RPM | 421.5K/469.8Kh/s | R: 24.9% HW:0 WU: 467.4/m I:19
 GPU 4:  40.0C  55%    | OFF   / 0.000h/s | R:  0.0% HW:0 WU:   0.0/m I:19
 GPU 5:  44.0C  55%    | OFF   / 0.000h/s | R:  0.0% HW:0 WU:   0.0/m I:19
--------------------------------------------------------------------------------

(GPUs 4 and 5 have no fans because they are oil-immersed, which is why there are no RPM readings for them.)
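
The negative figure in "Your scrypt settings come to -1083703296" looks like a buffer size that no longer fits in a signed 32-bit integer when printed. The sketch below only checks that arithmetic against the numbers in the log. The sizing formula (128 bytes * N / lookup_gap * thread concurrency, cgminer-style) and the values lookup_gap=2 and thread_concurrency=24500 are assumptions chosen to illustrate the point; they are not taken from the reporter's sgminer.conf.

```python
def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


logged = -1083703296    # "Your scrypt settings come to" (from the log above)
max_alloc = 2936012800  # "Maximum buffer memory device 4 supports" (from the log above)

# Smallest non-negative size that wraps to the logged value in 32 bits.
unwrapped = logged + (1 << 32)
print(unwrapped)              # 3211264000
print(unwrapped > max_alloc)  # True: the request exceeds what the device reports

# Assumption: cgminer-style sizing with hypothetical settings.
# --nfactor=11 is taken to mean N = 2**11 = 2048.
n, lookup_gap, thread_concurrency = 2048, 2, 24500
candidate = 128 * (n // lookup_gap) * thread_concurrency
print(candidate == unwrapped)  # True
print(as_int32(candidate))     # -1083703296
```

If this reading is right, the warning itself is legitimate (the requested buffer is larger than the device maximum) and only the printed number is misleading.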

mphardy commented 10 years ago

Update: I had removed the parameters for cards 4 and 5 from the config. Leaving all six values in for the tc, engine and mem parameters avoids the startup error, but the two cards still show up as OFF, and trying to start another sgminer for just cards 4 and 5 (once the first instance is up) results in more errors and a segfault.

mphardy commented 10 years ago

More testing: I used the same sgminer.conf for all three instances, passing -d 0,1 to the first, -d 2,3 to the second and -d 4,5 to the third. This is the same sgminer.conf that works with all six cards in a single sgminer instance (no -d argument).

The first instance (-d 0,1) starts up but seems to be reading fan RPMs and temperatures for cards it isn't supposed to be using:

--------------------------------------------------------------------------------
[P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
 GPU 0:  83.0C 4173RPM | 420.5K/424.4Kh/s | R:  3.6% HW:0 WU: 392.4/m I:19
 GPU 1:  83.0C 4155RPM | 418.6K/422.1Kh/s | R:  8.6% HW:0 WU: 353.5/m I:19
 GPU 2:  52.0C 1167RPM | OFF   / 0.000h/s | R:  0.0% HW:0 WU:   0.0/m I:19
 GPU 3:  52.0C 1144RPM | OFF   / 0.000h/s | R:  0.0% HW:0 WU:   0.0/m I:19
 GPU 4:  43.0C  30%    | OFF   / 0.000h/s | R:  0.0% HW:0 WU:   0.0/m I:18
 GPU 5:  46.0C  31%    | OFF   / 0.000h/s | R:  0.0% HW:0 WU:   0.0/m I:18
--------------------------------------------------------------------------------

The second instance (-d 2,3) complains about bad configuration parameters but does start up. However, the RPMs for the cards "owned" by the first instance still show up, and the second instance seems to be fighting the first over fan settings, causing the fan RPMs to fluctuate high-low-high-low.

[15:23:19] Maximum buffer memory device 0 supports says 787480576
[15:23:19] Your scrypt settings come to -1083703296
[15:23:19] Maximum buffer memory device 1 supports says 787480576
[15:23:19] Your scrypt settings come to -1083703296

The third instance (-d 4,5) also starts up, but again with complaints of bad configuration for devices 0 and 1:

sgminer 4.1.0-125-g11cf-dirty - Started: [2014-03-02 15:24:06]
--------------------------------------------------------------------------------
(5s):313.5K (avg):447.7Kh/s | A:86  R:49  HW:0  WU:614.8/m
ST: 1  SS: 0  NB: 2  LW: 35  GF: 0  RF: 0
Connected to Pool 0 (stratum) diff 29 as user xxx
Block: b8cc7153...  Diff:10.1M  Started: [15:24:09]  Best share: 3.42K
--------------------------------------------------------------------------------
[P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
 GPU 0:  84.0C 3799RPM | OFF   / 0.000h/s | R:  0.0% HW:0 WU:   0.0/m I:19
 GPU 1:  83.0C 2853RPM | OFF   / 0.000h/s | R:  0.0% HW:0 WU:   0.0/m I:19
 GPU 2:  54.0C 3190RPM | OFF   / 0.000h/s | R:  0.0% HW:0 WU:   0.0/m I:19
 GPU 3:  54.0C 3092RPM | OFF   / 0.000h/s | R:  0.0% HW:0 WU:   0.0/m I:19
 GPU 4:  57.0C  55%    | 124.0K/313.4Kh/s | R: 46.0% HW:0 WU: 532.8/m I:18
 GPU 5:  58.0C  55%    | 124.1K/313.4Kh/s | R: 22.1% HW:0 WU: 415.0/m I:18
--------------------------------------------------------------------------------
[15:24:01] Started sgminer 4.1.0-125-g11cf-dirty
[15:24:01] Loaded configuration file /home/mphardy/sgminer.conf
[15:24:05] Kernel zuikkis is experimental.
[15:24:05] Maximum buffer memory device 0 supports says 787480576
[15:24:05] Your scrypt settings come to -1083703296
[15:24:05] Kernel zuikkis is experimental.
[15:24:05] Maximum buffer memory device 1 supports says 787480576
[15:24:05] Your scrypt settings come to -1083703296
[15:24:05] Kernel zuikkis is experimental.
[15:24:05] Kernel zuikkis is experimental.
[15:24:05] Kernel zuikkis is experimental.
[15:24:05] Kernel zuikkis is experimental.

So something odd is certainly happening and some questions are raised:

veox commented 10 years ago

Semi-OT note: to avoid having to edit your user name out, you could try --incognito on the CLI (or [D][I] in the ncurses UI). This should work on latest git master (and I hope https://github.com/veox/sgminer/issues/123 is fixed).

EDIT: Somehow missed this part:

This is with the nfactor branch

So the above proposal will only work if nfactor is merged into master (or the other way around; locally is enough, of course).