verement / cellminer

Bitcoin miner for the Cell Broadband Engine Architecture
GNU General Public License v2.0

Tuning tips needed #10

Closed by devurandom 6 years ago

devurandom commented 11 years ago

With 18 SPEs in use (chosen automatically by cellminer), cellminer only generates about 6Mh/s on my machine:

# cat /proc/cpuinfo 
processor       : 0
cpu             : Cell Broadband Engine, altivec supported
clock           : 3200.000000MHz
revision        : 48.0 (pvr 0070 3000)

processor       : 1
cpu             : Cell Broadband Engine, altivec supported
clock           : 3200.000000MHz
revision        : 48.0 (pvr 0070 3000)

processor       : 2
cpu             : Cell Broadband Engine, altivec supported
clock           : 3200.000000MHz
revision        : 48.0 (pvr 0070 3000)

processor       : 3
cpu             : Cell Broadband Engine, altivec supported
clock           : 3200.000000MHz
revision        : 48.0 (pvr 0070 3000)

timebase        : 26664332
platform        : Cell
model           : IBM,0793-38G
machine         : CHRP IBM,0793-38G

I remember you writing in a forum post that you generate about 20Mh/s on a PS3, which afaik has only 1 CPU with 6 SPEs, while our cluster should have 8 per CPU. So I would expect at least 20Mh/s, and 6Mh/s seems way too low. I also tried using 16, 24 and 32 SPE threads, but the figures did not change. 1 SPE seems to generate about 0.4Mh/s, and this scales linearly up to 16 SPEs.
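
The comparison above can be put in numbers. This is a minimal sketch that uses only the figures quoted in this comment (20Mh/s on a 6-SPE PS3, about 0.4Mh/s per SPE here); the function and variable names are made up for illustration.

```python
def per_spe_rate(total_mhs, spes):
    """Average hashrate contributed by one SPE, in MH/s."""
    return total_mhs / spes

ps3_per_spe = per_spe_rate(20.0, 6)   # figure remembered from the forum post
local_per_spe = 0.4                   # measured on this machine
local_total_16 = local_per_spe * 16   # linear scaling up to 16 SPEs

print(f"PS3:  {ps3_per_spe:.2f} MH/s per SPE")
print(f"Here: {local_per_spe:.2f} MH/s per SPE, {local_total_16:.1f} MH/s with 16 SPEs")
print(f"PS3 SPE is roughly {ps3_per_spe / local_per_spe:.0f}x faster per SPE")
```

So the gap is per SPE, not in the number of SPEs: each one here appears to run far below the PS3 figure.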

The compiler is spu-elf-gcc (Gentoo 4.7.2-r1 p1.5, pie-0.5.5) 4.7.2, and CFLAGS were the defaults that you ship in the makefiles.

verement commented 11 years ago

That does seem slow. Could there be other processes using the SPEs? You could look with spu-ps or spu-top, but you might need to be root to see SPE threads other than your own. The output of spu-top would be interesting in any case.

devurandom commented 11 years ago

No, I am the only user on that machine.

# spu-top
spu-top: Context View                                                    [0/0]
Cpu(s) load avg:   2.0%,  2.1%,  2.0%
Spu(s) load avg:  17.9%, 17.9%, 17.9%
Cpu(s): 46.6%us,  3.4%sys,  0.0%wait,  0.0%nice, 50.0%idle
Spu(s): 99.4%us,  0.6%sys,  0.0%wait,  0.0%idle

   PID   TID USERNAME   S F  %SPU SPE     TIME BINARY
# spu-ps
   PID   TID USERNAME   S F SPE     TIME BINARY            

So both show no processes on the SPE...

I now also passed --ppe 2 to cellminer (which otherwise defaults to 0), and now I get 34Mh/s (with --spe 18). Since you got 22Mh/s on the PS3 with 1 CPU, 2 cores, 6 SPEs, I would expect to get at least 44Mh/s... Also, given that I have 2 processors with 2 cores each, --ppe 4 should bring an improvement, but in reality it does not. htop shows why: 2 of the 4 cores are always unused at 0% utilisation. I have no idea why...

verement commented 11 years ago

Yeah, something’s not adding up. I don’t know why you’re not seeing the SPE threads in spu-top output, though the load and % seem to show them running. Is /spu mounted properly, and does it contain thread entries?

You could try running cellminer with --debug to get a bit more feedback on what’s happening with each thread.

devurandom commented 11 years ago

Yes, it is mounted, but no, it does not show anything:

# ls -la /spu
total 0
# mount | grep spu
none on /spu type spufs (rw,relatime)

--debug shows loads of these:

#<Bitcoin::SPUMiner:0x0000003792f428> Mining 4a000000..4bffffff
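
Each of these debug lines names the nonce range handed to one miner thread. As a rough sketch (assuming the range is inclusive and that the 0.4Mh/s per-SPE rate from the first comment still applies, neither of which the log confirms), the size of such a range and the time one SPE needs for it work out as follows:

```python
first, last = 0x4a000000, 0x4bffffff     # taken from the --debug line
nonces = last - first + 1                # assumes an inclusive range
ranges_in_nonce_space = 2**32 // nonces  # the nonce field is 32 bits wide

spe_rate = 0.4e6                         # hashes/s per SPE (assumed, see above)
print(f"{nonces} nonces per range (2**{nonces.bit_length() - 1})")
print(f"{ranges_in_nonce_space} ranges of this size cover the nonce space")
print(f"about {nonces / spe_rate:.0f} s per range at 0.4 MH/s")
```
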

devurandom commented 11 years ago

After upgrading to Linux 3.8.8 and changing my kernel config as follows (based on a thorough review and a look at ps3_defconfig), I got up to 42Mh/s (from a previous 35Mh/s) with 2 PPEs and 16 SPEs. However, still only 2 CPU cores are being used instead of all 4.

--- config-3.8.6-aufs   2013-04-22 14:04:47.503032283 +0200
+++ config-3.8.8-aufs-r1        2013-04-22 11:24:42.209888283 +0200
@@ -1,6 +1,6 @@
 #
 # Automatically generated file; DO NOT EDIT.
-# Linux/powerpc 3.8.6-aufs Kernel Configuration
+# Linux/powerpc 3.8.8-aufs-r1 Kernel Configuration
 #
 CONFIG_PPC64=y

@@ -82,7 +82,8 @@
 CONFIG_SWAP=y
 CONFIG_SYSVIPC=y
 CONFIG_SYSVIPC_SYSCTL=y
-# CONFIG_POSIX_MQUEUE is not set
+CONFIG_POSIX_MQUEUE=y
+CONFIG_POSIX_MQUEUE_SYSCTL=y
 CONFIG_FHANDLE=y
 # CONFIG_AUDIT is not set
 CONFIG_HAVE_GENERIC_HARDIRQS=y
@@ -285,7 +286,9 @@
 CONFIG_PPC_IBM_CELL_RESETBUTTON=y
 CONFIG_CBE_THERM=y
 CONFIG_CBE_CPUFREQ=y
-# CONFIG_CBE_CPUFREQ_PMI_ENABLE is not set
+CONFIG_CBE_CPUFREQ_PMI_ENABLE=y
+CONFIG_CBE_CPUFREQ_PMI=y
+CONFIG_PPC_PMI=y
 CONFIG_CBE_CPUFREQ_SPU_GOVERNOR=y
 # CONFIG_PQ2ADS is not set
 # CONFIG_PPC_WSP is not set
@@ -357,14 +360,14 @@
 #
 # Kernel options
 #
-# CONFIG_HZ_100 is not set
+CONFIG_HZ_100=y
 # CONFIG_HZ_250 is not set
 # CONFIG_HZ_300 is not set
-CONFIG_HZ_1000=y
-CONFIG_HZ=1000
+# CONFIG_HZ_1000 is not set
+CONFIG_HZ=100
 CONFIG_SCHED_HRTICK=y
-# CONFIG_PREEMPT_NONE is not set
-CONFIG_PREEMPT_VOLUNTARY=y
+CONFIG_PREEMPT_NONE=y
+# CONFIG_PREEMPT_VOLUNTARY is not set
 # CONFIG_PREEMPT is not set
 CONFIG_BINFMT_ELF=y
 CONFIG_COMPAT_BINFMT_ELF=y
@@ -377,7 +380,7 @@
 CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
 CONFIG_ARCH_HAS_WALK_MEMORY=y
 CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
-# CONFIG_KEXEC is not set
+CONFIG_KEXEC=y
 # CONFIG_CRASH_DUMP is not set
 CONFIG_IRQ_ALL_CPUS=y
 CONFIG_NUMA=y
@@ -418,7 +421,7 @@
 CONFIG_PPC_4K_PAGES=y
 # CONFIG_PPC_64K_PAGES is not set
 CONFIG_FORCE_MAX_ZONEORDER=13
-# CONFIG_SCHED_SMT is not set
+CONFIG_SCHED_SMT=y
 CONFIG_PPC_DENORMALISATION=y
 # CONFIG_CMDLINE_BOOL is not set
 CONFIG_EXTRA_TARGETS=""
@@ -604,8 +607,8 @@
 #
 # Device Tree and Open Firmware support
 #
-# CONFIG_PROC_DEVICETREE is not set
-# CONFIG_OF_SELFTEST is not set
+CONFIG_PROC_DEVICETREE=y
+CONFIG_OF_SELFTEST=y
 CONFIG_OF_FLATTREE=y
 CONFIG_OF_EARLY_FLATTREE=y
 CONFIG_OF_ADDRESS=y
@@ -802,10 +805,8 @@
 # CONFIG_NET_VENDOR_QLOGIC is not set
 # CONFIG_NET_VENDOR_REALTEK is not set
 # CONFIG_NET_VENDOR_RDC is not set
-CONFIG_NET_VENDOR_SEEQ=y
-# CONFIG_SEEQ8005 is not set
-CONFIG_NET_VENDOR_SILAN=y
-# CONFIG_SC92031 is not set
+# CONFIG_NET_VENDOR_SEEQ is not set
+# CONFIG_NET_VENDOR_SILAN is not set
 # CONFIG_NET_VENDOR_SIS is not set
 # CONFIG_SFC is not set
 # CONFIG_NET_VENDOR_SMSC is not set

devurandom commented 11 years ago

With the following patches to the Linux 3.8.8 kernel by the PPC/Cell maintainer Michael Ellerman (michael@ellerman.id.au), I am now at a hashrate of 64Mh/s. All 4 CPU cores are being used and spufs shows all threads. If you want my kernel config, please drop me a mail.

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index bba87ca..6a252c4 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -201,7 +201,7 @@ int __node_distance(int a, int b)
        int distance = LOCAL_DISTANCE;

        if (!form1_affinity)
-               return distance;
+               return ((a == b) ? LOCAL_DISTANCE : REMOTE_DISTANCE);

        for (i = 0; i < distance_ref_points_depth; i++) {
                if (distance_lookup_table[a][i] == distance_lookup_table[b][i])
diff --git a/arch/powerpc/platforms/cell/pmu.c b/arch/powerpc/platforms/cell/pmu.c
index 59c1a16..348a27b 100644
--- a/arch/powerpc/platforms/cell/pmu.c
+++ b/arch/powerpc/platforms/cell/pmu.c
@@ -382,7 +382,7 @@ static int __init cbe_init_pm_irq(void)
        unsigned int irq;
        int rc, node;

-       for_each_node(node) {
+       for_each_online_node(node) {
                irq = irq_create_mapping(NULL, IIC_IRQ_IOEX_PMI |
                                               (node << IIC_IRQ_NODE_SHIFT));
                if (irq == NO_IRQ) {
diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index 3f3bb4c..35f77a4 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -99,6 +99,7 @@ spufs_new_inode(struct super_block *sb, umode_t mode)
        if (!inode)
                goto out;

+       inode->i_ino = get_next_ino();
        inode->i_mode = mode;
        inode->i_uid = current_fsuid();
        inode->i_gid = current_fsgid();

devurandom commented 11 years ago

While the hashrate cellminer measures is quite high, the work that actually arrives at Eligius is way smaller: only about 25% of it.
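
For a sanity check against the pool's statistics, the expected share rate can be estimated from the hashrate. This is a rough sketch that assumes difficulty-1 shares (the thread does not say which share difficulty Eligius assigned); at difficulty 1 a share takes about 2**32 hashes on average. The function name is made up for illustration.

```python
HASHES_PER_DIFF1_SHARE = 2**32  # approximate expected hashes per difficulty-1 share

def shares_per_minute(hashrate_mhs, difficulty=1.0):
    """Expected valid shares per minute for a hashrate given in MH/s."""
    return hashrate_mhs * 1e6 * 60 / (HASHES_PER_DIFF1_SHARE * difficulty)

local = shares_per_minute(64.0)            # rate cellminer reports
credited = shares_per_minute(64.0 * 0.25)  # roughly what arrives at the pool

print(f"64 MH/s -> {local:.2f} shares/min expected")
print(f"25% of that -> {credited:.2f} shares/min credited")
```

Comparing the first number with the shares the pool actually accepts over a longer period gives a measure that does not depend on cellminer's own hashrate display.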

Due to the doubled hashrate, I also doubled the queue size (from 256 with my recent patches to 512 now). This means the queues are almost empty at the beginning and grow only very slowly. But since they eventually fill up to about the maximum anyway, I decided that increasing them even further brings no benefit. Leaving them at 256 resulted in them running empty later on, so that was bad, too.

What I noticed are a lot of these messages from cellminer:

Solved? (false)

Is that an expected message? What does it mean?

verement commented 11 years ago

It means the submitted solution was rejected by the server. Usually this can happen if the server has started on a new block, so all work on prior blocks becomes invalid. Long polling is supposed to help miners detect when a new block is started more quickly to reduce the frequency of these stale shares.

However, if you are seeing quite a lot of them, it may mean something else is wrong. (It may also explain why you are only seeing 25% of your hashrate being registered.)

Can you give some examples of the hashes being submitted, along with the target value printed when you first start running cellminer?
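
For reference when comparing hashes against the target: a share is valid only if the double SHA-256 of the 80-byte block header, read as a little-endian 256-bit integer, does not exceed the target. The sketch below is a generic Python illustration of that check, not cellminer's code; the helper names and the all-zero dummy header are made up, and the difficulty-1 target is only an assumed example value.

```python
import hashlib

DIFF1_TARGET = 0xFFFF * 2**208  # common pool "difficulty 1" target (assumed here)

def header_hash(header: bytes) -> int:
    """Double SHA-256 of an 80-byte header as a little-endian integer."""
    if len(header) != 80:
        raise ValueError("block header must be 80 bytes")
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "little")

def meets_target(header: bytes, target: int) -> bool:
    """True if the header's hash is at or below the target."""
    return header_hash(header) <= target

dummy = bytes(80)  # placeholder header, not real work
print(f"hash   = {header_hash(dummy):064x}")
print(f"target = {DIFF1_TARGET:064x}")
print("valid share" if meets_target(dummy, DIFF1_TARGET) else "not a valid share")
```

Printed this way, a valid hash starts with at least as many zero digits as the target.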

devurandom commented 11 years ago

> It means the submitted solution was rejected by the server. Usually this can happen if the server has started on a new block, so all work on prior blocks becomes invalid. Long polling is supposed to help miners detect when a new block is started more quickly to reduce the frequency of these stale shares.

How does LONG_POLL_TIMEOUT affect this? I have never properly used Ruby, especially not the HTTP classes, but I understood it as the network connection timeout. Is that correct?

I am asking because I reduced it to 5 minutes (60*5), since my network connection sometimes seems flaky, perhaps due to the NAT.

> However, if you are seeing quite a lot of them, it may mean something else is wrong. (It may also explain why you are only seeing 25% of your hashrate being registered.)

Yes, I would say at least 50% of the hash= lines lack the star and "Solved?" is false.

> Can you give some examples of the hashes being submitted, along with the target value printed when you first start running cellminer?

How do I make cellminer write its output to something other than a terminal? As soon as I try to redirect stdout to a file or pipe it somewhere, it outputs nothing at all. (But it is certainly running.) I do get something after a long while when running in --debug mode, but then the output is obviously very noisy.

devurandom commented 11 years ago

> Can you give some examples of the hashes being submitted, along with the target value printed when you first start running cellminer?

I created a log with --debug [1] and then filtered out the debug lines with grep -v [2]. I hope that gives you some idea of what is going on. If you need further information, please just ask for it.

Also, if you know how to create non-debug logs, please tell me that, too.

[1] http://kynes.de/~dschridde/cellminer.debug.log
[2] http://kynes.de/~dschridde/cellminer.log

grifits commented 6 years ago

Hello devurandom, I have a request for help on this topic. How can I contact you?

devurandom commented 6 years ago

I have not used cellminer since 2013 and thus doubt I could be of help.