quantcast / qfs

Quantcast File System
https://quantcast.atlassian.net
Apache License 2.0

QFS and ARM 32bit platforms #213

Open valerioa opened 7 years ago

valerioa commented 7 years ago

While trying to run QFS on a cluster of ARMv7 boards (namely ODROID XU4 boards, which have a big.LITTLE Cortex-A15/Cortex-A7 CPU), I discovered an issue with running QFS on 32-bit ARMv7 CPUs. The issue cannot be fixed by the QFS team, but I'm recording it here for future googlers who might run into the same problem and be looking for a solution. Feel free to close this issue once you've read it.

The issue is that the unmodified GitHub source, compiled with a recent gcc on a 32-bit ARM CPU, produces a non-working metaserver.

A vanilla gcc build yields a metaserver that, after correctly creating an empty filesystem,

./build/release/bin/metaserver -c /etc/qfs/Metaserver.prp
Loading key metaServer.clientPort with value 30000
Loading key metaServer.chunkServerPort with value 20000
Loading key metaServer.logDir with value /data/qfs/logs
Loading key metaServer.cpDir with value /data/qfs/checkpoint
Loading key metaServer.recoveryInterval with value 30
Loading key metaServer.clusterKey with value qfs-odroid-xu4
Loading key metaServer.rackPrefixes with value armhadoop1.home 1 armhadoop2.home 2 armhadoop3.home 3
Loading key metaServer.msgLogWriter.logLevel with value DEBUG
Loading key chunkServer.msgLogWriter.logLevel with value NOTICE
01-19-2017 09:46:43.111 INFO - (nofilelimit.cc:82) max # of open files: 65536
01-19-2017 09:46:43.112 INFO - (metaserver_main.cc:460) meta server client listner:  30000
01-19-2017 09:46:43.112 INFO - (metaserver_main.cc:479) meta server chunk server listener:  20000
01-19-2017 09:46:43.112 INFO - (metaserver_main.cc:488) path->fid cache disabled
01-19-2017 09:46:43.112 INFO - (metaserver_main.cc:351) min chunk servers that should connect: 1
01-19-2017 09:46:43.112 INFO - (metaserver_main.cc:359) min. # of replicas per file: 1
01-19-2017 09:46:43.112 INFO - (metaserver_main.cc:520) hard limits: open files: 65536 chunk servers: 4096 clients: 61184
01-19-2017 09:46:43.112 INFO - (LayoutManager.cc:1728) rack: prefix: armhadoop1.home id: 1
01-19-2017 09:46:43.112 INFO - (LayoutManager.cc:1728) rack: prefix: armhadoop2.home id: 2
01-19-2017 09:46:43.112 INFO - (LayoutManager.cc:1728) rack: prefix: armhadoop3.home id: 3
01-19-2017 09:46:43.112 INFO - (LayoutManager.cc:2060) max. response size: 89478485 minIoBufferBytesToProcessRequest: 89478485
01-19-2017 09:46:43.125 INFO - (LayoutManager.cc:2452) setting properties for 0 chunk servers: chunkServer.msgLogWriter.logLevel=NOTICE;
01-19-2017 09:46:43.126 INFO - (metaserver_main.cc:541) creating empty file system
01-19-2017 09:46:45.716 INFO - (metaserver_main.cc:692) failed to crete empty files system: checkpoint already exists: /data/qfs/checkpoint/latest

will never start, complaining about a missing root directory that is actually there:

/build/release/bin/metaserver /etc/qfs/Metaserver.prp
Loading key metaServer.clientPort with value 30000
Loading key metaServer.chunkServerPort with value 20000
Loading key metaServer.logDir with value /data/qfs/logs
Loading key metaServer.cpDir with value /data/qfs/checkpoint
Loading key metaServer.recoveryInterval with value 30
Loading key metaServer.clusterKey with value qfs-odroid-xu4
Loading key metaServer.rackPrefixes with value armhadoop1.home 1 armhadoop2.home 2 armhadoop3.home 3
Loading key metaServer.msgLogWriter.logLevel with value DEBUG
Loading key chunkServer.msgLogWriter.logLevel with value NOTICE
01-19-2017 09:46:53.579 INFO - (nofilelimit.cc:82) max # of open files: 65536
01-19-2017 09:46:53.580 INFO - (metaserver_main.cc:460) meta server client listner:  30000
01-19-2017 09:46:53.580 INFO - (metaserver_main.cc:479) meta server chunk server listener:  20000
01-19-2017 09:46:53.580 INFO - (metaserver_main.cc:488) path->fid cache disabled
01-19-2017 09:46:53.580 INFO - (metaserver_main.cc:351) min chunk servers that should connect: 1
01-19-2017 09:46:53.580 INFO - (metaserver_main.cc:359) min. # of replicas per file: 1
01-19-2017 09:46:53.580 INFO - (metaserver_main.cc:520) hard limits: open files: 65536 chunk servers: 4096 clients: 61184
01-19-2017 09:46:53.580 INFO - (LayoutManager.cc:1728) rack: prefix: armhadoop1.home id: 1
01-19-2017 09:46:53.580 INFO - (LayoutManager.cc:1728) rack: prefix: armhadoop2.home id: 2
01-19-2017 09:46:53.580 INFO - (LayoutManager.cc:1728) rack: prefix: armhadoop3.home id: 3
01-19-2017 09:46:53.580 INFO - (LayoutManager.cc:2060) max. response size: 89478485 minIoBufferBytesToProcessRequest: 89478485
01-19-2017 09:46:53.597 INFO - (LayoutManager.cc:2452) setting properties for 0 chunk servers: chunkServer.msgLogWriter.logLevel=NOTICE;
01-19-2017 09:46:53.597 INFO - (metaserver_main.cc:541) starting metaserver
01-19-2017 09:46:53.651 INFO - (Restorer.cc:99) restoring from checkpoint of 2017-01-08T23:51:08.220838Z
01-19-2017 09:46:53.651 INFO - (Replay.cc:73) open log file: /data/qfs/logs/log.0
01-19-2017 09:46:53.653 INFO - (Restorer.cc:99) restoring from checkpoint of 2017-01-08T23:51:08.220881Z
01-19-2017 09:46:53.653 FATAL - (Restorer.cc:582) /data/qfs/checkpoint/latest: invalid or missing root directory
01-19-2017 09:46:53.654 FATAL - (metaserver_main.cc:724) checkpoint load failed: Input/output error 5

It took a fair amount of digging through the source code to find that the problem originates here, in src/cc/meta/meta.h:

    static inline KeyData nameHash(const string& name)
    {
        // Key(t,d1,d2) discards d2 low order bits.
        // The hash is 32 bit. Storing in MetaDentry 32 instead of 64
        // bit hash currently doesn't save anything due to alignment
        // with 64 bit compile.
        Hsieh_hash_fcn f;
        return ((KeyData)(f(name)) << 4);
    }

This function calculates hash codes for directory and file names. Hash codes are used to find files and directories in the meta tree. In the example above, the root directory "/" is not found because nameHash() returns the wrong hash code.

Now, the hashing function f(name) returns an unsigned 32-bit quantity, which is promoted to a signed 64-bit quantity (KeyData is an int64_t) and then left-shifted by 4 bits.
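For readers who want to see what that expression is supposed to compute, here is a minimal sketch. It is not QFS code: KeyData is redeclared locally as int64_t, as described above, and expectedKey, isPlausibleKey and checkExpectedKeys are hypothetical names made up for this illustration. Because the hash is unsigned, the promotion must zero-extend, so a correct result is never negative and fits in 36 bits.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for QFS's KeyData (int64_t, per the description above).
typedef int64_t KeyData;

// What the language requires: zero-extend the unsigned 32-bit hash to
// 64 bits, then shift left by 4.
inline KeyData expectedKey(uint32_t hash)
{
    return static_cast<KeyData>(hash) << 4;
}

// A correctly computed key is non-negative, has its low 4 bits clear,
// and uses at most 36 bits.
inline bool isPlausibleKey(KeyData key)
{
    return key >= 0 && (key & 0xF) == 0 && (key >> 36) == 0;
}

// Spot checks with the top hash bit set, where a wrong extension or
// shift would be most visible. Note: assert is a no-op under -DNDEBUG.
inline void checkExpectedKeys()
{
    assert(expectedKey(0x80000001u) == 0x800000010LL);
    assert(expectedKey(0xFFFFFFFFu) == 0xFFFFFFFF0LL);
    assert(isPlausibleKey(expectedKey(0xFFFFFFFFu)));
}
```

Calling checkExpectedKeys() from a small test program built without -DNDEBUG is one way to confirm that a given compiler and flag combination produces the expected values.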

Here's the rub: gcc versions 4.9 (maybe earlier) through 6.3 have a bug. QFS is compiled with the default cmake flags of "-O2 -DNDEBUG". In trying to optimize this line:

        return ((KeyData)(f(name)) << 4);

gcc fuses the two operations (promotion to a larger int type and left shift) into a single group of assembler instructions, for speed. But a bug makes it insert the wrong armv7 asm instruction (an arithmetic right shift instead of a left shift). This bug has been there for quite some time. I guess the C/C++ construct of a promotion to a larger integer followed by a left shift (close enough together to be optimized by gcc) must not be very common in source code, because the bug had not been reported or observed for at least two years.

I've duly reported the bug to the gcc team.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79121

It has already been fixed for gcc 7.0. The bug applies to 32-bit ARM and MIPS when promoting from a 32-bit to a 64-bit quantity, and to arm64 when promoting from 64 to 128 bits.

I don't know if or when the fix will be back-ported to gcc 6.3 and earlier.

There are three workarounds for those who want to run QFS on 32-bit ARM:

  1. Compile everything without any optimization. gcc generates the correct asm code when optimization is off.

  2. Compile with clang++/llvm.

  3. Split the promotion to the larger int from the left shift, and add the "volatile" keyword to the KeyData assignment (in src/cc/meta/meta.h). A vanilla compile then works without further editing:

        Hsieh_hash_fcn f;
        KeyData volatile kd = f(name);
        return (kd << 4);

The volatile qualifier tells gcc not to make the KeyData assignment part of any optimization, so the promotion and the left shift are not fused into the same cluster of asm instructions and the bug is not triggered.

The last one is by far the simplest solution.
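To check whether a particular toolchain is affected, something like the following standalone sketch may help. It is an assumption-laden illustration rather than QFS code: KeyData is a local typedef, and fusedKey, splitKey and matchesExpected are hypothetical names. Inputs are read through a volatile so the compiler cannot fold the results at compile time. Whether the fused variant actually misbehaves depends on the gcc version, target and flags.

```cpp
#include <cstddef>
#include <cstdint>

// Local stand-in for QFS's KeyData (int64_t).
typedef int64_t KeyData;

// Same shape as the original line: promotion and shift in one expression.
KeyData fusedKey(uint32_t hash)
{
    return static_cast<KeyData>(hash) << 4;
}

// Same shape as workaround 3: a volatile intermediate separates the
// promotion from the shift.
KeyData splitKey(uint32_t hash)
{
    KeyData volatile kd = hash;
    return kd << 4;
}

// Compares a key function against hand-computed constants.
// Returns true only if every sample matches.
bool matchesExpected(KeyData (*keyFn)(uint32_t))
{
    struct Sample { uint32_t hash; KeyData expected; };
    static const Sample samples[] = {
        { 0x00000001u, 0x0000000010LL },
        { 0x7FFFFFFFu, 0x07FFFFFFF0LL },
        { 0x80000001u, 0x0800000010LL },
        { 0xFFFFFFFFu, 0x0FFFFFFFF0LL },
    };
    for (std::size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); ++i) {
        // Read the input through a volatile to defeat constant folding.
        uint32_t volatile h = samples[i].hash;
        if (keyFn(h) != samples[i].expected) {
            return false;
        }
    }
    return true;
}
```

Building this with the same flags as QFS ("-O2 -DNDEBUG") and calling matchesExpected(fusedKey) and matchesExpected(splitKey) from main should return true for both on an unaffected compiler. A false result for fusedKey alone would suggest the toolchain has the problem described here.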

Future arm 32 googlers, you're welcome.

michaelkamprath commented 6 years ago

@valerioa Thanks! I was just struggling with this one myself.