stanford-mast / pocket

Elastic ephemeral storage
Crail datanode error #15

Closed: Ives66 closed this issue 4 years ago

Ives66 commented 4 years ago

When I deploy Crail on the master (as a datanode), it always shows:

$ ~/crail/bin/crail datanode
20/03/12 10:42:02 INFO crail: crail.version 3101
20/03/12 10:42:02 INFO crail: crail.directorydepth 16
20/03/12 10:42:02 INFO crail: crail.tokenexpiration 10
20/03/12 10:42:02 INFO crail: crail.blocksize 1024000
20/03/12 10:42:02 INFO crail: crail.cachelimit 10240000
20/03/12 10:42:02 INFO crail: crail.cachepath /dev/hugepages/cache
20/03/12 10:42:02 INFO crail: crail.user crail
20/03/12 10:42:02 INFO crail: crail.shadowreplication 1
20/03/12 10:42:02 INFO crail: crail.debug false
20/03/12 10:42:02 INFO crail: crail.statistics true
20/03/12 10:42:02 INFO crail: crail.rpctimeout 1000
20/03/12 10:42:02 INFO crail: crail.datatimeout 1000
20/03/12 10:42:02 INFO crail: crail.buffersize 1024000
20/03/12 10:42:02 INFO crail: crail.slicesize 512000
20/03/12 10:42:02 INFO crail: crail.singleton true
20/03/12 10:42:02 INFO crail: crail.regionsize 102400000
20/03/12 10:42:02 INFO crail: crail.directoryrecord 512
20/03/12 10:42:02 INFO crail: crail.directoryrandomize true
20/03/12 10:42:02 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache
20/03/12 10:42:02 INFO crail: crail.locationmap 
20/03/12 10:42:02 INFO crail: crail.namenode.address crail://10.1.0.10:9060
20/03/12 10:42:02 INFO crail: crail.namenode.blockselection roundrobin
20/03/12 10:42:02 INFO crail: crail.namenode.fileblocks 16
20/03/12 10:42:02 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode
20/03/12 10:42:02 INFO crail: crail.namenode.log 
20/03/12 10:42:02 INFO crail: crail.namenode.replayregion false
20/03/12 10:42:02 INFO crail: crail.storage.types org.apache.crail.storage.tcp.TcpStorageTier
20/03/12 10:42:02 INFO crail: crail.storage.classes 1
20/03/12 10:42:02 INFO crail: crail.storage.rootclass 0
20/03/12 10:42:02 INFO crail: crail.storage.keepalive 2
20/03/12 10:42:02 INFO crail: crail.client.blockcache.enable false
20/03/12 10:42:02 INFO narpc: new NaRPC server group v1.0, queueDepth 16, messageSize 2048000, nodealy false, cores 1
20/03/12 10:42:02 INFO crail: crail.storage.tcp.interface eth0
20/03/12 10:42:02 INFO crail: crail.storage.tcp.port 50020
20/03/12 10:42:02 INFO crail: crail.storage.tcp.storagelimit 10240000
20/03/12 10:42:02 INFO crail: crail.storage.tcp.allocationsize 102400000
20/03/12 10:42:02 INFO crail: crail.storage.tcp.datapath /dev/hugepages/data
20/03/12 10:42:02 INFO crail: crail.storage.tcp.queuedepth 16
20/03/12 10:42:02 INFO crail: crail.storage.tcp.cores 1
20/03/12 10:42:02 INFO crail: crail.storage.tcp.nodelay false
20/03/12 10:42:02 INFO crail: crail.storage.tcp.populatemmap false
20/03/12 10:42:02 INFO crail: running TCP storage server, address /10.1.94.143:50020
20/03/12 10:42:02 INFO narpc: new NaRPC server group v1.0, queueDepth 32, messageSize 512, nodealy true
20/03/12 10:42:02 INFO crail: crail.namenode.tcp.queueDepth 32
20/03/12 10:42:02 INFO crail: crail.namenode.tcp.messageSize 512
20/03/12 10:42:02 INFO crail: crail.namenode.tcp.cores 1
20/03/12 10:42:02 INFO crail: connected to namenode(s) /10.1.0.10:9060
Exception in thread "main" java.lang.Exception: Error returned in the RPC type: ERROR: Data node not registered
    at org.apache.crail.storage.StorageRpcClient.getDataNode(StorageRpcClient.java:75)
    at org.apache.crail.storage.StorageServer.main(StorageServer.java:177)

and the namenode also shows:

20/03/12 10:42:02 INFO crail: A new connection arrives from : /10.1.94.143:16698
20/03/12 10:42:02 INFO crail: new connection from /10.1.94.143:16698
20/03/12 10:42:02 INFO narpc: adding new channel to selector, from /10.1.94.143:16698
 Datanode no longer registered 

How can I fix this problem?
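
One detail in the datanode log above is that `crail.storage.tcp.storagelimit` (10240000) is smaller than `crail.storage.tcp.allocationsize` (102400000). The sketch below pulls the logged `crail.*` settings into a dictionary and relates the two values. It assumes the TCP tier reserves storage in `allocationsize` units up to `storagelimit`; that is an assumption and the thread does not confirm it as the cause of the error. The helper names `effective_settings` and `size_report` are hypothetical.

```python
import re

# Excerpt of the datanode startup log quoted above.
LOG = """\
20/03/12 10:42:02 INFO crail: crail.blocksize 1024000
20/03/12 10:42:02 INFO crail: crail.regionsize 102400000
20/03/12 10:42:02 INFO crail: crail.storage.tcp.storagelimit 10240000
20/03/12 10:42:02 INFO crail: crail.storage.tcp.allocationsize 102400000
20/03/12 10:42:02 INFO crail: running TCP storage server, address /10.1.94.143:50020
"""

# Matches only lines that log a `crail.*` key, optionally followed by a value.
LINE = re.compile(r"INFO crail: (crail\.\S+)[ \t]*(.*)$")


def effective_settings(log_text):
    """Collect the crail.* key/value pairs the process logged at startup."""
    settings = {}
    for line in log_text.splitlines():
        match = LINE.search(line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings


def size_report(settings, tier="tcp"):
    """Relate the tier's storage limit to its allocation unit.

    Assumption (not confirmed by the thread): storage is reserved in
    allocationsize chunks until storagelimit is reached.
    """
    limit = int(settings[f"crail.storage.{tier}.storagelimit"])
    unit = int(settings[f"crail.storage.{tier}.allocationsize"])
    block = int(settings["crail.blocksize"])
    return {
        "whole_allocations_within_limit": limit // unit,
        "blocks_per_allocation": unit // block,
    }


settings = effective_settings(LOG)
report = size_report(settings)
print(report)
```

With the logged values, no whole allocation unit fits inside the storage limit. If the assumption holds, the datanode would have nothing to offer the namenode, so comparing these two settings is a cheap first check.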

Ives66 commented 4 years ago

Crail config:

crail.blocksize                      1024000
crail.buffersize                     1024000
crail.slicesize                      512000
crail.regionsize                     102400000
crail.namenode.address               crail://10.1.0.10:9060
crail.cachepath                      /dev/hugepages/cache
crail.cachelimit                     10240000
crail.storage.tcp.interface          eth0
crail.storage.tcp.datapath           /dev/hugepages/data
crail.storage.tcp.storagelimit       10240000
crail.storage.rdma.interface         eth0
crail.storage.rdma.datapath          /memory/data
crail.storage.rdma.allocationsize    10240000
crail.storage.rdma.storagelimit      10240000000
crail.storage.rdma.localmap          true
crail.storage.rdma.indexpath         /index
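
To see which of these settings actually took effect, the config can be compared against the keys the datanode printed at startup in the first comment. The following is a minimal sketch, assuming the config file is plain whitespace-separated key/value lines and that `#` starts a comment (both assumptions about the file format). `parse_config` and `compare` are hypothetical helpers, and only a subset of the keys is embedded to keep the example short.

```python
# Subset of the config posted above.
CONFIG = """\
crail.namenode.address            crail://10.1.0.10:9060
crail.storage.tcp.interface       eth0
crail.storage.tcp.storagelimit    10240000
crail.storage.rdma.interface      eth0
crail.storage.rdma.allocationsize 10240000
"""

# Subset of the keys the datanode logged at startup (first comment).
LOGGED = {
    "crail.namenode.address": "crail://10.1.0.10:9060",
    "crail.storage.tcp.interface": "eth0",
    "crail.storage.tcp.storagelimit": "10240000",
    "crail.storage.tcp.allocationsize": "102400000",
}


def parse_config(text):
    """Parse whitespace-separated key/value lines, skipping '#' comments."""
    result = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and not parts[0].startswith("#"):
            result[parts[0]] = parts[1].strip()
    return result


def compare(config, logged):
    """Group keys by whether they appear in the config, the log, or both."""
    return {
        # Set in the config but never reported by the running process.
        "not_logged": sorted(k for k in config if k not in logged),
        # Reported by the process but absent from the config.
        "not_in_config": sorted(k for k in logged if k not in config),
        # Present in both with different values.
        "different": sorted(
            k for k in config if k in logged and config[k] != logged[k]
        ),
    }


diff = compare(parse_config(CONFIG), LOGGED)
for group, keys in diff.items():
    print(group, keys)
```

For this thread the `rdma` keys land in `not_logged`, which is consistent with the log reporting `TcpStorageTier` as the only storage type, and `crail.storage.tcp.allocationsize` lands in `not_in_config`, so the logged value of 102400000 does not come from this file.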

anakli commented 4 years ago

I have not seen this issue before. Does this error happen before you start trying to write or read data from a client? Are you using the Pocket Docker images or compiling from source?

Ives66 commented 4 years ago

Oh, I am so foolish.