quantcast / qfs

Quantcast File System
https://quantcast.atlassian.net
Apache License 2.0

How to retire a chunkserver? #260

Closed. damonbreeden closed this issue 1 year ago.

damonbreeden commented 1 year ago

Hi, we have a chunkserver that has died, and we do not plan to replace it. However, the metaserver still lists that server in its status and shows it as dead:

Chunk servers: alive: 924, dead: 2, retiring: 0, hibernated: 0

I can't find any TTL or timeout for chunkserver liveness, or any command that would remove this server from the environment short of restarting the metaserver. Is there anything I can do to remove this chunkserver from the environment?

mikeov commented 1 year ago

The number of "dead" chunk servers is a count of down/disconnect events in the recent history that have no corresponding up/reconnect event.

The maximum size of the history is set by the parameter metaServer.maxDownServersHistorySize.
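To make the explanation above concrete, here is a toy model of the count. It is an illustration only, not the metaserver's actual code: it assumes events arrive as lines of the form `down <server>` or `up <server>`, keeps only the last MAX of them (standing in for metaServer.maxDownServersHistorySize), and counts servers whose most recent retained event is `down`. The function name `count_dead` is hypothetical.

```shell
#!/bin/sh
# Toy model (NOT the metaserver's implementation) of the "dead" count.
# Reads "down <server>" / "up <server>" lines on stdin, keeps only the
# last MAX events, and prints how many servers' latest kept event is "down".
count_dead() {
    max=$1
    awk -v max="$max" '
        { kind[NR] = $1; server[NR] = $2 }   # remember every event in order
        END {
            first = NR - max + 1             # oldest event still in the window
            if (first < 1) first = 1
            # Later events overwrite earlier ones for the same server.
            for (i = first; i <= NR; i++) last[server[i]] = kind[i]
            n = 0
            for (s in last) if (last[s] == "down") n++
            print n
        }'
}
```

Piping the three events `down a`, `up a`, `down b` into `count_dead 4096` prints 1 (server b never came back), while `count_dead 0` prints 0 because nothing is retained. That mirrors why shrinking the history to 0 clears a dead server that will never reconnect.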

To reset the history, set the parameter to 0 and send a HUP signal to the metaserver process, then set the parameter back to its prior value (the default is 4096) and send a HUP signal again. Commenting the parameter out will not work until the metaserver is restarted: doing so leaves the parameter at 0, so it must be set explicitly back to the prior or default value.

damonbreeden commented 1 year ago

Thanks for the quick response; this worked perfectly:

sudo sed -i 's/metaServer.maxDownServersHistorySize.*/metaServer.maxDownServersHistorySize = 0/' qfs/meta/config.prp
sudo kill -s SIGHUP "$(systemctl show --property MainPID --value qfsmetaserver.service)"
sleep 1  # give the metaserver time to re-read the config before the value changes again
sudo sed -i 's/metaServer.maxDownServersHistorySize.*/metaServer.maxDownServersHistorySize = 4096/' qfs/meta/config.prp
sudo kill -s SIGHUP "$(systemctl show --property MainPID --value qfsmetaserver.service)"
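The commands above hardcode 4096 on the way back, while the advice in this thread is to restore the prior value. A possible variant, sketched under stated assumptions, reads the current value first and restores exactly that. The function names `reset_down_history` and `set_history_size` and the `RESET_DELAY` variable are hypothetical; the sketch assumes GNU sed (`-i`), that the parameter sits on an uncommented `key = value` line, and that the metaserver re-reads its config file on SIGHUP as described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of QFS): clear the metaserver's down-server
# history, then restore metaServer.maxDownServersHistorySize to its prior value.
# Assumes GNU sed, an explicit uncommented parameter line, and reload on SIGHUP.

key='metaServer.maxDownServersHistorySize'
key_re='metaServer\.maxDownServersHistorySize'

# set_history_size FILE VALUE: rewrite the value on the uncommented parameter line.
set_history_size() {
    sed -i "s/^\([[:space:]]*$key_re[[:space:]]*=\).*/\1 $2/" "$1"
}

# reset_down_history FILE PID
reset_down_history() {
    config=$1
    pid=$2
    # Remember the current value (last matching line wins) so it can be restored.
    prior=$(sed -n "s/^[[:space:]]*$key_re[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$config" | tail -n 1)
    if [ -z "$prior" ]; then
        # Without an explicit line, sed would silently change nothing.
        printf 'reset_down_history: %s is not set in %s\n' "$key" "$config" >&2
        return 1
    fi
    set_history_size "$config" 0 || return 1
    kill -s HUP "$pid" || return 1
    # Standard signals do not queue, so wait before changing the file and
    # signalling again. The 1-second default is a guess, not a QFS requirement.
    sleep "${RESET_DELAY:-1}"
    set_history_size "$config" "$prior" || return 1
    kill -s HUP "$pid"
}
```

For example, it could be run as root with the config path qfs/meta/config.prp and the PID reported by `systemctl show --property MainPID --value qfsmetaserver.service`. Refusing to proceed when the parameter line is missing avoids the silent no-op that a plain sed substitution would produce.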