openvstorage / framework

The Framework is a set of components and tools which brings the user an interface (GUI / API) to set up, extend and manage an Open vStorage platform.

VDisk: halted & No node config found #2300

Closed: yongshengma closed this issue 5 years ago

yongshengma commented 5 years ago

Hello,

A vdisk is halted and cannot start. Its status shows 'Halted' and its DTL status shows 'DTL location optimization possible'.

In [1]: from ovs.dal.hybrids.vdisk import VDisk

In [2]: vdisk = VDisk('1a3992f8-06e4-44a5-8344-466f90a0b4b9')

In [3]: vdisk.info['live_status']
Out[3]: 'HALTED'

In [4]: vdisk.storagedriver_client.restart_object(str(vdisk.volume_id), False)

In [5]: 

No change in status; it is still halted.

Grepping its volume id in the logs:

May 11 13:56:02 N01 volumedriver_fs.sh: 2019-05-11 13:56:02 808638 +0800 - N01 - 13915/0x00007fcc957f0700 - volumedriverfs/XMLRPCTimingWrapper - 000000000271458a - info - execute: Arguments for volumePerformanceCountersV3 are {[reset:false,volume_id:344e5323-e3e3-401e-a47e-3827b60bd283,vrouter_cluster_id:bd3baf59-72a0-41d7-8a9e-61657baf9880]}
May 11 13:56:02 N01 volumedriver_fs.sh: 2019-05-11 13:56:02 808752 +0800 - N01 - 13915/0x00007fcc957f0700 - volumedriverfs/XMLRPCRedirectWrapper - 000000000271458b - warning - execute: Object 344e5323-e3e3-401e-a47e-3827b60bd283 is halted on this node, figuring out if there's a healthy instance elsewhere
May 11 13:56:02 N01 volumedriver_fs.sh: 2019-05-11 13:56:02 809028 +0800 - N01 - 13915/0x00007fcc957f0700 - volumedriverfs/XMLRPCRedirectWrapper - 000000000271458c - error - execute: No node config found for 344e5323-e3e3-401e-a47e-3827b60bd283
May 11 13:56:02 N01 volumedriver_fs.sh: std::exception::what: volume instance is halted 344e5323-e3e3-401e-a47e-3827b60bd283
May 11 13:56:02 N01 gunicorn: 2019-05-11 07:56:02 81000 +0200 - N01 - 12274/139658000077200 - hybrids/vdisk.py - fetch_statistics - 3051 - ERROR - Error loading statistics_volume from 344e5323-e3e3-401e-a47e-3827b60bd28: volume instance is halted 344e5323-e3e3-401e-a47e-3827b60bd283

These lines keep popping up...

There was an error some time ago. I'm not sure if it is the cause:

May 11 04:00:12 N01 alba: 2019-05-11 04:00:12 433632 +0800 - N01 - 13902/0000 - alba/proxy - 338702 - error - Request ApplySequence ("344e5323-e3e3-401e-a47e-3827b60bd283",false,[(Nsm_model.Assert.ObjectHasChecksum ("owner_tag",;    Sha1 9e44d2771c052d44058245eda6cb334689ca78cc));  ],[(Proxy_protocol.Protocol.Update.UploadObjectFromFile;    ("00_0000222d_00",;     "/mnt/hdd1/pool_write_sco_1/344e5323-e3e3-401e-a47e-3827b60bd283/00_0000222d_00",;     (Some Crc32c 0x2b9d6024)));  ]) errored and took 0.050547
May 11 04:00:12 N01 alba: 2019-05-11 04:00:12 483711 +0800 - N01 - 13902/0000 - alba/proxy - 338706 - error - Request ApplySequence ("344e5323-e3e3-401e-a47e-3827b60bd283",false,[(Nsm_model.Assert.ObjectHasChecksum ("owner_tag",;    Sha1 9e44d2771c052d44058245eda6cb334689ca78cc));  ],[(Proxy_protocol.Protocol.Update.UploadObjectFromFile;    ("00_0000222d_00",;     "/mnt/hdd1/pool_write_sco_1/344e5323-e3e3-401e-a47e-3827b60bd283/00_0000222d_00",;     (Some Crc32c 0x2b9d6024)));  ]) errored and took 0.049368

Any clue how to start this vdisk?

Context of this issue: the backend was nearly used up, with only 260G (1%) left out of 26T in total.

Best regards, Yongsheng

yongshengma commented 5 years ago

Does this log mean the backend or an ASD is full?

May 9 11:12:29 N05 alba: 2019-05-09 11:12:29 047136 +0800 - N05 - 2705/0000 - alba/asd - 1127264 - info - returning error Asd_protocol.Protocol.Error.Full

yongshengma commented 5 years ago

I tried to restart a halted vdisk and the log shows it finished, but it actually did not.

May 14 10:34:34 N05 volumedriver_fs.sh: 2019-05-14 10:34:34 210177 +0800 - N05 - 2822/0x00007eff83c7d700 - volumedriverfs/MTServer - 0000000004ebde01 - info - work: You have 2 connections running
May 14 10:34:34 N05 volumedriver_fs.sh: 2019-05-14 10:34:34 210225 +0800 - N05 - 2822/0x00007eff83c7d700 - volumedriverfs/MTServer - 0000000004ebde02 - info - cleanup_threads_locked: Trying to join 2 threads
May 14 10:34:34 N05 volumedriver_fs.sh: 2019-05-14 10:34:34 210313 +0800 - N05 - 2822/0x00007f08aeffd700 - volumedriverfs/XMLRPCTimingWrapper - 0000000004ebde03 - info - execute: Arguments for volumeInfo are {[redirect_fenced:true,volume_id:3539fb79-0067-4ec4-a49c-85d590de62ea,vrouter_cluster_id:bd3baf59-72a0-41d7-8a9e-61657baf9880]}
May 14 10:34:34 N05 volumedriver_fs.sh: 2019-05-14 10:34:34 210387 +0800 - N05 - 2822/0x00007f08aeffd700 - volumedriverfs/MDSMetaDataStore - 0000000004ebde04 - error - do_handle_: 3539fb79-0067-4ec4-a49c-85d590de62ea: no internal MetaDataStore present - all previous MDS failover attempts failed?
May 14 10:34:34 N05 volumedriver_fs.sh: 2019-05-14 10:34:34 210481 +0800 - N05 - 2822/0x00007f08aeffd700 - volumedriverfs/XMLRPCTimingWrapper - 0000000004ebde05 - error - operator(): Failed to retrieve metadata store stats: No internal mdstore present 3539fb79-0067-4ec4-a49c-85d590de62ea - ignored
May 14 10:34:34 N05 volumedriver_fs.sh: 2019-05-14 10:34:34 211133 +0800 - N05 - 2822/0x00007f08aeffd700 - volumedriverfs/XMLRPCTimingWrapper - 0000000004ebde06 - info - execute: Call volumeInfo took 0.000768 seconds
May 14 10:34:34 N05 volumedriver_fs.sh: 2019-05-14 10:34:34 211244 +0800 - N05 - 2822/0x00007f08aeffd700 - volumedriverfs/XMLRPCConnection - 0000000004ebde07 - info - operator(): Closing socket 332
May 14 10:34:34 N05 volumedriver_fs.sh: 2019-05-14 10:34:34 213884 +0800 - N05 - 2822/0x00007eff83c7d700 - volumedriverfs/MTServer - 0000000004ebde08 - info - work: You have 2 connections running
May 14 10:34:34 N05 volumedriver_fs.sh: 2019-05-14 10:34:34 213930 +0800 - N05 - 2822/0x00007eff83c7d700 - volumedriverfs/MTServer - 0000000004ebde09 - info - cleanup_threads_locked: Trying to join 2 threads
May 14 10:34:34 N05 volumedriver_fs.sh: 2019-05-14 10:34:34 213978 +0800 - N05 - 2822/0x00007f08b4ff9700 - volumedriverfs/XMLRPCTimingWrapper - 0000000004ebde0a - info - execute: Arguments for restartVolume are {[force:false,volume_id:3539fb79-0067-4ec4-a49c-85d590de62ea,vrouter_cluster_id:bd3baf59-72a0-41d7-8a9e-61657baf9880]}
May 14 10:34:34 N05 volumedriver_fs.sh: 2019-05-14 10:34:34 214358 +0800 - N05 - 2822/0x00007f08b4ff9700 - volumedriverfs/VFSLocalNode - 0000000004ebde0b - warning - local_restart_volume_: 3539fb79-0067-4ec4-a49c-85d590de62ea already running
May 14 10:34:34 N05 volumedriver_fs.sh: 2019-05-14 10:34:34 214604 +0800 - N05 - 2822/0x00007f08b4ff9700 - volumedriverfs/XMLRPCTimingWrapper - 0000000004ebde0c - info - execute: Call restartVolume took 0.000579 seconds
May 14 10:34:34 N05 volumedriver_fs.sh: 2019-05-14 10:34:34 214685 +0800 - N05 - 2822/0x00007f08b4ff9700 - volumedriverfs/XMLRPCConnection - 0000000004ebde0d - info - operator(): Closing socket 332
May 14 10:34:34 N05 celery: 2019-05-14 10:34:34 23600 +0800 - N05 - 5286/140549621679936 - celery/job.py - on_success - 157014 - INFO - Task ovs.vdisk.restart[f601a621-5899-409a-b820-88e376b0da18] succeeded in 0.0408876431175s: None
redlicha commented 5 years ago

A volume instance in the halted state is still present but no longer accepts I/O. This state is entered after a fatal error, or after HA forcefully started the volume elsewhere due to a network partition. The restart_object call only starts up volumes that are not present. To get the volume out of that state, a stop_object call has to be issued first (FWIW, these are the names of the calls in the volumedriver API).
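
A minimal sketch of that sequence from the OVS ipython shell, mirroring the restart_object call above; the second argument to stop_object (a delete-local-data flag) is an assumption here, so check the volumedriver client's signature before using it:

In [1]: from ovs.dal.hybrids.vdisk import VDisk

In [2]: vdisk = VDisk('1a3992f8-06e4-44a5-8344-466f90a0b4b9')

In [3]: vdisk.storagedriver_client.stop_object(str(vdisk.volume_id), False)  # stop the halted instance first

In [4]: vdisk.storagedriver_client.restart_object(str(vdisk.volume_id), False)

In [5]: vdisk.info['live_status']  # the dynamic may be cached briefly before reflecting the new state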

toolslive commented 5 years ago

Asd_protocol.Protocol.Error.Full means that a particular ASD is full. This will result in failing uploads of fragments. If only a few ASDs are full, it will not cause problems. If they are all full, you cannot write (or repair) anymore.

What does alba list-osds --config ... say?

yongshengma commented 5 years ago

Hi @redlicha

All halted volumes are started up after being stopped first. Thanks!

Best regards

yongshengma commented 5 years ago

Hi @toolslive

Last week the used space of the backend reached 99%. I also checked 'df -h' and saw that every mounted device under /mnt/alba-asd/ on every node had reached 99%. The used space dropped to 32% after a big snapshot (5.80T) was removed.

Using alba list-osds --config ... I'm able to see detailed statistics. The used/total ratio is now 32% for every ASD. Thanks for this command.

Two snippets:

5 : { Nsm_model.OsdInfo.kind =
  (Nsm_model.OsdInfo.Asd ((["10.20.107.48"], 8601, false, false),
     "65NTHVAujRqCKjji2JPKfu7mZlRTJ9Hb"));
  decommissioned = false; node_id = "tiVrMBrphrBrn6Qv"; other = "other?";
  total = 1199110983680L; used = 356381862657L;
  seen = [2019/01/10_16:26:10.5534];
  read =
  [2019/05/15_11:49:0.5546; 2019/05/15_11:48:40.1238;
    2019/05/15_11:48:25.5416; 2019/05/15_11:48:22.7101;
    2019/05/15_11:47:50.1931; 2019/05/15_11:46:58.2712;
    2019/05/15_11:46:47.1286; 2019/05/15_11:46:36.5023;
    2019/05/15_11:45:43.7541; 2019/05/15_11:45:26.7668];
  write =
  [2019/05/15_11:48:23.4295; 2019/05/15_11:48:6.7567;
    2019/05/15_11:46:40.1592; 2019/05/15_11:46:27.7731;
    2019/05/15_11:46:23.8440; 2019/05/15_11:46:9.2203;
    2019/05/15_11:46:8.9008; 2019/05/15_11:46:4.4235;
    2019/05/15_11:46:4.1538; 2019/05/15_11:46:3.0537];
  errors =
  [(2019/05/11_08:40:5.6793, "Lwt.Canceled");
    (2019/05/11_08:40:5.6789, "Lwt.Canceled");
    (2019/05/11_08:39:23.3614, "Lwt.Canceled");
    (2019/05/11_08:39:22.4070, "Lwt.Canceled");
    (2019/05/11_08:38:24.0517, "Lwt.Canceled");
    (2019/05/11_08:38:23.9245, "Lwt.Canceled");
    (2019/05/11_08:38:23.5558, "Lwt.Canceled");
    (2019/05/11_08:38:23.2138, "Lwt.Canceled");
    (2019/05/11_08:38:22.9788, "Lwt.Canceled");
    (2019/05/11_08:38:22.9787, "Lwt.Canceled")];
  checksum_errors = 0L; claimed_since = (Some 2019/01/10_16:26:9.8235) }

...

23 : { Nsm_model.OsdInfo.kind =
  (Nsm_model.OsdInfo.Asd ((["10.20.107.80"], 8600, false, false),
     "U78qHgYN9GJdrOIy8LwVu74gYmqv5s3m"));
  decommissioned = false; node_id = "SykSdKjhdNGgJe2d"; other = "other?";
  total = 1198410878976L; used = 380962453212L;
  seen = [2019/02/14_17:40:46.3412];
  read =
  [2019/05/15_11:49:15.6229; 2019/05/15_11:49:4.6534;
    2019/05/15_11:48:57.4087; 2019/05/15_11:48:24.7683;
    2019/05/15_11:48:22.7118; 2019/05/15_11:48:18.4301;
    2019/05/15_11:47:48.4083; 2019/05/15_11:47:41.7724;
    2019/05/15_11:47:36.7559; 2019/05/15_11:47:33.5664];
  write =
  [2019/05/15_11:48:6.4994; 2019/05/15_11:47:23.0258;
    2019/05/15_11:47:17.7167; 2019/05/15_11:46:40.2072;
    2019/05/15_11:46:27.7728; 2019/05/15_11:46:10.7085;
    2019/05/15_11:46:9.2879; 2019/05/15_11:46:8.8242;
    2019/05/15_11:46:7.8227; 2019/05/15_11:46:4.3758];
  errors =
  [(2019/04/26_18:39:16.9345,
    "(Unix.Unix_error \"Connection refused\" connect \"\")");
    (2019/04/26_18:39:12.4381,
     "(Unix.Unix_error \"Connection refused\" connect \"\")");
    (2019/04/26_18:39:8.8161,
     "(Unix.Unix_error \"No route to host\" connect \"\")");
    (2019/04/26_18:39:5.7900,
     "(Unix.Unix_error \"No route to host\" connect \"\")");
    (2019/04/26_18:39:5.7888, "Networking2.ConnectTimeout");
    (2019/04/26_18:39:3.3795, "Networking2.ConnectTimeout");
    (2019/04/26_18:39:2.7775, "Networking2.ConnectTimeout");
    (2019/04/26_18:38:58.3335,
     "(Unix.Unix_error \"Connection refused\" connect \"\")");
    (2019/04/26_18:38:56.3672,
     "(Unix.Unix_error \"Connection refused\" connect \"\")");
    (2019/04/26_18:38:55.3317, "Networking2.ConnectTimeout")];
  checksum_errors = 0L; claimed_since = (Some 2019/02/14_17:40:47.8028) }
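
For reference, the 32% figure matches the used and total fields of the ASDs above; a quick check in plain Python for the second snippet:

In [1]: used, total = 380962453212, 1198410878976

In [2]: round(100.0 * used / total, 1)
Out[2]: 31.8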

Snapshots occupy extra space besides the volume's own data, don't they? And 3x as much as well if a 3-way preset is chosen?

Best regards

toolslive commented 5 years ago

If you write to your volume, the LBAs will be packed in containers of (let's say) 64MB. These containers are uploaded to ALBA (the object store in OVS). If you overwrite the data, the older data (and thus the containers it resides in) is kept until the snapshot referencing that data is deleted. So deleting snapshots and scrubbing will keep the used space in check.

To get some insight into the data multipliers between the volume and the backend: during upload, the preset's policy for that volume is applied. The policy could be 3-way replication (times 3), or (9,5) erasure coding (times (9+5)/9), or something else (compression can play a role too). You can check the policies with alba list-presets.
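
As a rough illustration of those multipliers (a sketch only; the 5 TB figure is made up and compression and metadata overhead are ignored):

# Rough illustration of the backend-space multipliers described above.
volume_data_tb = 5.0                         # unique volume data (made-up figure)

replicated_tb = volume_data_tb * 3           # 3-way replication: 15.0 TB on the backend

k, m = 9, 5                                  # (9,5) erasure coding: 9 data + 5 parity fragments
ec_tb = volume_data_tb * (k + m) / float(k)  # ~7.78 TB on the backend

print(replicated_tb, round(ec_tb, 2))        # 15.0 7.78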

To see the data used via the CLI, you can do the following:

$> alba list-namespaces  --config ... 
$> alba show-namespace <namespace-name> --config ...

Tip: most of these alba commands also have a --to-json flag, so you can pipe the output into jq (for example).
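
For example, a hypothetical way to consume that JSON output from Python instead of jq; the config path below is just a placeholder and the exact JSON layout depends on the alba version:

import json
import subprocess

config = '/path/to/alba/config.ini'  # placeholder: use your own --config value
output = subprocess.check_output(
    ['alba', 'list-namespaces', '--config', config, '--to-json'])
data = json.loads(output)
# Print whatever came back; the structure is not assumed here.
print(json.dumps(data, indent=2))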

yongshengma commented 5 years ago

Much appreciated!