openvstorage / alba

Open vStorage ALBA (alternate backend) creates a replicated or flexible network-RAID'ed object storage backend out of Seagate Kinetic drives and local disks, supporting compression and encryption.

disqualified OSDs cause NoSatisfiablePolicy #797

Closed. toolslive closed this issue 6 years ago.

toolslive commented 7 years ago

The root cause seems to be an Invalid_argument String.blit exception, which is suspicious enough by itself:

Aug 22 01:57:44 NY1SRV0006 alba[15075]: 2017-08-22 01:57:44 003903 -0400 - NY1SRV0006 - 15075/0000 - alba/proxy - 4120 - error - Disqualifying osd 0: (Invalid_argument String.blit)
Aug 22 01:58:15 NY1SRV0006 alba[23049]: 2017-08-22 01:58:15 834823 -0400 - NY1SRV0006 - 23049/0000 - alba/proxy - 1304 - info - "(Invalid_argument String.blit)" was unforeseen, invalidating pool
Aug 22 01:58:15 NY1SRV0006 alba[23049]: 2017-08-22 01:58:15 834848 -0400 - NY1SRV0006 - 23049/0000 - alba/proxy - 1305 - info - "(Invalid_argument String.blit)": should_invalidate:true should_retry:false
Aug 22 01:58:15 NY1SRV0006 alba[23049]: 2017-08-22 01:58:15 834905 -0400 - NY1SRV0006 - 23049/0000 - alba/proxy - 1306 - info - "(Invalid_argument String.blit)" was unforeseen, invalidating pool
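For context, the sketch below shows in general how OCaml's standard library raises Invalid_argument from String.blit: the copy range falls outside the source string or the destination buffer. It is an illustration only and does not claim that this is the code path that failed in ALBA; copy_into is a hypothetical helper.

```ocaml
(* Illustration only: not ALBA code. copy_into is a hypothetical helper. *)
let copy_into (dst : Bytes.t) (src : string) =
  (* String.blit src src_pos dst dst_pos len raises Invalid_argument
     when the requested range is not valid for [src] or [dst]. *)
  String.blit src 0 dst 0 (String.length src)

let () =
  let small = Bytes.create 4 in
  copy_into small "abcd"; (* fits exactly: no exception *)
  match copy_into small "abcdefgh" with
  | () -> print_endline "copied"
  | exception Invalid_argument msg ->
    (* the exact message text differs between OCaml versions *)
    print_endline ("Invalid_argument: " ^ msg)
```

If such an exception escapes from a component that wraps a local backend, the caller sees it as an unforeseen error, which is how it ends up in the "Disqualifying osd" line above.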
toolslive commented 7 years ago

Basically, any unexpected exception coming from a local backend that is used as an OSD in a global backend will cause that local backend to be disqualified. For example, a master switch of an NSM (namespace manager) shows up as:

alba/proxy - 4146140 - info - a3926642-e2cc-4627-9c02-0d1f880e01fc "Client_helper.MasterLookupResult.Error(0)" was unforeseen, invalidating pool
alba/proxy - 4146141 - info - a3926642-e2cc-4627-9c02-0d1f880e01fc "Client_helper.MasterLookupResult.Error(0)": should_invalidate:true should_retry:false
alba/proxy - 4146142 - error - Disqualifying osd 0 a3926642-e2cc-4627-9c02-0d1f880e01fc : Client_helper.MasterLookupResult.Error(0)

Once the OSD is disqualified, the global backend can get into trouble. In this particular case, the Arakoon cluster had no real issue, as the master switch was triggered by drop-master.
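To make the problem concrete, here is a hypothetical sketch of an error classifier. It is not ALBA's implementation: the record fields only mirror the should_invalidate / should_retry values printed in the proxy logs, and the Master_switch exception, the disqualify field and both classify functions are invented for illustration. classify_naive reproduces the catch-all behaviour described above; classify_tolerant shows one possible way to keep transient or client-side errors from disqualifying an OSD.

```ocaml
(* Hypothetical sketch, not ALBA's code. Field names mirror the
   should_invalidate / should_retry values printed in the proxy logs. *)

(* Hypothetical stand-in for the NSM master switch; ALBA's real error is
   Client_helper.MasterLookupResult.Error, which is not modelled here. *)
exception Master_switch

type decision = {
  should_invalidate : bool; (* drop pooled connections to this OSD *)
  should_retry : bool;      (* retry the operation instead of failing *)
  disqualify : bool;        (* stop using this OSD in the global backend *)
}

(* Catch-all behaviour described in the issue: anything unforeseen
   invalidates the pool, is not retried, and disqualifies the OSD. *)
let classify_naive (_ : exn) =
  { should_invalidate = true; should_retry = false; disqualify = true }

(* Hypothetical, less aggressive variant: errors known to be transient or
   unrelated to OSD health do not lead to disqualification. *)
let classify_tolerant (e : exn) =
  match e with
  | Master_switch ->
    (* a new master is being elected; a later retry can succeed *)
    { should_invalidate = true; should_retry = true; disqualify = false }
  | Invalid_argument _ ->
    (* treated here as a client-side bug, not as an unhealthy OSD *)
    { should_invalidate = false; should_retry = false; disqualify = false }
  | _ -> classify_naive e

let () =
  let d = classify_tolerant Master_switch in
  Printf.printf "retry=%b disqualify=%b\n" d.should_retry d.disqualify
  (* prints: retry=true disqualify=false *)
```

The design choice is to enumerate the errors that are known to be harmless and keep the conservative catch-all for everything else, so a genuinely broken OSD is still taken out of service.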

wimpers commented 7 years ago

@toolslive, is this related to issue https://github.com/openvstorage/alba/issues/550, or can we fix this one while we're at it?

toolslive commented 7 years ago

It's probably related. Also, the more namespace managers you have for the local backend, the more likely you are to run into this.

wimpers commented 6 years ago

@toolslive, is this fixed in https://github.com/openvstorage/alba_ee/releases/tag/1.5.16 ?

toolslive commented 6 years ago

In essence, "disqualified OSDs cause NoSatisfiablePolicy" is not a bug. The bug here is that there were plenty of simple scenarios that caused an OSD to be disqualified when it shouldn't have been. We fixed a number of these cases.
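Why disqualification alone can produce NoSatisfiablePolicy is easiest to see with a simplified model. The sketch below assumes a policy is described by a (k, m, c, x) tuple (k data fragments, m parity fragments, c the minimum number of fragments that must be written, x the maximum number of fragments per failure domain) and treats every OSD of the global backend as its own failure domain. The types, the satisfiable rule and the No_satisfiable_policy exception are hypothetical simplifications, not ALBA's actual policy engine.

```ocaml
(* Simplified, hypothetical model of policy selection; not ALBA's code. *)
type policy = { k : int; m : int; c : int; x : int }

(* Each OSD of the global backend (itself a local backend) is modelled as
   its own failure domain; disqualified OSDs cannot receive fragments. *)
type osd = { id : int; disqualified : bool }

exception No_satisfiable_policy

let available osds =
  List.length (List.filter (fun o -> not o.disqualified) osds)

(* Satisfiable when at least [c] of the [k + m] fragments can be placed,
   with at most [x] fragments on any single OSD. *)
let satisfiable osds p =
  min (p.k + p.m) (available osds * p.x) >= p.c

(* Policies are tried in order of preference; the first one that can be
   met is used, otherwise the write fails. *)
let choose_policy osds policies =
  match List.find_opt (satisfiable osds) policies with
  | Some p -> p
  | None -> raise No_satisfiable_policy

let () =
  let osds = [ { id = 0; disqualified = true }; { id = 1; disqualified = false } ] in
  let policies = [ { k = 1; m = 1; c = 2; x = 1 } ] in
  match choose_policy osds policies with
  | _ -> print_endline "policy found"
  | exception No_satisfiable_policy -> print_endline "NoSatisfiablePolicy"
```

With two OSDs and a policy that needs two fragments on distinct OSDs, disqualifying a single OSD is enough to leave no satisfiable policy. The failure is the expected outcome of the disqualification itself, which is why the fixes targeted the spurious disqualifications instead.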

wimpers commented 6 years ago

From @toolslive: "I would close this one. I'll open another whenever the remaining cases (re)surface."

Hence, closing this one.