openvstorage / integrationtests

Open vStorage automated integration tests.

Unevenly distributed data stored on 3 node test setup #390

Open pploegaert opened 7 years ago

pploegaert commented 7 years ago

As observed on Fattwin environment http://confluence.cloudfounders.com/pages/viewpage.action?pageId=66617797

Credits to @dejonghb

One of the three nodes has a lot less data stored at the end of the test: nodes 1 and 3 hold 17 GiB each, node 2 has only 11 GiB.

System parameters show difference in

To be investigated further ...

@dejonghb : pls log your findings in this ticket

dejonghb commented 7 years ago

fio test 80% read/20% write with runtime=480 & ramp_time=30; 44 vdisks on each storage node; fio from 4 nodes over the edge:

node1: 42.65 GiB node2: 20.21 GiB node3: 38.49 GiB

The test was run directly after a reboot, so all services were fresh. The behaviour was already present yesterday, so it does not appear to be transient.

domsj commented 7 years ago

This could be due to the fragment placement bug in alba < 1.3.1: https://github.com/openvstorage/alba/issues/523. Which alba are you using? What policy is in the preset? Looking at the data distribution I would guess k+m=5 in the active policy ... (although the wiki page says it's 8,0 - in which case I would expect the data distribution over the different nodes to be more like 3/3/2 compared to the 2/2/1 you see now). Please verify whether upgrading to alba 1.3.1 fixes the problem.
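The 3/3/2 versus 2/2/1 reasoning above can be sketched as follows. This is only an illustration that assumes the k+m fragments of an object are spread as evenly as possible over the nodes; `spread_fragments` is a hypothetical helper, not alba's actual placement code.

```python
def spread_fragments(n_fragments, n_nodes):
    """Return the per-node fragment counts for an even spread (assumption)."""
    base, extra = divmod(n_fragments, n_nodes)
    # The first `extra` nodes each receive one fragment more than the rest.
    return [base + 1 if i < extra else base for i in range(n_nodes)]


if __name__ == "__main__":
    for total in (5, 8):
        print(total, spread_fragments(total, 3))
```

With 3 nodes, 5 fragments give a 2/2/1 split and 8 fragments give 3/3/2, which is the comparison made in the comment.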

dejonghb commented 7 years ago

Already using alba 1.3.1 (the latest and greatest). The system has been rebooted in between test runs without any change in the result, so it cannot be a leftover from an earlier run. The setup is on 10.100.186.32/33/34 if you want to check things. The preset used for storage is (12, 4, 16, 6).

toolslive commented 7 years ago

Looked at the distribution over the OSDs for a random namespace, f976ca3e-6775-452c-952c-488d318d6ac7. There is not a lot in it, so I dump it all here:

00_00000001_00 [27, 18, 73, 35, 31, 83, 2, 12, 56, 30, 60, 13, 47, 32, 71, 8]
00_00000002_00 [67, 31, 81, 16, 18, 82, 34, 12, 64, 47, 75, 28, 25, 53, 4, 80]
00_00000003_00 [43, 73, 23, 30, 12, 34, 66, 25, 42, 68, 29, 21, 63, 19, 33, 71]
00_00000004_00 [51, 17, 60, 52, 31, 23, 80, 82, 12, 41, 74, 27, 46, 54, 14, 73]
00_00000005_00 [75, 53, 74, 23, 8, 67, 54, 51, 78, 12, 34, 10, 62, 2, 44, 80]
00_00000006_00 [54, 31, 24, 60, 80, 7, 28, 42, 26, 67, 20, 52, 59, 41, 4, 73]
00_00000007_00 [73, 41, 82, 4, 63, 33, 7, 35, 13, 78, 51, 19, 75, 70, 24, 48]
00_00000008_00 [18, 29, 1, 81, 56, 49, 13, 35, 80, 25, 69, 38, 19, 82, 46, 21]
00_00000009_00 [12, 83, 46, 2, 67, 47, 26, 81, 32, 8, 44, 76, 18, 35, 7, 70]
00_0000000a_00 [52, 33, 77, 4, 20, 79, 54, 62, 14, 44, 29, 56, 25, 13, 51, 83]
00_0000000b_00 [49, 29, 65, 14, 26, 73, 32, 51, 1, 83, 28, 22, 58, 21, 52, 57]
00_0000000c_00 [28, 79, 9, 41, 10, 69, 52, 23, 36, 56, 3, 75, 44, 68, 22, 32]
00_0000000d_00 [7, 54, 19, 83, 16, 52, 71, 26, 82, 48, 53, 66, 13, 44, 73, 18]
00_0000000e_00 [21, 14, 32, 68, 63, 23, 41, 42, 20, 65, 28, 22, 58, 74, 43, 27]
00_0000000f_00 [64, 66, 38, 1, 40, 15, 73, 16, 79, 28, 51, 2, 68, 49, 18, 70]
00_00000010_00 [68, 34, 69, 22, 49, 14, 63, 19, 48, 56, 3, 81, 31, 24, 44, 64]
00_00000011_00 [58, 48, 80, 24, 66, 17, 37, 38, 78, 26, 34, 25, 69, 3, 46, 67]
00_00000012_00 [58, 21, 59, 38, 3, 51, 74, 40, 12, 71, 64, 44, 24, 42, 2, 70]
00_00000013_00 [16, 46, 25, 81, 73, 51, 20, 53, 4, 77, 2, 28, 75, 37, 24, 64]
00_00000014_00 [31, 15, 38, 71, 2, 69, 32, 81, 19, 33, 47, 62, 9, 21, 43, 80]
00_00000015_00 [13, 58, 34, 15, 23, 37, 56, 25, 81, 33, 65, 9, 44, 3, 53, 67]
00_00000016_00 [2, 18, 58, 41, 56, 51, 23, 70, 53, 17, 7, 81, 33, 46, 20, 72]
00_00000017_00 [63, 48, 10, 82, 26, 37, 58, 76, 43, 17, 52, 78, 2, 7, 59, 44]
00_00000018_00 [75, 41, 74, 27, 19, 62, 40, 70, 32, 12, 48, 66, 20, 57, 53, 1]
00_00000019_00 [51, 34, 69, 10, 32, 61, 20, 59, 52, 19, 3, 46, 63, 9, 79, 54]
00_0000001a_00 [79, 15, 34, 56, 26, 69, 52, 60, 44, 17, 53, 22, 66, 27, 48, 58]
00_0000001b_00 [59, 58, 29, 16, 57, 22, 38, 68, 4, 34, 54, 27, 73, 69, 13, 43]
00_0000001c_00 [2, 48, 68, 9, 26, 31, 75, 83, 17, 52, 3, 38, 65, 82, 36, 24]
00_0000001d_00 [42, 56, 28, 14, 26, 30, 59, 75, 12, 29, 51, 79, 13, 15, 47, 78]
00_0000001e_00 [15, 34, 17, 61, 19, 68, 30, 22, 35, 81, 49, 76, 13, 12, 52, 75]
00_0000001f_00 [80, 10, 56, 36, 52, 73, 7, 35, 22, 67, 30, 82, 17, 61, 41, 8]
00_00000020_00 [53, 63, 32, 26, 12, 72, 40, 9, 80, 51, 16, 34, 71, 42, 7, 58]
failovercache_configuration [10, 78, 54, 16, 36, 58, 19, 52, 8, 64, 32, 59, 15, 70, 3, 47]
owner_tag [48, 30, 78, 10, 81, 9, 42, 29, 71, 20, 3, 61, 40, 24, 44, 58]
sco_access_data [74, 4, 63, 46, 34, 83, 16, 53, 72, 19, 15, 81, 40, 71, 23, 36]
snapshots.xml [56, 61, 3, 33, 26, 65, 38, 37, 17, 74, 21, 48, 58, 52, 70, 12]
tlog_09bf0e7b-d395-42f9-82ab-f65fbebfb4e0 [33, 8, 78, 51, 49, 18, 75, 28, 14, 64, 71, 53, 4, 82, 47, 7]
tlog_10a52214-d8bb-4b35-8402-8a016cb579a5 [47, 77, 21, 52, 41, 8, 66, 33, 26, 76, 73, 37, 23, 79, 44, 25]
tlog_13d76fe6-3bfb-4e7a-b92f-6964b40ed0a5 [59, 57, 14, 36, 81, 23, 47, 21, 43, 75, 19, 35, 76, 37, 16, 60]
tlog_42d6244e-c183-4616-82f5-b9e48e117aa1 [69, 28, 66, 15, 20, 42, 71, 43, 25, 76, 46, 78, 24, 75, 8, 38]
tlog_45d1ae6e-3378-4d8d-88e7-9e2c2de04246 [61, 29, 18, 69, 75, 10, 32, 25, 58, 34, 46, 14, 80, 37, 19, 63]
tlog_471f75e4-1de3-48c5-b463-7e2d6565d765 [52, 65, 3, 54, 48, 74, 14, 34, 16, 56, 26, 78, 51, 40, 24, 79]
tlog_4dbcf2d9-26a4-41ef-988e-75fb81e15cc1 [53, 76, 15, 28, 61, 38, 23, 36, 1, 62, 48, 72, 3, 31, 70, 25]
tlog_5102030a-df61-4e1e-b890-9515b5ee1c9f [9, 19, 44, 77, 2, 65, 38, 12, 28, 63, 42, 59, 18, 68, 33, 13]
tlog_67f385fe-07e5-4487-84df-1ac474bdb0d5 [2, 49, 61, 13, 9, 82, 48, 46, 56, 27, 62, 14, 28, 75, 23, 43]
tlog_6afd3911-455a-4787-a4a5-c528dc0da098 [48, 30, 4, 71, 26, 81, 35, 16, 62, 38, 31, 56, 18, 72, 54, 3]
tlog_6b35dc02-2053-47a6-b7e5-6dc6e2be8b53 [44, 61, 48, 25, 3, 33, 56, 36, 13, 70, 19, 68, 47, 2, 40, 69]
tlog_6d97c479-c070-40b0-bf49-d3d9b1f2aa0d [10, 83, 8, 41, 16, 38, 68, 36, 57, 12, 72, 42, 26, 76, 20, 44]
tlog_74f78839-9e46-43f8-ab98-1600f517d119 [73, 51, 64, 16, 67, 36, 14, 24, 82, 34, 38, 59, 19, 47, 13, 79]
tlog_7efd5191-1384-4e07-92ac-b2c22a8c7598 [68, 78, 12, 28, 18, 56, 43, 37, 71, 9, 70, 29, 17, 21, 51, 59]
tlog_855b3e3a-98a3-4fa4-a1c2-ce6b9de03584 [66, 56, 48, 3, 68, 8, 37, 71, 16, 53, 24, 47, 79, 12, 81, 32]
tlog_8e1e6470-6b0f-4a5c-acac-2a95cdea3965 [33, 40, 69, 17, 35, 24, 57, 31, 65, 21, 26, 61, 37, 56, 54, 1]
tlog_9116d392-67eb-40f2-ab0c-2fd3c1935eeb [9, 16, 28, 70, 74, 47, 22, 61, 18, 51, 31, 19, 65, 73, 14, 35]
tlog_990afc51-5dea-42fe-a8e2-307f292af8dc [53, 75, 36, 26, 10, 60, 54, 51, 71, 3, 64, 24, 34, 41, 4, 57]
tlog_99eefca5-d4d0-4ba1-848b-00b73175d5ae [3, 7, 62, 44, 78, 20, 38, 77, 46, 16, 10, 37, 63, 69, 42, 1]
tlog_a0327677-110b-437b-97ef-5e3f099d417a [42, 63, 24, 29, 57, 12, 43, 9, 56, 35, 1, 37, 62, 48, 19, 60]
tlog_a59ad240-d83f-4d98-87d7-21801c91f458 [33, 44, 65, 19, 17, 74, 34, 59, 24, 38, 13, 47, 72, 53, 57, 26]
tlog_a662af77-c356-450a-bf35-329488c164e8 [36, 8, 79, 38, 4, 58, 48, 80, 25, 32, 43, 23, 65, 21, 46, 57]
tlog_a82fc637-61a2-4e38-9694-8a23e5ba6956 [78, 61, 36, 16, 32, 14, 65, 58, 46, 19, 38, 12, 66, 37, 7, 83]
tlog_b1fffa6e-fb7f-44c0-adce-4a48f8a04c9a [82, 76, 35, 23, 34, 1, 68, 4, 71, 47, 37, 67, 19, 36, 58, 16]
tlog_b2052dca-76d9-44e0-b748-a5bbde4bdb7f [62, 10, 65, 38, 80, 1, 46, 53, 15, 78, 17, 58, 42, 24, 49, 59]
tlog_bd8f4450-882c-4681-aff4-0722154ba057 [38, 59, 36, 9, 83, 8, 48, 31, 1, 73, 28, 24, 81, 12, 68, 47]
tlog_cd3c2341-f354-45e1-ad44-95dbcf72c453 [82, 44, 8, 83, 77, 51, 26, 36, 7, 63, 9, 43, 65, 34, 10, 62]
tlog_d4203f53-5731-4e17-b2a1-20e54fb1b361 [60, 40, 21, 76, 33, 59, 24, 46, 79, 16, 32, 1, 70, 23, 36, 67]
tlog_e2e96d0f-c730-4f55-add8-a3f82c5c0efb [65, 48, 9, 81, 64, 53, 14, 60, 18, 33, 8, 83, 51, 52, 67, 21]
tlog_e55bcd93-d306-412f-8c25-f1540fbc6452 [24, 42, 77, 12, 14, 28, 67, 37, 8, 71, 58, 43, 15, 53, 3, 70]
volume_configuration [61, 73, 41, 7, 15, 53, 56, 30, 59, 18, 68, 26, 48, 12, 65, 35]

aggregated:

all {1: 12, 2: 13, 3: 18, 4: 12, 7: 13, 8: 15, 9: 15, 10: 13, 12: 21, 13: 14, 14: 16, 15: 13, 16: 19, 17: 13, 18: 15, 19: 21, 20: 11, 21: 14, 22: 9, 23: 15, 24: 20, 25: 13, 26: 20, 27: 7, 28: 17, 29: 10, 30: 8, 31: 12, 32: 15, 33: 15, 34: 19, 35: 13, 36: 17, 37: 16, 38: 20, 40: 10, 41: 12, 42: 14, 43: 13, 44: 18, 46: 16, 47: 16, 48: 21, 49: 8, 51: 20, 52: 17, 53: 19, 54: 12, 56: 20, 57: 10, 58: 19, 59: 16, 60: 9, 61: 13, 62: 11, 63: 13, 64: 10, 65: 16, 66: 10, 67: 12, 68: 16, 69: 13, 70: 14, 71: 16, 72: 7, 73: 16, 74: 10, 75: 15, 76: 10, 77: 7, 78: 14, 79: 12, 80: 12, 81: 16, 82: 13, 83: 12} 76
firsts {2: 3, 3: 1, 7: 1, 9: 2, 10: 2, 12: 1, 13: 1, 15: 1, 16: 1, 18: 1, 21: 1, 24: 1, 27: 1, 28: 1, 31: 1, 33: 3, 36: 1, 38: 1, 42: 2, 43: 1, 44: 1, 47: 1, 48: 2, 49: 1, 51: 2, 52: 2, 53: 3, 54: 1, 56: 1, 58: 2, 59: 2, 60: 1, 61: 2, 62: 1, 63: 1, 64: 1, 65: 1, 66: 1, 67: 1, 68: 2, 69: 1, 73: 2, 74: 1, 75: 2, 78: 1, 79: 1, 80: 1, 82: 2} 48
min,max, avg: 7, 21, 14.1052631579
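For reference, an aggregation of this kind could be computed roughly as below. This is a sketch, not the script that produced the output above: the function names are hypothetical, and only three objects from the listing are included as sample data.

```python
from collections import Counter

# Three objects copied from the listing above; the full dump has many more.
placements = {
    "00_00000001_00": [27, 18, 73, 35, 31, 83, 2, 12, 56, 30, 60, 13, 47, 32, 71, 8],
    "00_00000002_00": [67, 31, 81, 16, 18, 82, 34, 12, 64, 47, 75, 28, 25, 53, 4, 80],
    "00_00000003_00": [43, 73, 23, 30, 12, 34, 66, 25, 42, 68, 29, 21, 63, 19, 33, 71],
}


def aggregate(placements):
    """Count how often each OSD holds any fragment, and how often the first one."""
    all_counts = Counter()
    first_counts = Counter()
    for osds in placements.values():
        all_counts.update(osds)
        first_counts[osds[0]] += 1
    return all_counts, first_counts


def summarize(counts):
    """Return (min, max, avg) of the per-OSD fragment counts."""
    values = list(counts.values())
    return min(values), max(values), sum(values) / len(values)


if __name__ == "__main__":
    all_counts, first_counts = aggregate(placements)
    print("all", dict(sorted(all_counts.items())), len(all_counts))
    print("firsts", dict(sorted(first_counts.items())), len(first_counts))
    print("min,max,avg:", summarize(all_counts))
```

OSDs that hold no fragment at all do not appear in the counters, so the minimum only covers OSDs that were used at least once.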

df shows all the devices (we have 4 ASDs per device here) to be between 2% and 3% full.

dejonghb commented 7 years ago

Checking

df -h | awk '/sd[b-h]/ { gsub(/G/, "", $3); sum += $3; print $3 } END { print sum }'

on the 3 nodes gives 46.8 G vs 46.2 G vs 47 G.

The GUI says there's 39.12 G vs 15.50 G vs 43.43 G in use.

toolslive commented 7 years ago

node1: 42.65 GiB
node2: 20.21 GiB
node3: 38.49 GiB

These are figures collected from the volume driver, not what Alba stored. The second node is slower (for some reason) and writes less. Alba, however, distributed the fragments over the available ASDs. For all namespaces, the distribution over the ASDs falls within avg - (avg/2) < min and max <= avg + (avg/2).
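The band described above can be checked with a few lines of Python. This is a minimal sketch: `band_check` is a hypothetical helper, and the example input is the min, max and avg reported for namespace f976ca3e-6775-452c-952c-488d318d6ac7 earlier in this ticket.

```python
def band_check(minimum, maximum, average):
    """Check avg - avg/2 < min and max <= avg + avg/2; return both results."""
    half = average / 2.0
    lower_ok = average - half < minimum
    upper_ok = maximum <= average + half
    return lower_ok, upper_ok


if __name__ == "__main__":
    # min, max, avg as reported for the dumped namespace.
    print(band_check(7, 21, 14.1052631579))
```

For those figures the maximum of 21 is inside the band, while the minimum of 7 sits right at the lower edge (about 7.05), so the band is best read as approximate.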

dejonghb commented 7 years ago

Timing buffered disk reads:

for d in b c d e f; do hdparm -t /dev/sd$d | awk '/dev/ { printf ("%s ", $1) } /buffered/ { print $(NF-1), $NF }'; done

1st node:

/dev/sdb: 514.74 MB/sec
/dev/sdc: 514.25 MB/sec
/dev/sdd: 516.83 MB/sec
/dev/sde: 338.95 MB/sec
/dev/sdf: 514.92 MB/sec
[0:0:9:0]    disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdb 
[0:0:10:0]   disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdc 
[0:0:11:0]   disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdd 
[0:0:12:0]   disk    ATA      INTEL SSDSC2BB30 0370  /dev/sde 
[0:0:14:0]   disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdf 
[0:0:15:0]   disk    ATA      Samsung SSD 845D AX3Q  /dev/sdg

2nd node:

/dev/sdb: 517.12 MB/sec
/dev/sdc: 515.22 MB/sec
/dev/sdd: 337.59 MB/sec
/dev/sde: 374.91 MB/sec
/dev/sdf: 514.21 MB/sec
[0:0:9:0]    disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdb 
[0:0:11:0]   disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdc 
[0:0:12:0]   disk    ATA      INTEL SSDSC2BB30 0370  /dev/sdd 
[0:0:13:0]   disk    ATA      INTEL SSDSC2BB30 0370  /dev/sde 
[0:0:14:0]   disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdf 
[0:0:15:0]   disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdg

3rd node:

/dev/sdb: 515.86 MB/sec
/dev/sdc: 515.31 MB/sec
/dev/sdd: 520.35 MB/sec
/dev/sde: 517.22 MB/sec
/dev/sdf: 517.55 MB/sec
[0:0:9:0]    disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdb 
[0:0:10:0]   disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdc 
[0:0:11:0]   disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdd 
[0:0:12:0]   disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sde 
[0:0:13:0]   disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdf 
[0:0:14:0]   disk    ATA      INTEL SSDSC2BB30 0370  /dev/sdg 
[0:0:15:0]   disk    ATA      INTEL SSDSC2BB30 0370  /dev/sdh
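To relate the read timings to the drive models, the two listings can be joined on the device name. The sketch below does this for the 2nd node's figures; the parsing helpers are hypothetical, and the device listing is assumed to follow the column layout shown above (address, type, vendor, model, revision, device).

```python
import re
from collections import defaultdict

# Figures for the 2nd node, copied from the output above.
TIMINGS = """\
/dev/sdb: 517.12 MB/sec
/dev/sdc: 515.22 MB/sec
/dev/sdd: 337.59 MB/sec
/dev/sde: 374.91 MB/sec
/dev/sdf: 514.21 MB/sec
"""

DEVICES = """\
[0:0:9:0]    disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdb
[0:0:11:0]   disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdc
[0:0:12:0]   disk    ATA      INTEL SSDSC2BB30 0370  /dev/sdd
[0:0:13:0]   disk    ATA      INTEL SSDSC2BB30 0370  /dev/sde
[0:0:14:0]   disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdf
[0:0:15:0]   disk    ATA      SAMSUNG MZ7LM480 003Q  /dev/sdg
"""


def parse_timings(text):
    """Map device path -> buffered read speed in MB/sec."""
    speeds = {}
    for line in text.splitlines():
        match = re.match(r"(/dev/\w+):\s+([\d.]+)\s+MB/sec", line)
        if match:
            speeds[match.group(1)] = float(match.group(2))
    return speeds


def parse_models(text):
    """Map device path -> model string (assumed column layout, see lead-in)."""
    models = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[1] == "disk":
            # Everything between the vendor and the revision is the model.
            models[fields[-1]] = " ".join(fields[3:-2])
    return models


def speeds_by_model(timings, devices):
    """Group the measured speeds by drive model; untested devices are skipped."""
    grouped = defaultdict(list)
    models = parse_models(devices)
    for dev, speed in parse_timings(timings).items():
        grouped[models.get(dev, "unknown")].append(speed)
    return dict(grouped)


if __name__ == "__main__":
    for model, speeds in sorted(speeds_by_model(TIMINGS, DEVICES).items()):
        print(model, speeds)
```

On nodes 1 and 2 the slower readings line up with the INTEL SSDSC2BB30 drives. On node 3 the Intel drives are /dev/sdg and /dev/sdh, which the `b c d e f` loop did not measure. Whether this difference is related to the slow node is not established here.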

dejonghb commented 7 years ago

After the Christmas weekend, it seems impossible to reproduce this behaviour. The systems were unresponsive and were rebooted, but they had been rebooted before, so the reboot itself is not the fixing factor. Currently clueless about what was going on last week.

dejonghb commented 7 years ago

More reading and writing to the system eventually reintroduced the problem: first one node became slower (resulting in less data stored via it), until finally all nodes got extremely slow. Alba's internal benchmarks show proxy-bench having trouble, while alba-bench was fine. Further investigation is still needed.

pploegaert commented 7 years ago

Create a long-running test to validate, or investigate on longer-running environments pocops/gig: