Open pploegaert opened 7 years ago
fio test 80% read/20% write with runtime=480 & ramp_time=30; 44 vdisks on each storage node; fio from 4 nodes over the edge:
node1: 42.65 GiB
node2: 20.21 GiB
node3: 38.49 GiB
Test run directly after a reboot, so all services were fresh. The behaviour was already present yesterday, so it does not appear to be transient.
This could be due to the fragment placement bug in alba < 1.3.1: https://github.com/openvstorage/alba/issues/523. Which alba version are you using, and what policy is in the preset? Looking at the data distribution I would guess k+m=5 in the active policy (although the wiki page says it's 8,0, in which case I would expect the data distribution over the nodes to be more like 3/3/2 than the 2/2/1 you see now). Please verify whether upgrading to alba 1.3.1 fixes the problem.
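The 3/3/2 versus 2/2/1 guess is just an even split of k+m fragments over the three nodes. A minimal sketch of that arithmetic, assuming fragments are spread as evenly as possible and ignoring any per-node cap in the preset; `even_spread` is a hypothetical helper for illustration, not an alba function:

```python
def even_spread(fragments, nodes):
    """Split `fragments` over `nodes` as evenly as possible."""
    base, extra = divmod(fragments, nodes)
    # The first `extra` nodes each take one fragment more than the rest.
    return [base + 1 if i < extra else base for i in range(nodes)]


print(even_spread(8, 3))  # k+m = 8 -> [3, 3, 2]
print(even_spread(5, 3))  # k+m = 5 -> [2, 2, 1]
```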
Already using alba 1.3.1 (the latest release). The system has been rebooted in between test runs without any change in the result, so it cannot be a leftover from an earlier run. The setup is on 10.100.186.32/33/34 if you want to check things. The preset used for storage is (12, 4, 16, 6).
Looked at the distribution over the OSDs for a random namespace, f976ca3e-6775-452c-952c-488d318d6ac7. There is not a lot there, so I dump it all here:
00_00000001_00 [27, 18, 73, 35, 31, 83, 2, 12, 56, 30, 60, 13, 47, 32, 71, 8]
00_00000002_00 [67, 31, 81, 16, 18, 82, 34, 12, 64, 47, 75, 28, 25, 53, 4, 80]
00_00000003_00 [43, 73, 23, 30, 12, 34, 66, 25, 42, 68, 29, 21, 63, 19, 33, 71]
00_00000004_00 [51, 17, 60, 52, 31, 23, 80, 82, 12, 41, 74, 27, 46, 54, 14, 73]
00_00000005_00 [75, 53, 74, 23, 8, 67, 54, 51, 78, 12, 34, 10, 62, 2, 44, 80]
00_00000006_00 [54, 31, 24, 60, 80, 7, 28, 42, 26, 67, 20, 52, 59, 41, 4, 73]
00_00000007_00 [73, 41, 82, 4, 63, 33, 7, 35, 13, 78, 51, 19, 75, 70, 24, 48]
00_00000008_00 [18, 29, 1, 81, 56, 49, 13, 35, 80, 25, 69, 38, 19, 82, 46, 21]
00_00000009_00 [12, 83, 46, 2, 67, 47, 26, 81, 32, 8, 44, 76, 18, 35, 7, 70]
00_0000000a_00 [52, 33, 77, 4, 20, 79, 54, 62, 14, 44, 29, 56, 25, 13, 51, 83]
00_0000000b_00 [49, 29, 65, 14, 26, 73, 32, 51, 1, 83, 28, 22, 58, 21, 52, 57]
00_0000000c_00 [28, 79, 9, 41, 10, 69, 52, 23, 36, 56, 3, 75, 44, 68, 22, 32]
00_0000000d_00 [7, 54, 19, 83, 16, 52, 71, 26, 82, 48, 53, 66, 13, 44, 73, 18]
00_0000000e_00 [21, 14, 32, 68, 63, 23, 41, 42, 20, 65, 28, 22, 58, 74, 43, 27]
00_0000000f_00 [64, 66, 38, 1, 40, 15, 73, 16, 79, 28, 51, 2, 68, 49, 18, 70]
00_00000010_00 [68, 34, 69, 22, 49, 14, 63, 19, 48, 56, 3, 81, 31, 24, 44, 64]
00_00000011_00 [58, 48, 80, 24, 66, 17, 37, 38, 78, 26, 34, 25, 69, 3, 46, 67]
00_00000012_00 [58, 21, 59, 38, 3, 51, 74, 40, 12, 71, 64, 44, 24, 42, 2, 70]
00_00000013_00 [16, 46, 25, 81, 73, 51, 20, 53, 4, 77, 2, 28, 75, 37, 24, 64]
00_00000014_00 [31, 15, 38, 71, 2, 69, 32, 81, 19, 33, 47, 62, 9, 21, 43, 80]
00_00000015_00 [13, 58, 34, 15, 23, 37, 56, 25, 81, 33, 65, 9, 44, 3, 53, 67]
00_00000016_00 [2, 18, 58, 41, 56, 51, 23, 70, 53, 17, 7, 81, 33, 46, 20, 72]
00_00000017_00 [63, 48, 10, 82, 26, 37, 58, 76, 43, 17, 52, 78, 2, 7, 59, 44]
00_00000018_00 [75, 41, 74, 27, 19, 62, 40, 70, 32, 12, 48, 66, 20, 57, 53, 1]
00_00000019_00 [51, 34, 69, 10, 32, 61, 20, 59, 52, 19, 3, 46, 63, 9, 79, 54]
00_0000001a_00 [79, 15, 34, 56, 26, 69, 52, 60, 44, 17, 53, 22, 66, 27, 48, 58]
00_0000001b_00 [59, 58, 29, 16, 57, 22, 38, 68, 4, 34, 54, 27, 73, 69, 13, 43]
00_0000001c_00 [2, 48, 68, 9, 26, 31, 75, 83, 17, 52, 3, 38, 65, 82, 36, 24]
00_0000001d_00 [42, 56, 28, 14, 26, 30, 59, 75, 12, 29, 51, 79, 13, 15, 47, 78]
00_0000001e_00 [15, 34, 17, 61, 19, 68, 30, 22, 35, 81, 49, 76, 13, 12, 52, 75]
00_0000001f_00 [80, 10, 56, 36, 52, 73, 7, 35, 22, 67, 30, 82, 17, 61, 41, 8]
00_00000020_00 [53, 63, 32, 26, 12, 72, 40, 9, 80, 51, 16, 34, 71, 42, 7, 58]
failovercache_configuration [10, 78, 54, 16, 36, 58, 19, 52, 8, 64, 32, 59, 15, 70, 3, 47]
owner_tag [48, 30, 78, 10, 81, 9, 42, 29, 71, 20, 3, 61, 40, 24, 44, 58]
sco_access_data [74, 4, 63, 46, 34, 83, 16, 53, 72, 19, 15, 81, 40, 71, 23, 36]
snapshots.xml [56, 61, 3, 33, 26, 65, 38, 37, 17, 74, 21, 48, 58, 52, 70, 12]
tlog_09bf0e7b-d395-42f9-82ab-f65fbebfb4e0 [33, 8, 78, 51, 49, 18, 75, 28, 14, 64, 71, 53, 4, 82, 47, 7]
tlog_10a52214-d8bb-4b35-8402-8a016cb579a5 [47, 77, 21, 52, 41, 8, 66, 33, 26, 76, 73, 37, 23, 79, 44, 25]
tlog_13d76fe6-3bfb-4e7a-b92f-6964b40ed0a5 [59, 57, 14, 36, 81, 23, 47, 21, 43, 75, 19, 35, 76, 37, 16, 60]
tlog_42d6244e-c183-4616-82f5-b9e48e117aa1 [69, 28, 66, 15, 20, 42, 71, 43, 25, 76, 46, 78, 24, 75, 8, 38]
tlog_45d1ae6e-3378-4d8d-88e7-9e2c2de04246 [61, 29, 18, 69, 75, 10, 32, 25, 58, 34, 46, 14, 80, 37, 19, 63]
tlog_471f75e4-1de3-48c5-b463-7e2d6565d765 [52, 65, 3, 54, 48, 74, 14, 34, 16, 56, 26, 78, 51, 40, 24, 79]
tlog_4dbcf2d9-26a4-41ef-988e-75fb81e15cc1 [53, 76, 15, 28, 61, 38, 23, 36, 1, 62, 48, 72, 3, 31, 70, 25]
tlog_5102030a-df61-4e1e-b890-9515b5ee1c9f [9, 19, 44, 77, 2, 65, 38, 12, 28, 63, 42, 59, 18, 68, 33, 13]
tlog_67f385fe-07e5-4487-84df-1ac474bdb0d5 [2, 49, 61, 13, 9, 82, 48, 46, 56, 27, 62, 14, 28, 75, 23, 43]
tlog_6afd3911-455a-4787-a4a5-c528dc0da098 [48, 30, 4, 71, 26, 81, 35, 16, 62, 38, 31, 56, 18, 72, 54, 3]
tlog_6b35dc02-2053-47a6-b7e5-6dc6e2be8b53 [44, 61, 48, 25, 3, 33, 56, 36, 13, 70, 19, 68, 47, 2, 40, 69]
tlog_6d97c479-c070-40b0-bf49-d3d9b1f2aa0d [10, 83, 8, 41, 16, 38, 68, 36, 57, 12, 72, 42, 26, 76, 20, 44]
tlog_74f78839-9e46-43f8-ab98-1600f517d119 [73, 51, 64, 16, 67, 36, 14, 24, 82, 34, 38, 59, 19, 47, 13, 79]
tlog_7efd5191-1384-4e07-92ac-b2c22a8c7598 [68, 78, 12, 28, 18, 56, 43, 37, 71, 9, 70, 29, 17, 21, 51, 59]
tlog_855b3e3a-98a3-4fa4-a1c2-ce6b9de03584 [66, 56, 48, 3, 68, 8, 37, 71, 16, 53, 24, 47, 79, 12, 81, 32]
tlog_8e1e6470-6b0f-4a5c-acac-2a95cdea3965 [33, 40, 69, 17, 35, 24, 57, 31, 65, 21, 26, 61, 37, 56, 54, 1]
tlog_9116d392-67eb-40f2-ab0c-2fd3c1935eeb [9, 16, 28, 70, 74, 47, 22, 61, 18, 51, 31, 19, 65, 73, 14, 35]
tlog_990afc51-5dea-42fe-a8e2-307f292af8dc [53, 75, 36, 26, 10, 60, 54, 51, 71, 3, 64, 24, 34, 41, 4, 57]
tlog_99eefca5-d4d0-4ba1-848b-00b73175d5ae [3, 7, 62, 44, 78, 20, 38, 77, 46, 16, 10, 37, 63, 69, 42, 1]
tlog_a0327677-110b-437b-97ef-5e3f099d417a [42, 63, 24, 29, 57, 12, 43, 9, 56, 35, 1, 37, 62, 48, 19, 60]
tlog_a59ad240-d83f-4d98-87d7-21801c91f458 [33, 44, 65, 19, 17, 74, 34, 59, 24, 38, 13, 47, 72, 53, 57, 26]
tlog_a662af77-c356-450a-bf35-329488c164e8 [36, 8, 79, 38, 4, 58, 48, 80, 25, 32, 43, 23, 65, 21, 46, 57]
tlog_a82fc637-61a2-4e38-9694-8a23e5ba6956 [78, 61, 36, 16, 32, 14, 65, 58, 46, 19, 38, 12, 66, 37, 7, 83]
tlog_b1fffa6e-fb7f-44c0-adce-4a48f8a04c9a [82, 76, 35, 23, 34, 1, 68, 4, 71, 47, 37, 67, 19, 36, 58, 16]
tlog_b2052dca-76d9-44e0-b748-a5bbde4bdb7f [62, 10, 65, 38, 80, 1, 46, 53, 15, 78, 17, 58, 42, 24, 49, 59]
tlog_bd8f4450-882c-4681-aff4-0722154ba057 [38, 59, 36, 9, 83, 8, 48, 31, 1, 73, 28, 24, 81, 12, 68, 47]
tlog_cd3c2341-f354-45e1-ad44-95dbcf72c453 [82, 44, 8, 83, 77, 51, 26, 36, 7, 63, 9, 43, 65, 34, 10, 62]
tlog_d4203f53-5731-4e17-b2a1-20e54fb1b361 [60, 40, 21, 76, 33, 59, 24, 46, 79, 16, 32, 1, 70, 23, 36, 67]
tlog_e2e96d0f-c730-4f55-add8-a3f82c5c0efb [65, 48, 9, 81, 64, 53, 14, 60, 18, 33, 8, 83, 51, 52, 67, 21]
tlog_e55bcd93-d306-412f-8c25-f1540fbc6452 [24, 42, 77, 12, 14, 28, 67, 37, 8, 71, 58, 43, 15, 53, 3, 70]
volume_configuration [61, 73, 41, 7, 15, 53, 56, 30, 59, 18, 68, 26, 48, 12, 65, 35]
aggregated:
all {1: 12, 2: 13, 3: 18, 4: 12, 7: 13, 8: 15, 9: 15, 10: 13, 12: 21, 13: 14, 14: 16, 15: 13, 16: 19, 17: 13, 18: 15, 19: 21, 20: 11, 21: 14, 22: 9, 23: 15, 24: 20, 25: 13, 26: 20, 27: 7, 28: 17, 29: 10, 30: 8, 31: 12, 32: 15, 33: 15, 34: 19, 35: 13, 36: 17, 37: 16, 38: 20, 40: 10, 41: 12, 42: 14, 43: 13, 44: 18, 46: 16, 47: 16, 48: 21, 49: 8, 51: 20, 52: 17, 53: 19, 54: 12, 56: 20, 57: 10, 58: 19, 59: 16, 60: 9, 61: 13, 62: 11, 63: 13, 64: 10, 65: 16, 66: 10, 67: 12, 68: 16, 69: 13, 70: 14, 71: 16, 72: 7, 73: 16, 74: 10, 75: 15, 76: 10, 77: 7, 78: 14, 79: 12, 80: 12, 81: 16, 82: 13, 83: 12} 76
firsts {2: 3, 3: 1, 7: 1, 9: 2, 10: 2, 12: 1, 13: 1, 15: 1, 16: 1, 18: 1, 21: 1, 24: 1, 27: 1, 28: 1, 31: 1, 33: 3, 36: 1, 38: 1, 42: 2, 43: 1, 44: 1, 47: 1, 48: 2, 49: 1, 51: 2, 52: 2, 53: 3, 54: 1, 56: 1, 58: 2, 59: 2, 60: 1, 61: 2, 62: 1, 63: 1, 64: 1, 65: 1, 66: 1, 67: 1, 68: 2, 69: 1, 73: 2, 74: 1, 75: 2, 78: 1, 79: 1, 80: 1, 82: 2} 48
min,max, avg: 7, 21, 14.1052631579
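For anyone who wants to redo this aggregation on another namespace, here is a rough sketch. It assumes the placement dump has already been parsed into a dict of object name to OSD id list, and that "firsts" counts the OSD in the first position of each list; `aggregate` is a hypothetical helper, not the script that produced the figures above:

```python
from collections import Counter


def aggregate(placements):
    """placements: dict mapping object name -> list of OSD ids holding its fragments."""
    all_counts = Counter()    # fragments per OSD, over all positions
    first_counts = Counter()  # how often an OSD holds the first fragment
    for osds in placements.values():
        all_counts.update(osds)
        first_counts[osds[0]] += 1
    counts = list(all_counts.values())
    avg = sum(counts) / len(counts)
    return all_counts, first_counts, min(counts), max(counts), avg


# The first two objects from the dump above, as a small sample.
sample = {
    "00_00000001_00": [27, 18, 73, 35, 31, 83, 2, 12, 56, 30, 60, 13, 47, 32, 71, 8],
    "00_00000002_00": [67, 31, 81, 16, 18, 82, 34, 12, 64, 47, 75, 28, 25, 53, 4, 80],
}
all_counts, first_counts, lo, hi, avg = aggregate(sample)
print(len(all_counts), lo, hi, avg)
```

Feeding in the full dump should give the same kind of `all` / `firsts` / min, max, avg summary as posted.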
df shows all the devices (we have 4 ASDs per device here) to be between 2% and 3% full.
Checking
df -h | awk '/sd[b-h]/ { gsub(/G/, "", $3); sum += $3; print $3 } END { print sum }'
on the 3 nodes gives 46.8 G, 46.2 G and 47 G.
The GUI, however, says there is 39.12 G, 15.50 G and 43.43 G in use.
node1: 42.65 GiB
node2: 20.21 GiB
node3: 38.49 GiB
The figures above are collected from the volume driver; they are not what Alba stored. The second node is slower (for some reason) and writes less. Alba, however, distributed the fragments over the available ASDs.
For all namespaces, the distribution over the ASDs falls within: avg - (avg/2) < min and max <= avg + (avg/2)
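The band check can be written down as a small predicate. This is only a sketch: `within_band` is a hypothetical name, the counts below are made-up illustration values rather than figures from this setup, and it uses true division, which may differ slightly from an integer-division version of the same check:

```python
def within_band(counts):
    """True if avg - avg/2 < min(counts) and max(counts) <= avg + avg/2."""
    avg = sum(counts) / len(counts)
    return avg - avg / 2 < min(counts) and max(counts) <= avg + avg / 2


# Hypothetical per-ASD fragment counts, for illustration only.
print(within_band([10, 12, 14, 16, 18]))  # True: 7 < 10 and 18 <= 21
print(within_band([2, 14, 14, 14, 26]))   # False: min 2 is below the band
```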
Timing buffered disk reads:
for d in b c d e f; do hdparm -t /dev/sd$d | awk '/dev/ { printf ("%s ", $1) } /buffered/ { print $(NF-1), $NF }'; done
1st node:
/dev/sdb: 514.74 MB/sec
/dev/sdc: 514.25 MB/sec
/dev/sdd: 516.83 MB/sec
/dev/sde: 338.95 MB/sec
/dev/sdf: 514.92 MB/sec
[0:0:9:0] disk ATA SAMSUNG MZ7LM480 003Q /dev/sdb
[0:0:10:0] disk ATA SAMSUNG MZ7LM480 003Q /dev/sdc
[0:0:11:0] disk ATA SAMSUNG MZ7LM480 003Q /dev/sdd
[0:0:12:0] disk ATA INTEL SSDSC2BB30 0370 /dev/sde
[0:0:14:0] disk ATA SAMSUNG MZ7LM480 003Q /dev/sdf
[0:0:15:0] disk ATA Samsung SSD 845D AX3Q /dev/sdg
2nd node:
/dev/sdb: 517.12 MB/sec
/dev/sdc: 515.22 MB/sec
/dev/sdd: 337.59 MB/sec
/dev/sde: 374.91 MB/sec
/dev/sdf: 514.21 MB/sec
[0:0:9:0] disk ATA SAMSUNG MZ7LM480 003Q /dev/sdb
[0:0:11:0] disk ATA SAMSUNG MZ7LM480 003Q /dev/sdc
[0:0:12:0] disk ATA INTEL SSDSC2BB30 0370 /dev/sdd
[0:0:13:0] disk ATA INTEL SSDSC2BB30 0370 /dev/sde
[0:0:14:0] disk ATA SAMSUNG MZ7LM480 003Q /dev/sdf
[0:0:15:0] disk ATA SAMSUNG MZ7LM480 003Q /dev/sdg
3rd node:
/dev/sdb: 515.86 MB/sec
/dev/sdc: 515.31 MB/sec
/dev/sdd: 520.35 MB/sec
/dev/sde: 517.22 MB/sec
/dev/sdf: 517.55 MB/sec
[0:0:9:0] disk ATA SAMSUNG MZ7LM480 003Q /dev/sdb
[0:0:10:0] disk ATA SAMSUNG MZ7LM480 003Q /dev/sdc
[0:0:11:0] disk ATA SAMSUNG MZ7LM480 003Q /dev/sdd
[0:0:12:0] disk ATA SAMSUNG MZ7LM480 003Q /dev/sde
[0:0:13:0] disk ATA SAMSUNG MZ7LM480 003Q /dev/sdf
[0:0:14:0] disk ATA INTEL SSDSC2BB30 0370 /dev/sdg
[0:0:15:0] disk ATA INTEL SSDSC2BB30 0370 /dev/sdh
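In the listings above, the slower readings on nodes 1 and 2 (roughly 340 to 375 MB/sec) line up with the INTEL SSDSC2BB30 drives. To flag such outliers automatically, something like the following could be used. It is a sketch that assumes the one-line-per-device format produced by the hdparm/awk command above; `slow_disks` and the 0.8 threshold are arbitrary choices for illustration:

```python
import re
from statistics import median

# Matches lines such as "/dev/sdb: 517.12 MB/sec".
LINE = re.compile(r"^(/dev/\w+):\s+([\d.]+)\s+MB/sec$")


def slow_disks(lines, threshold=0.8):
    """Return devices whose read rate is below threshold * median rate."""
    rates = {}
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            rates[match.group(1)] = float(match.group(2))
    cutoff = threshold * median(rates.values())
    return {dev: rate for dev, rate in rates.items() if rate < cutoff}


# Readings from the 2nd node, copied from above.
node2 = """
/dev/sdb: 517.12 MB/sec
/dev/sdc: 515.22 MB/sec
/dev/sdd: 337.59 MB/sec
/dev/sde: 374.91 MB/sec
/dev/sdf: 514.21 MB/sec
""".splitlines()

print(slow_disks(node2))
```

For the 2nd node this flags /dev/sdd and /dev/sde. Whether the slower drives explain the node-level difference is a separate question, since the 3rd node also has Intel drives that were not timed here.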
After the Christmas weekend, it seems impossible to reproduce this behaviour. The systems were unresponsive and were rebooted, but they had been rebooted before, so the reboot itself is not what fixed it. Currently clueless about what was going on last week.
More reading from and writing to the system eventually reintroduced the problem: first one node became slower (resulting in less data being stored via it), and finally everything got extremely slow on all nodes. The alba internal benchmarks show proxy-bench having trouble, while alba-bench was fine. Further investigation is still needed.
Create a long-running test to validate, or investigate on longer-running environments pocops/gig:
As observed on Fattwin environment http://confluence.cloudfounders.com/pages/viewpage.action?pageId=66617797
Credits to @dejonghb
1 of the 3 nodes has a lot less data stored at the end of the test: nodes 1 and 3 have 17 GiB, node 2 has only 11 GiB.
System parameters show difference in
To be investigated further ...
@dejonghb: please log your findings in this ticket.