bowenwang1996 opened 2 years ago
@olonho do you want to investigate this yourself?
I'm looking into that, but a second pair of eyes wouldn't hurt.
Curious stats from about a day of playback:
Processed 10000 blocks, 9 blocks per second, 4357195 secs remaining
Processed 20000 blocks, 10 blocks per second, 3920475 secs remaining
Processed 30000 blocks, 12 blocks per second, 3266229 secs remaining
Processed 40000 blocks, 12 blocks per second, 3265396 secs remaining
Processed 50000 blocks, 13 blocks per second, 3013442 secs remaining
Processed 60000 blocks, 9 blocks per second, 4351639 secs remaining
Processed 70000 blocks, 9 blocks per second, 4350528 secs remaining
Processed 80000 blocks, 10 blocks per second, 3914475 secs remaining
Processed 90000 blocks, 9 blocks per second, 4348306 secs remaining
Processed 100000 blocks, 9 blocks per second, 4347195 secs remaining
Processed 110000 blocks, 9 blocks per second, 4346084 secs remaining
Processed 120000 blocks, 9 blocks per second, 4344973 secs remaining
Processed 130000 blocks, 9 blocks per second, 4343861 secs remaining
Processed 140000 blocks, 11 blocks per second, 3553159 secs remaining
Processed 150000 blocks, 9 blocks per second, 4341639 secs remaining
Processed 160000 blocks, 11 blocks per second, 3551341 secs remaining
Processed 170000 blocks, 10 blocks per second, 3905475 secs remaining
Processed 180000 blocks, 10 blocks per second, 3904475 secs remaining
Processed 190000 blocks, 10 blocks per second, 3903475 secs remaining
Processed 200000 blocks, 10 blocks per second, 3902475 secs remaining
Processed 210000 blocks, 11 blocks per second, 3546796 secs remaining
Processed 220000 blocks, 10 blocks per second, 3900475 secs remaining
Processed 230000 blocks, 10 blocks per second, 3899475 secs remaining
Processed 240000 blocks, 8 blocks per second, 4873094 secs remaining
Processed 250000 blocks, 9 blocks per second, 4330528 secs remaining
Processed 260000 blocks, 13 blocks per second, 2997289 secs remaining
Processed 270000 blocks, 23 blocks per second, 1693685 secs remaining
Processed 280000 blocks, 34 blocks per second, 1145434 secs remaining
Processed 290000 blocks, 32 blocks per second, 1216711 secs remaining
Processed 300000 blocks, 15 blocks per second, 2594983 secs remaining
Processed 310000 blocks, 10 blocks per second, 3891475 secs remaining
Processed 320000 blocks, 9 blocks per second, 4322750 secs remaining
Processed 330000 blocks, 8 blocks per second, 4861844 secs remaining
Processed 340000 blocks, 7 blocks per second, 5554965 secs remaining
Processed 350000 blocks, 8 blocks per second, 4859344 secs remaining
Processed 360000 blocks, 8 blocks per second, 4858094 secs remaining
Processed 370000 blocks, 7 blocks per second, 5550679 secs remaining
Processed 380000 blocks, 11 blocks per second, 3531341 secs remaining
Processed 390000 blocks, 29 blocks per second, 1339129 secs remaining
Processed 400000 blocks, 30 blocks per second, 1294158 secs remaining
Processed 410000 blocks, 30 blocks per second, 1293825 secs remaining
Processed 420000 blocks, 28 blocks per second, 1385884 secs remaining
Processed 430000 blocks, 20 blocks per second, 1939737 secs remaining
Processed 440000 blocks, 9 blocks per second, 4309417 secs remaining
Processed 450000 blocks, 10 blocks per second, 3877475 secs remaining
Processed 460000 blocks, 9 blocks per second, 4307195 secs remaining
Processed 470000 blocks, 8 blocks per second, 4844344 secs remaining
Processed 480000 blocks, 9 blocks per second, 4304973 secs remaining
Processed 490000 blocks, 9 blocks per second, 4303861 secs remaining
Processed 500000 blocks, 10 blocks per second, 3872475 secs remaining
Processed 510000 blocks, 9 blocks per second, 4301639 secs remaining
Processed 520000 blocks, 10 blocks per second, 3870475 secs remaining
Processed 530000 blocks, 10 blocks per second, 3869475 secs remaining
Processed 540000 blocks, 11 blocks per second, 3516796 secs remaining
Processed 550000 blocks, 10 blocks per second, 3867475 secs remaining
Processed 560000 blocks, 10 blocks per second, 3866475 secs remaining
Processed 570000 blocks, 10 blocks per second, 3865475 secs remaining
Processed 580000 blocks, 10 blocks per second, 3864475 secs remaining
Processed 590000 blocks, 10 blocks per second, 3863475 secs remaining
Processed 600000 blocks, 10 blocks per second, 3862475 secs remaining
Processed 610000 blocks, 10 blocks per second, 3861475 secs remaining
Processed 620000 blocks, 10 blocks per second, 3860475 secs remaining
Processed 630000 blocks, 9 blocks per second, 4288306 secs remaining
Processed 640000 blocks, 12 blocks per second, 3215396 secs remaining
Processed 650000 blocks, 12 blocks per second, 3214563 secs remaining
Processed 660000 blocks, 10 blocks per second, 3856475 secs remaining
Processed 670000 blocks, 9 blocks per second, 4283861 secs remaining
Processed 680000 blocks, 13 blocks per second, 2964981 secs remaining
Processed 690000 blocks, 10 blocks per second, 3853475 secs remaining
Processed 700000 blocks, 10 blocks per second, 3852475 secs remaining
Processed 710000 blocks, 10 blocks per second, 3851475 secs remaining
Processed 720000 blocks, 10 blocks per second, 3850475 secs remaining
Processed 730000 blocks, 11 blocks per second, 3499523 secs remaining
Processed 740000 blocks, 11 blocks per second, 3498614 secs remaining
Processed 750000 blocks, 11 blocks per second, 3497705 secs remaining
Processed 760000 blocks, 12 blocks per second, 3205396 secs remaining
Processed 770000 blocks, 10 blocks per second, 3845475 secs remaining
Processed 780000 blocks, 10 blocks per second, 3844475 secs remaining
Processed 790000 blocks, 10 blocks per second, 3843475 secs remaining
Processed 800000 blocks, 12 blocks per second, 3202063 secs remaining
Processed 810000 blocks, 11 blocks per second, 3492250 secs remaining
Processed 820000 blocks, 10 blocks per second, 3840475 secs remaining
Processed 830000 blocks, 11 blocks per second, 3490432 secs remaining
Processed 840000 blocks, 11 blocks per second, 3489523 secs remaining
Processed 850000 blocks, 11 blocks per second, 3488614 secs remaining
Processed 860000 blocks, 10 blocks per second, 3836475 secs remaining
Processed 870000 blocks, 10 blocks per second, 3835475 secs remaining
Processed 880000 blocks, 11 blocks per second, 3485887 secs remaining
Processed 890000 blocks, 11 blocks per second, 3484977 secs remaining
Processed 900000 blocks, 11 blocks per second, 3484068 secs remaining
Processed 910000 blocks, 10 blocks per second, 3831475 secs remaining
Processed 920000 blocks, 10 blocks per second, 3830475 secs remaining
So we see that playback speed is not uniform: mostly 8-13 blocks per second, with occasional bursts above 30 blocks per second.
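As a sanity check on the tool's ETA output: the numbers above are consistent with a plain `(total - processed) / rate` formula, which implies a target of roughly 39.22M blocks (this total is inferred from the log lines, not taken from the tool's source). A minimal sketch:

```rust
/// Remaining-seconds estimate matching the log's arithmetic.
/// At 10000 blocks processed and 9 blocks/s the tool prints
/// 4357195 secs remaining, implying a target of ~39.22M blocks.
fn eta_secs(total_blocks: u64, processed: u64, blocks_per_sec: u64) -> u64 {
    (total_blocks - processed) / blocks_per_sec
}

fn main() {
    // Total inferred from the log above (assumption, not from source).
    let total = 39_224_755u64;
    println!("{}", eta_secs(total, 10_000, 9)); // 4357195, matches the first line
    println!("{}", eta_secs(total, 100_000, 9)); // 4347195, matches the 100000 line
}
```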
A userspace-only perf top of neard looks like this:
[.] rocksdb::crc32c::crc32c_3way
[.] snappy::RawUncompress
[.] std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std…
[.] sha2::sha256::compress256
[.] rocksdb::IndexBlockIter::SeekImpl
[.] rocksdb::ReadProperties
[.] rocksdb::LRUHandleTable::FindPointer
[.] rocksdb::LRUCacheShard::LRU_Remove
[.] std::_Sp_counted_ptr<rocksdb::TableProperties*, (__gnu_cxx::_Lock_policy)2>::_M_dispose
[.] rocksdb::BlockBasedTable::~BlockBasedTable
[.] rocksdb::BlockFetcher::ReadBlockContents
[.] _rjem_sdallocx
[.] <FN as wasmer_runtime_core_near::typed_func::HostFunction<wasmer_runtime_core_near::typed_func::ExplicitVmCt…
[.] rocksdb::LRUCacheShard::LRU_Insert
[.] _rjem_malloc
[.] std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std…
[.] rocksdb::Version::Get
[.] rocksdb::ShardedCache::Lookup
[.] rocksdb::TableCache::Get
[.] std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<s…
[.] rocksdb::BlockBasedTable::Get
[.] bs58::encode::encode_into
[.] std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std…
[.] rocksdb::BlockBasedFilterBlockReader::ParseFieldsFromBlock
[.] rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::Block>
[.] rocksdb::DataBlockIter::SeekImpl
[.] std::__lower_bound<rocksdb::FdWithKeyRange*, rocksdb::Slice, __gnu_cxx::__ops::_Iter_comp_val<rocksdb::(anon…
[.] <alloc::vec::Vec<T,A> as core::clone::Clone>::clone
[.] rocksdb::BlockBasedFilterBlockReader::MayMatch
The kernel-side top looks like this:
32.39% [k] copy_user_enhanced_fast_string
3.08% [k] find_get_pages_contig
2.66% [k] __lock_text_start
2.51% [k] generic_file_buffered_read
1.01% [k] finish_task_switch
0.92% [k] __softirqentry_text_start
0.61% [k] iowrite16
0.47% [k] copy_page_to_iter
0.34% [k] xas_load
0.34% [k] syscall_enter_from_user_mode
0.26% [k] __add_to_page_cache_locked
0.25% [k] copyout
0.23% [k] mark_page_accessed
0.21% [k] kmem_cache_alloc
0.21% [k] kmem_cache_free
copy_user_enhanced_fast_string
looks fishy at first, as it's just this routine https://elixir.bootlin.com/linux/v4.2/source/arch/x86/lib/copy_user_64.S#L216
and not many system calls pass arguments as zero-terminated strings at all. Note, though, that despite the name this is the REP MOVSB-based bulk copy between kernel and user space ("fast string" refers to the x86 string instructions, not C strings), so its dominance means we spend most of the time copying read data out of the page cache.
strace -fp <neard-pid>
shows
[pid 371451] readahead(292, 67685523, 965 <unfinished ...>
[pid 371453] fstat(387, <unfinished ...>
[pid 371451] <... readahead resumed>) = 0
[pid 371453] <... fstat resumed>{st_mode=S_IFREG|0644, st_size=48871308, ...}) = 0
[pid 371451] pread64(292, <unfinished ...>
[pid 371453] ioctl(387, FS_IOC_GETVERSION <unfinished ...>
[pid 371451] <... pread64 resumed>"\1\314\237\243 R\303\270\237 \315\340\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 53, 67686435) = 53
[pid 371453] <... ioctl resumed>, 0x7f38203f7cb8) = 0
[pid 371451] fstat(292, <unfinished ...>
[pid 371453] pread64(387, <unfinished ...>
[pid 371451] <... fstat resumed>{st_mode=S_IFREG|0644, st_size=67686488, ...}) = 0
[pid 371453] <... pread64 resumed>"\0!\7filter.rocksdb.BuiltinBloomFi"..., 87, 48871168) = 87
[pid 371447] <... pread64 resumed>"\273~\230\0(\332\1ZST\343\211'{\10\235B\231\177\177\234A\224\22:\203r\374UJ\v\320"..., 13028, 50678818) = 13028
[pid 371453] pread64(387, <unfinished ...>
[pid 371451] ioctl(292, FS_IOC_GETVERSION <unfinished ...>
[pid 371453] <... pread64 resumed>"\0$\4rocksdb.block.based.table.ind"..., 814, 48870354) = 814
[pid 371451] <... ioctl resumed>, 0x7f3821870ac8) = 0
[pid 371447] futex(0x7f3824c4119c, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 371451] pread64(292, <unfinished ...>
[pid 371452] <... futex resumed>) = 0
[pid 371451] <... pread64 resumed>"\0!\7filter.rocksdb.BuiltinBloomFi"..., 87, 67686348) = 87
[pid 371447] <... futex resumed>) = 1
[pid 371453] close(307 <unfinished ...>
[pid 371452] pread64(145, <unfinished ...>
[pid 371451] pread64(292, <unfinished ...>
[pid 371447] futex(0x7f3824c4119c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 3, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 371453] <... close resumed>) = 0
[pid 371451] <... pread64 resumed>"\0$\4rocksdb.block.based.table.ind"..., 823, 67685525) = 823
[pid 371453] openat(AT_FDCWD, "/home/ubuntu/.near/data/3124467.sst", O_RDONLY|O_CLOEXEC) = 307
[pid 371453] fcntl(307, F_GETFD <unfinished ...>
[pid 371451] pread64(292, <unfinished ...>
[pid 371453] <... fcntl resumed>) = 0x1 (flags FD_CLOEXEC)
[pid 371453] fcntl(307, F_SETFD, FD_CLOEXEC) = 0
[pid 371452] <... pread64 resumed>"\320\335\265\3532\242z\310\340\241\322\v\211\7\26h\276\33-H-\4j\2321!\200\264Y\206\202\315"..., 624243, 67114489) = 624243
[pid 371451] <... pread64 resumed>"\301\215\6\10\0\r\0\r\1$\204\233\256G\250\0\276\177\0\f\r\21\1\22\f\336\303\177\2512\22\0"..., 61522, 67624003) = 61522
[pid 371453] fadvise64(307, 0, 0, POSIX_FADV_RANDOM) = 0
[pid 371453] readahead(307, 67326171, 955) = 0
[pid 371452] pread64(303, <unfinished ...>
[pid 371453] pread64(307, <unfinished ...>
[pid 371451] pread64(292, <unfinished ...>
[pid 371452] <... pread64 resumed>"\0\3o\0342\0\206_\0\3o\35F\213_\320e\0\4o\0364m\340\304\1\304c\0\3o\37"..., 80904, 67663977) = 80904
[pid 371453] <... pread64 resumed>"\1\212\250\215 R\235\312\213 \271\327\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 53, 67327073) = 53
[pid 371453] fstat(307, <unfinished ...>
[pid 371452] pread64(303, <unfinished ...>
[pid 371453] <... fstat resumed>{st_mode=S_IFREG|0644, st_size=67327126, ...}) = 0
[pid 371453] ioctl(307, FS_IOC_GETVERSION <unfinished ...>
[pid 371451] <... pread64 resumed>"\206Cg\347\232V\317!\nuQ\241\2032\30,E6\366\350\320\10xb#\32\213\215\226\244v4"..., 499904, 67124099) = 499904
[pid 371453] <... ioctl resumed>, 0x7f38203f7cb8) = 0
[pid 371453] pread64(307, <unfinished ...>
[pid 371452] <... pread64 resumed>"\25\207t\251\366\264m\276\240\215!\225\345\207rnV\"\350\202\241\364r\270@\372\206A\30,@\177"..., 553720, 67110257) = 553720
[pid 371453] <... pread64 resumed>"\0!\7filter.rocksdb.BuiltinBloomFi"..., 87, 67326986) = 87
[pid 371451] close(296) = 0
[pid 371452] openat(AT_FDCWD, "/home/ubuntu/.near/data/3168026.sst", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 371451] pread64(292, <unfinished ...>
[pid 371453] pread64(307, <unfinished ...>
[pid 371452] <... openat resumed>) = 296
[pid 371453] <... pread64 resumed>"\0$\4rocksdb.block.based.table.ind"..., 815, 67326171) = 815
[pid 371452] fcntl(296, F_GETFD) = 0x1 (flags FD_CLOEXEC)
[pid 371452] fcntl(296, F_SETFD, FD_CLOEXEC <unfinished ...>
[pid 371453] pread64(307, <unfinished ...>
[pid 371452] <... fcntl resumed>) = 0
[pid 371453] <... pread64 resumed>"\0\4\202\230+<\0\315F\0\4\202\230>Y\322F\325E\0\4\202\230F\v\254\214\1\267\363\4\0"..., 27582, 67298589) = 27582
[pid 371452] fadvise64(296, 0, 0, POSIX_FADV_RANDOM <unfinished ...>
[pid 371453] pread64(307, <unfinished ...>
[pid 371452] <... fadvise64 resumed>) = 0
[pid 371452] readahead(296, 67208543, 524288 <unfinished ...>
[pid 371453] <... pread64 resumed>"A`SU\37\2523\177=.\345.\264\363\223\31\336\5\"s\323\202\210\375\267j#\320#F'\331"..., 182477, 67116112) = 182477
[pid 371452] <... readahead resumed>) = 0
[pid 371453] close(367 <unfinished ...>
[pid 371452] pread64(296, <unfinished ...>
[pid 371453] <... close resumed>) = 0
[pid 371452] <... pread64 resumed>"\1\323\211\246 R\352\226\241 \263\354\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 53, 67732778) = 53
[pid 371453] pread64(307, <unfinished ...>
[pid 371452] fstat(296, {st_mode=S_IFREG|0644, st_size=67732831, ...}) = 0
[pid 371452] ioctl(296, FS_IOC_GETVERSION, 0x7f38207f83c8) = 0
[pid 371452] pread64(296, "\0!\7filter.rocksdb.BuiltinBloomFi"..., 87, 67732691) = 87
[pid 371452] pread64(296, "\0$\4rocksdb.block.based.table.ind"..., 817, 67731874) = 817
[pid 371452] pread64(296, "\0\3p\300\315\0\307e\0\4p\300\347#\314e\366d\0\4p\301\6z\307\312\1\275g\0\3p"..., 79416, 67652458) = 79416
[pid 371452] pread64(296, <unfinished ...>
[pid 371451] <... pread64 resumed>"\0000\363\3\0\0\0\0\0\0\0\0\204\237\276b\371\315\274\306\233\3354e\373\354:\357\314\2333\242"..., 16033, 31419416) = 16033
[pid 371451] futex(0x7f3824c09540, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 371450] <... futex resumed>) = 0
[pid 371452] <... pread64 resumed>"\2126\261\223\7\321\305\266*\6l8\367I%\21\201\315(\16\220\340\222\241A\305 \225:b\4\246"..., 537481, 67114977) = 537481
[pid 371450] futex(0x7f3824c09540, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 371451] openat(AT_FDCWD, "/home/ubuntu/.near/data/3222880.sst", O_RDONLY|O_CLOEXEC) = 367
[pid 371451] fcntl(367, F_GETFD <unfinished ...>
[pid 371452] close(170 <unfinished ...>
[pid 371451] <... fcntl resumed>) = 0x1 (flags FD_CLOEXEC)
[pid 371452] <... close resumed>) = 0
[pid 371451] fcntl(367, F_SETFD, FD_CLOEXEC <unfinished ...>
[pid 371452] pread64(296, <unfinished ...>
[pid 371451] <... fcntl resumed>) = 0
[pid 371451] fadvise64(367, 0, 0, POSIX_FADV_RANDOM) = 0
[pid 371451] readahead(367, 67665430, 965 <unfinished ...>
[pid 371452] <... pread64 resumed>"\2126\261\223\7\321\305\266*\6l8\367I%\21\201\315(\16\220\340\222\241A\305 \225:b\4\246"..., 537481, 67114977) = 537481
[pid 371451] <... readahead resumed>) = 0
[pid 371453] <... pread64 resumed>"\347\336\2\230\0000\371\20\202\234\351\325\271\353\30\311\240\2705\231\305\275WZ\272\365\266\312@H\371u"..., 36391, 4628635) = 36391
[pid 371451] pread64(367, <unfinished ...>
[pid 371452] pread64(296, <unfinished ...>
[pid 371451] <... pread64 resumed>"\1\317\202\242 R\317\233\236 \302\340\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 53, 67666342) = 53
[pid 371451] fstat(367, {st_mode=S_IFREG|0644, st_size=67666395, ...}) = 0
[pid 371453] futex(0x7f3824c4119c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 3, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 371451] ioctl(367, FS_IOC_GETVERSION, 0x7f3821870fe8) = 0
[pid 371451] pread64(367, "\0!\7filter.rocksdb.BuiltinBloomFi"..., 87, 67666255) = 87
[pid 371451] pread64(367, "\0$\4rocksdb.block.based.table.ind"..., 825, 67665430) = 825
[pid 371451] close(328) = 0
[pid 371451] openat(AT_FDCWD, "/home/ubuntu/.near/data/3217771.sst", O_RDONLY|O_CLOEXEC) = 170
[pid 371451] fcntl(170, F_GETFD) = 0x1 (flags FD_CLOEXEC)
[pid 371451] fcntl(170, F_SETFD, FD_CLOEXEC) = 0
[pid 371451] fadvise64(170, 0, 0, POSIX_FADV_RANDOM) = 0
[pid 371451] readahead(170, 67673586, 965) = 0
[pid 371451] pread64(170, "\1\253\302\242 R\233\327\236 \322\344\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 53, 67674498) = 53
[pid 371452] <... pread64 resumed>"\363\177\230\0(\332\1rW<\27-\217\262\277\257\313X\365\31\374\307\4*\206\4\301\375\376\240\243\4"..., 13312, 50099432) = 13312
[pid 371451] fstat(170, {st_mode=S_IFREG|0644, st_size=67674551, ...}) = 0
[pid 371451] ioctl(170, FS_IOC_GETVERSION <unfinished ...>
[pid 371452] futex(0x7f3824c4119c, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 371451] <... ioctl resumed>, 0x7f3821870fe8) = 0
[pid 371451] pread64(170, <unfinished ...>
[pid 371452] <... futex resumed>) = 1
[pid 371447] <... futex resumed>) = 0
[pid 371451] <... pread64 resumed>"\0!\7filter.rocksdb.BuiltinBloomFi"..., 87, 67674411) = 87
[pid 371451] pread64(170, <unfinished ...>
[pid 371447] futex(0x7f3824c4119c, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 371451] <... pread64 resumed>"\0$\4rocksdb.block.based.table.ind"..., 825, 67673586) = 825
[pid 371452] pread64(428, <unfinished ...>
[pid 371451] close(295 <unfinished ...>
[pid 371447] <... futex resumed>) = 1
[pid 371453] <... futex resumed>) = 0
[pid 371451] <... close resumed>) = 0
[pid 371447] futex(0x7f3824c4119c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 3, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 371451] openat(AT_FDCWD, "/home/ubuntu/.near/data/3139501.sst", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 371453] pread64(145, <unfinished ...>
So it seems the expensive part is RocksDB constantly re-opening SST files: each openat is followed by fadvise64/readahead and several small pread64 calls re-reading the footer, filter, and index blocks.
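One plausible mitigation for the openat churn, sketched against the rust-rocksdb crate (whether neard exposes this knob in its config is an assumption here), is to let RocksDB keep every table file open so the table cache stops evicting and re-opening SSTs:

```rust
use rocksdb::Options;

// Configuration sketch: -1 tells RocksDB to keep all SST file
// descriptors open, so the table cache never evicts handles and we
// avoid the openat + footer/filter/index re-read sequence visible
// in the strace above. Costs memory and fd count in exchange.
fn tuned_options() -> Options {
    let mut opts = Options::default();
    opts.set_max_open_files(-1);
    opts
}
```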
Interestingly, when using the DWARF-based stack unwinder, the profile looks much saner:
+ 98.46% 0.00% neard neard [.] rayon_core::join::join_context::{{closure}}
+ 98.45% 0.09% neard neard [.] state_viewer::apply_chain_range::apply_chain_range::{{cl…
+ 94.20% 0.08% neard neard [.] <near_store::db::RocksDB as near_store::db::Database>::g…
+ 93.80% 0.08% neard neard [.] rocksdb_get_pinned_cf
+ 93.73% 0.01% neard neard [.] rocksdb::DBImpl::Get
+ 93.72% 0.11% neard neard [.] rocksdb::DBImpl::GetImpl
+ 93.32% 0.19% neard neard [.] rocksdb::Version::Get
+ 92.72% 0.18% neard neard [.] rocksdb::TableCache::Get
+ 84.44% 0.00% neard neard [.] <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Jo…
+ 68.47% 0.31% neard neard [.] rocksdb::BlockFetcher::ReadBlockContents
+ 56.83% 0.03% neard neard [.] rocksdb::FilterBlockReaderCommon<rocksdb::BlockContents>…
+ 56.79% 0.02% neard neard [.] rocksdb::BlockBasedTable::RetrieveBlock<rocksdb::BlockCo…
- 56.77% 0.10% neard neard [.] rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<r…
- 56.67% rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::BlockContents>
+ 54.86% rocksdb::BlockFetcher::ReadBlockContents
+ 1.44% rocksdb::BlockBasedTable::PutDataBlockToCache<rocksdb::BlockContents>
+ 54.78% 0.18% neard neard [.] rocksdb::BlockBasedTable::Get
+ 50.36% 0.00% neard [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe
+ 50.24% 0.02% neard [kernel.kallsyms] [k] do_syscall_64
+ 48.27% 0.14% neard neard [.] rocksdb::RandomAccessFileReader::Read
+ 48.10% 0.10% neard neard [.] rocksdb::PosixRandomAccessFile::Read
+ 47.99% 0.00% neard libpthread-2.31.so [.] __libc_pread (inlined)
+ 47.17% 0.03% neard [kernel.kallsyms] [k] __x64_sys_pread64
+ 47.11% 0.01% neard [kernel.kallsyms] [k] ksys_pread64
+ 46.96% 0.03% neard [kernel.kallsyms] [k] vfs_read
+ 46.71% 0.08% neard [kernel.kallsyms] [k] new_sync_read
+ 46.64% 0.06% neard [kernel.kallsyms] [k] ext4_file_read_iter
+ 46.58% 0.02% neard [kernel.kallsyms] [k] generic_file_read_iter
+ 46.49% 2.50% neard [kernel.kallsyms] [k] generic_file_buffered_read
+ 42.35% 0.14% neard neard [.] rocksdb::BlockBasedFilterBlockReader::MayMatch
+ 41.58% 0.05% neard neard [.] rocksdb::FilterBlockReaderCommon<rocksdb::BlockContents>…
+ 37.43% 0.03% neard neard [.] rocksdb::TableCache::FindTable
+ 35.51% 0.43% neard [kernel.kallsyms] [k] copy_page_to_iter
With dominators like
- 93.60% rocksdb::DBImpl::GetImpl
- 93.32% rocksdb::Version::Get
- 92.72% rocksdb::TableCache::Get
- 54.78% rocksdb::BlockBasedTable::Get
- 42.35% rocksdb::BlockBasedFilterBlockReader::MayMatch
- 41.58% rocksdb::FilterBlockReaderCommon<rocksdb::BlockContents>::GetOrReadFilterBlock
- 41.52% rocksdb::FilterBlockReaderCommon<rocksdb::BlockContents>::ReadFilterBlock
- rocksdb::BlockBasedTable::RetrieveBlock<rocksdb::BlockContents>
- rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::BlockContents>
- 40.81% rocksdb::BlockFetcher::ReadBlockContents
- 28.43% rocksdb::RandomAccessFileReader::Read
- 28.41% rocksdb::PosixRandomAccessFile::Read
- __libc_pread (inlined)
- 28.28% entry_SYSCALL_64_after_hwframe
- do_syscall_64
- 28.25% __x64_sys_pread64
- 28.24% ksys_pread64
- 28.17% vfs_read
- 28.12% new_sync_read
- ext4_file_read_iter
- generic_file_read_iter
- 28.05% generic_file_buffered_read
- 23.28% copy_page_to_iter
22.81% copy_user_enhanced_fast_string
- 2.46% generic_file_buffered_read_get_pages
2.32% find_get_pages_contig
- 11.23% rocksdb::VerifyBlockChecksum
11.21% rocksdb::crc32c::crc32c_3way
- 1.06% operator new
__libc_malloc (inlined)
- 9.03% rocksdb::BlockBasedTable::NewDataBlockIterator<rocksdb::DataBlockIter>
- 8.97% rocksdb::BlockBasedTable::RetrieveBlock<rocksdb::Block>
- rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::Block>
- 8.40% rocksdb::BlockFetcher::ReadBlockContents
- 6.04% rocksdb::RandomAccessFileReader::Read
- 5.97% rocksdb::PosixRandomAccessFile::Read
- 5.90% __libc_pread (inlined)
- 5.70% entry_SYSCALL_64_after_hwframe
- 5.69% do_syscall_64
- 5.66% __x64_sys_pread64
- 5.64% ksys_pread64
- 5.60% vfs_read
- 5.54% new_sync_read
- ext4_file_read_iter
- generic_file_read_iter
- 5.49% generic_file_buffered_read
- 3.75% generic_file_buffered_read_get_pages
- 3.11% page_cache_sync_ra
- force_page_cache_ra
- do_page_cache_ra
- 3.04% page_cache_ra_unbounded
- 2.06% read_pages
- 1.24% blk_finish_plug
- 1.23% blk_flush_plug_list
- blk_mq_flush_plug_list
- blk_mq_sched_insert_requests
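The dominator tree also attributes ~11% of time to rocksdb::VerifyBlockChecksum (almost all of it in crc32c_3way). For a throwaway read-only replay pass, one could trade that safety for speed via read options; a sketch against the rust-rocksdb crate (whether this is worthwhile or wired through neard is an assumption):

```rust
use rocksdb::ReadOptions;

// Configuration sketch: skip per-block checksum verification, which
// the profile above attributes ~11% of total time to. Only sensible
// for a disposable replay run, not for a production node.
fn replay_read_options() -> ReadOptions {
    let mut ro = ReadOptions::default();
    ro.set_verify_checksums(false);
    ro
}
```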
copy_user_enhanced_fast_string
is reached via
- state_viewer::apply_chain_range::apply_chain_range::{{closure}}
- 12.70% <near_chain::store::ChainStore as near_chain::store::ChainStoreAccess>::get_block_hash_by_height
near_store::Store::get_ser
<near_store::db::RocksDB as near_store::db::Database>::get
rocksdb_get_pinned_cf
rocksdb::DBImpl::Get
rocksdb::DBImpl::GetImpl
rocksdb::Version::Get
- rocksdb::TableCache::Get
- 12.24% rocksdb::BlockBasedTable::Get
- 12.05% rocksdb::BlockBasedFilterBlockReader::MayMatch
rocksdb::FilterBlockReaderCommon<rocksdb::BlockContents>::GetOrReadFilterBlock
rocksdb::FilterBlockReaderCommon<rocksdb::BlockContents>::ReadFilterBlock
rocksdb::BlockBasedTable::RetrieveBlock<rocksdb::BlockContents>
rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::BlockContents>
rocksdb::BlockFetcher::ReadBlockContents
rocksdb::RandomAccessFileReader::Read
rocksdb::PosixRandomAccessFile::Read
__libc_pread (inlined)
entry_SYSCALL_64_after_hwframe
do_syscall_64
__x64_sys_pread64
ksys_pread64
vfs_read
new_sync_read
ext4_file_read_iter
generic_file_read_iter
generic_file_buffered_read
copy_page_to_iter
copy_user_enhanced_fast_string
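A large share of the time above sits in ReadFilterBlock, i.e., bloom filter blocks are being read from disk on lookups rather than served from cache. A configuration sketch of the usual RocksDB remedy, written against the rust-rocksdb crate (the cache size is illustrative, and whether neard's store config exposes these options is an assumption):

```rust
use rocksdb::{BlockBasedOptions, Cache, Options};

// Configuration sketch: keep bloom filter and index blocks in a shared
// LRU block cache instead of re-reading them from SST files on every
// Get, which is where ~41% of the time goes in the profile above.
fn cached_table_options() -> Options {
    let mut block_opts = BlockBasedOptions::default();
    let cache = Cache::new_lru_cache(1 << 30); // 1 GiB, illustrative size
    block_opts.set_block_cache(&cache);
    block_opts.set_cache_index_and_filter_blocks(true);
    block_opts.set_pin_l0_filter_and_index_blocks_in_cache(true);

    let mut opts = Options::default();
    opts.set_block_based_table_factory(&block_opts);
    opts
}
```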
@olonho so basically it is within RocksDB?
Yes, it seems so.
Further investigation shows pretty mediocre I/O performance on the system. Namely, the iotop
tool shows the following:
Total DISK READ: 11.47 M/s | Total DISK WRITE: 0.00 B/s
Current DISK READ: 11.47 M/s | Current DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
378500 be/4 ubuntu 2.31 M/s 0.00 B/s 0.00 % 10.75 % ./target/release/neard --home /home/~_range --shard-id 0 --only-contracts
378501 be/4 ubuntu 2.02 M/s 0.00 B/s 0.00 % 8.78 % ./target/release/neard --home /home/~_range --shard-id 0 --only-contracts
378503 be/4 ubuntu 1897.52 K/s 0.00 B/s 0.00 % 8.73 % ./target/release/neard --home /home/~_range --shard-id 0 --only-contracts
378505 be/4 ubuntu 1670.11 K/s 0.00 B/s 0.00 % 7.43 % ./target/release/neard --home /home/~_range --shard-id 0 --only-contracts
378504 be/4 ubuntu 1267.50 K/s 0.00 B/s 0.00 % 5.61 % ./target/release/neard --home /home/~_range --shard-id 0 --only-contracts
378502 be/4 ubuntu 1356.97 K/s 0.00 B/s 0.00 % 5.25 % ./target/release/neard --home /home/~_range --shard-id 0 --only-contracts
378499 be/4 ubuntu 935.71 K/s 0.00 B/s 0.00 % 3.47 % ./target/release/neard --home /home/~_range --shard-id 0 --only-contracts
378506 be/4 ubuntu 182.67 K/s 0.00 B/s 0.00 % 0.58 % ./target/release/neard --home /home/~_range --shard-id 0 --only-contracts
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % systemd --system --deserialize 27
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
So in total we get roughly 12 MB/s of disk read throughput. Compared to https://www.anandtech.com/show/7173/samsung-ssd-840-evo-review-120gb-250gb-500gb-750gb-1tb-models-tested/8 this is about 3x worse than the slowest SSD in that list. For reference, a Samsung T5 external SSD in my box gives around 150 MB/s (queue depth 64, measured with AmorphousDiskMark), 15-20 MB/s at queue depth 1, and 118 MB/s at queue depth 8. An internal SSD shows a beefy 331 MB/s.
https://cloud.google.com/compute/docs/disks/performance suggests that the expected read throughput for a GCP instance is between 200 MB/s and 9 GB/s, depending on machine and disk type.
And some nodes (archive backups) show actual peak read performance around 500 MB/s in the gcloud console.
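To answer the question of whether the instance itself is slow, a crude sequential-read benchmark can be run with nothing but the Rust standard library. A sketch (caveat: unless the page cache is dropped first, e.g. `echo 3 > /proc/sys/vm/drop_caches`, or O_DIRECT is used, re-reading a just-written file measures cached reads rather than the disk):

```rust
use std::fs::File;
use std::io::{Read, Write};
use std::path::Path;
use std::time::Instant;

/// Write `mib` MiB of data to `path` and fsync it.
fn write_test_file(path: &Path, mib: usize) -> std::io::Result<()> {
    let chunk = vec![0xA5u8; 1 << 20];
    let mut f = File::create(path)?;
    for _ in 0..mib {
        f.write_all(&chunk)?;
    }
    f.sync_all()
}

/// Read the file back in 1 MiB chunks; returns (bytes read, MB/s).
fn bench_seq_read(path: &Path) -> std::io::Result<(usize, f64)> {
    let mut buf = vec![0u8; 1 << 20];
    let mut f = File::open(path)?;
    let start = Instant::now();
    let mut total = 0usize;
    loop {
        let n = f.read(&mut buf)?;
        if n == 0 {
            break;
        }
        total += n;
    }
    let secs = start.elapsed().as_secs_f64();
    Ok((total, total as f64 / 1e6 / secs))
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("io_bench.tmp");
    write_test_file(&path, 64)?;
    let (bytes, mbps) = bench_seq_read(&path)?;
    println!("read {} MiB at {:.1} MB/s", bytes >> 20, mbps);
    std::fs::remove_file(&path)
}
```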
Interesting. @olonho do you think the issue is specific to the instance you use to run the history playback tool? Is there some disk IO benchmark we can run to see whether the instance itself is problematic?
Moving to @marcelo-gonzalez for the OKR ownership transfer
I don't think this is in the OKR?
According to https://github.com/near/nearcore/pull/5697#issue-1073190158, we can only play back 10 blocks per second even with multiple threads. This does not align with our expectations, and it blocks the contract runtime team from validating that wasmer2 and wasmer0 give exactly the same results on the entire mainnet history, which would allow us to remove wasmer0 support completely.
@olonho did some initial investigation and found that we spend a suspicious amount of time in kernel (see the profile below) ![image](https://user-images.githubusercontent.com/13601726/145128257-4b3de305-1c1f-4f02-84f0-cd857eff4020.png)