porscheme opened this issue 1 year ago
Upon further investigation, I was able to root-cause the failure.
SST file generation and ingestion are designed to offload the sorting computation from the NebulaGraph cluster, to accelerate batch data importing.
They are not built for ingesting data across clusters, because the SST data itself is cluster-context dependent: the file structure is tightly coupled to the cluster's internal state (e.g., the space ID).
Thanks for the reply.
Can we do incremental updates to graph through SST files?
Yes, SST files are not just for full data import; they can be applied incrementally from a whole-graph perspective, e.g., we could run the import every night.
Oh nice, how does it handle deletes?
From my understanding, there is no pure deletion in Exchange (but we do have insert and update). @Nicole00, correct me if I'm wrong :)
Any update on this? How do we do deletion through SST files?
If deletion isn't possible with SST files, what's the best alternative?
@wey-gu
Any help is really appreciated. Thanks
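Since SST ingestion has no delete operation, one common alternative is to issue nGQL `DELETE VERTEX` statements over a normal client connection. Below is a minimal sketch of batching vertex IDs into such statements; the helper name and batch size are illustrative, not part of any NebulaGraph tool, and the session setup (e.g., via nebula3-python) is assumed rather than shown:

```python
# Hypothetical helper: batch vertex IDs into nGQL DELETE VERTEX statements.
# The statement syntax is nGQL; the batching scheme and function name are
# illustrative, not part of any NebulaGraph tool.
def build_delete_statements(vertex_ids, batch_size=256):
    """Group vertex IDs into DELETE VERTEX statements of at most batch_size IDs each."""
    statements = []
    for i in range(0, len(vertex_ids), batch_size):
        batch = vertex_ids[i:i + batch_size]
        id_list = ", ".join(f'"{vid}"' for vid in batch)
        # WITH EDGE also removes the incident edges of each vertex
        statements.append(f"DELETE VERTEX {id_list} WITH EDGE;")
    return statements
```

Each generated statement would then be executed through a client session against the target graph space.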
Below is what I have done.
Data
nebula-storaged.conf:
########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-storaged.pid
# Whether to use the configuration obtained from the configuration file
--local_config=true

########## logging ##########
# The directory to host logging files
--log_dir=logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level, 1, 2, 3, 4; the higher the level, the more verbose the logging
--v=1
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=true
# Destination filenames of stdout and stderr, which will also reside in log_dir
--stdout_log_file=storaged-stdout.log
--stderr_log_file=storaged-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles.
# The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=2
# Whether logging files' names contain a timestamp
--timestamp_in_logfile_name=true
########## networking ##########
# Comma separated Meta server addresses
--meta_server_addrs=127.0.0.1:9559
# Local IP used to identify the nebula-storaged process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=127.0.0.1
# Storage daemon listening port
--port=9779
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19779
# heartbeat with meta service
--heartbeat_interval_secs=10
######### Raft #########
# Raft election timeout
--raft_heartbeat_interval_secs=30
# RPC timeout for raft client (ms)
--raft_rpc_timeout_ms=500
# recycle Raft WAL
--wal_ttl=14400
########## Disk ##########
# Root data path, split by comma, e.g. --data_path=/disk1/path1/,/disk2/path2/
# One path per Rocksdb instance.
--data_path=data/storage
# Minimum reserved bytes of each data path
--minimum_reserved_bytes=268435456
# The default reserved bytes for one batch operation
--rocksdb_batch_size=4096
# The default block cache size used in BlockBasedTable. The unit is MB.
--rocksdb_block_cache=40960
# Disable page cache to better control memory used by rocksdb.
# Caution: Make sure to allocate enough block cache if disabling page cache!
--disable_page_cache=false
# Compression algorithm, options: no,snappy,lz4,lz4hc,zlib,bzip2,zstd
# For the sake of binary compatibility, the default value is snappy.
# Recommend to use:
#   * lz4 to gain more CPU performance, with the same compression ratio as snappy
#   * zstd to occupy less disk space
#   * lz4hc for the read-heavy write-light scenario
--rocksdb_compression=lz4
# Set different compressions for different levels.
# For example, if --rocksdb_compression is snappy,
# "no:no:lz4:lz4::zstd" is identical to "no:no:lz4:lz4:snappy:zstd:snappy".
# To disable compression for levels 0/1, set it to "no:no".
--rocksdb_compression_per_level=
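The expansion rule for `--rocksdb_compression_per_level` described above can be illustrated with a short sketch. The function is my own, not part of NebulaGraph; empty entries and unspecified trailing levels fall back to the base `--rocksdb_compression` algorithm, and 7 levels is RocksDB's default `num_levels`:

```python
# Illustrative sketch of how a per-level compression spec expands:
# empty entries and missing trailing levels fall back to the base
# --rocksdb_compression algorithm.
def expand_per_level(spec, base="snappy", num_levels=7):
    entries = spec.split(":") if spec else []
    # an empty entry (e.g. the gap in "lz4::zstd") means "use the base algorithm"
    expanded = [entry if entry else base for entry in entries]
    # pad unspecified deeper levels with the base algorithm as well
    expanded += [base] * (num_levels - len(expanded))
    return expanded
```

With `base="snappy"`, `expand_per_level("no:no:lz4:lz4::zstd")` yields the same per-level list as the `"no:no:lz4:lz4:snappy:zstd:snappy"` example in the comment.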
############## rocksdb Options ##############
# rocksdb DBOptions in json, each name and value of an option is a string, given as "option_name":"option_value" pairs separated by comma
--rocksdb_db_options={"max_subcompactions":"4","max_background_jobs":"4"}
# rocksdb ColumnFamilyOptions in json, each name and value of an option is a string, given as "option_name":"option_value" pairs separated by comma
--rocksdb_column_family_options={"disable_auto_compactions":"false","write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}
# rocksdb BlockBasedTableOptions in json, each name and value of an option is a string, given as "option_name":"option_value" pairs separated by comma
--rocksdb_block_based_table_options={"block_size":"8192"}
# Whether or not to enable rocksdb's statistics, disabled by default
--enable_rocksdb_statistics=false
# Statslevel used by rocksdb to collect statistics, optional values are
#   * kExceptHistogramOrTimers, disable timer stats, and skip histogram stats
#   * kExceptTimers, skip timer stats
#   * kExceptDetailedTimers, collect all stats except time inside mutex lock AND time spent on compression
#   * kExceptTimeForMutex, collect all stats except the counters requiring to get time inside the mutex lock
#   * kAll, collect all stats
--rocksdb_stats_level=kExceptHistogramOrTimers
# Whether or not to enable rocksdb's prefix bloom filter, enabled by default.
--enable_rocksdb_prefix_filtering=true
# Whether or not to enable rocksdb's whole key bloom filter, disabled by default.
--enable_rocksdb_whole_key_filtering=false
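The `--rocksdb_*_options` flags above carry JSON in which every option name and value must be a string (e.g., `"max_subcompactions":"4"`, not `"max_subcompactions":4`). A small sketch of validating such a flag value before putting it in the config (the function name is mine, for illustration only):

```python
import json

# Illustrative validator for the string-to-string JSON shape that the
# --rocksdb_db_options / --rocksdb_column_family_options /
# --rocksdb_block_based_table_options flags expect.
def parse_rocksdb_options(flag_value):
    options = json.loads(flag_value)
    if not isinstance(options, dict):
        raise ValueError("expected a JSON object of option_name -> option_value")
    for name, value in options.items():
        if not isinstance(value, str):
            raise ValueError(f"option {name!r}: value must be a JSON string")
    return options
```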
############## Key-Value separation ##############
# Whether or not to enable BlobDB (RocksDB key-value separation support)
--rocksdb_enable_kv_separation=false
# RocksDB key-value separation threshold in bytes. Values at or above this threshold will be written to blob files during flush or compaction.
--rocksdb_kv_separation_threshold=100
# Compression algorithm for blobs, options: no,snappy,lz4,lz4hc,zlib,bzip2,zstd
--rocksdb_blob_compression=lz4
# Whether to garbage collect blobs during compaction
--rocksdb_enable_blob_garbage_collection=true
############## storage cache ##############
# Whether to enable storage cache
--enable_storage_cache=false
# Total capacity reserved for storage in memory cache in MB
--storage_cache_capacity=0
# Number of buckets in base 2 logarithm. E.g., in case of 20, the total number of buckets will be 2^20.
# A good estimate can be ceil(log2(cache_entries * 1.6)). The maximum allowed is 32.
--storage_cache_buckets_power=20
# Number of locks in base 2 logarithm. E.g., in case of 10, the total number of locks will be 2^10.
# A good estimate can be max(1, buckets_power - 10). The maximum allowed is 32.
--storage_cache_locks_power=10
# Whether to add vertex pool in cache. Only valid when storage cache is enabled.
--enable_vertex_pool=false
# Vertex pool size in MB
--vertex_pool_capacity=50
# TTL in seconds for vertex items in the cache
--vertex_item_ttl=300
# Whether to add empty key pool in cache. Only valid when storage cache is enabled.
--enable_empty_key_pool=false
# Empty key pool size in MB
--empty_key_pool_capacity=50
# TTL in seconds for empty key items in the cache
--empty_key_item_ttl=300
############### misc ####################
--snapshot_part_rate_limit=10485760
--snapshot_batch_size=1048576
--rebuild_index_part_rate_limit=4194304
--rebuild_index_batch_size=1048576

########## Custom ##########
--enable_partitioned_index_filter=true
--max_edge_returned_per_vertex=100000
--move_files=true
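The sizing hints given in the storage cache comments of the config (`ceil(log2(cache_entries * 1.6))` for the bucket power, `max(1, buckets_power - 10)` for the lock power, both capped at 32) are easy to sanity-check with a few lines; the function names are mine, for illustration:

```python
import math

# Sketch of the sizing hints from the storage cache section of the config.
def buckets_power(cache_entries):
    """Estimate --storage_cache_buckets_power: ceil(log2(cache_entries * 1.6)), capped at 32."""
    return min(32, math.ceil(math.log2(cache_entries * 1.6)))

def locks_power(bp):
    """Estimate --storage_cache_locks_power: max(1, buckets_power - 10), capped at 32."""
    return min(32, max(1, bp - 10))
```

For example, a cache expected to hold about one million entries needs a bucket power of 21 (2^21 buckets) and a lock power of 11 (2^11 locks).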
BinaryData
Events:
Linux nebula-cluster-storaged-0 5.4.0-1098-azure #104~18.04.2-Ubuntu SMP Tue Nov 29 12:13:35 UTC 2022 x86_64 x86_64 x86_64