Closed rinor closed 1 year ago
I couldn't reproduce this issue: OpenObserve v0.5.2 (downloaded from https://github.com/openobserve/openobserve/releases) runs fine with `ops run openobserve -c config-openobserve.json --smp 2`.
My configuration file is as below:
$ cat config-openobserve.json
{
  "BaseVolumeSz": "200m",
  "RunConfig": {
    "Memory": "2G",
    "Ports": ["5080", "5081"]
  },
  "Files": ["proc/sys/kernel/ostype", "proc/sys/kernel/osrelease"],
  "Env": {
    "ZO_ROOT_USER_EMAIL": "bob@bob.com",
    "ZO_ROOT_USER_PASSWORD": "bob"
  }
}
I also tried with the nightly kernel build (`-n`) and with different CPU counts (e.g. `--smp 8`), and it always ran correctly, with CPU usage near 0 after application startup.
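For anyone repeating this check, the combinations tried above can be enumerated with a small script. This is only a sketch: it uses the `ops run` flags quoted in this comment (`-c`, `--smp`, `-n`), while the function name `build_commands` and the default CPU counts are illustrative assumptions.

```python
from itertools import product


def build_commands(config, smp_counts=(1, 2, 8), nightly_options=(False, True)):
    """Return one `ops run` argv list per (nightly, smp) combination."""
    commands = []
    for nightly, smp in product(nightly_options, smp_counts):
        # Same argument order as the command quoted in this thread.
        argv = ["ops", "run", "openobserve", "-c", config, "--smp", str(smp)]
        if nightly:
            argv.append("-n")  # nightly kernel build
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # Only prints the commands; nothing is executed here.
    for argv in build_commands("config-openobserve.json"):
        print(" ".join(argv))
```

Each returned list can be passed to `subprocess.run` if you want to launch the instances one after another instead of printing them.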
fyi, i just made a pkg of openobserve last week - https://repo.ops.city/v2/packages/eyberg/openobserve/0.5.1/x86_64/show - it seems to boot fine for me on latest && also nightly - can you post a '--trace' ?
Here is the trace of one run with clean storage that deadlocks.
ops run -c config.json openobserve --smp=2 --trace # with clean storage
Starting with clean storage deadlocks sometimes, but not always, while restarting/rebooting with already existing storage deadlocks every time (at least on my side). I'm starting to think that it might not be related to tokio per se but to sled, since I have other issues in some other tests/apps involving sled and tokio (I have yet to investigate and gather definitive results, but it fails/deadlocks on some stuff related to the sled pagecache and `/dev/shm` on nanos).
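Since the hang is intermittent, a probe that separates a hung boot from a slow one helps when repeating runs. The sketch below polls a URL until any HTTP response arrives. It rests on two assumptions this thread does not confirm: that a deadlocked instance never answers on its HTTP port (5080 in the configs here), and that the port is reachable from the host at `localhost`. The helper name `wait_for_http` is hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=30.0, interval=0.5):
    """Return True once `url` yields any HTTP response, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # An error status still proves the server is answering requests.
            return True
        except (OSError, http.client.HTTPException):
            # Refused, reset, timed out, or garbled reply: wait and retry.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # Assumed address; adjust to wherever the instance's HTTP port is exposed.
    alive = wait_for_http("http://localhost:5080/", timeout=60.0)
    print("responding" if alive else "no response (possible deadlock)")
```

An HTTP request is used rather than a bare TCP connect because a forwarded port may accept connections on the host side even when the guest application is stuck.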
On a side note, how does nanos handle `/dev/shm` (i.e. is it auto mounted/mmapped, or is it just on the fs)?
We don't support that today, as it typically implies multiple processes, which we won't support. There have been a few cases, though, where this gets used in a single-process env for other reasons, which after a quick grep looks like what is going on here. It looks like it is configurable (just by setting a path?) in https://github.com/spacejam/sled/blob/005c023ca94d424d8e630125e4c21320ed160031/src/config.rs#L414
> which after a quick grep looks like that's what is going on here looks like it is configurable (just by setting a path?)
Yes, it's configurable when needed, but in this case it's not being used at all. That path is only used for sled's default temporary in-memory db (https://github.com/spacejam/sled/blob/005c023ca94d424d8e630125e4c21320ed160031/src/config.rs#L253), and that's not the case here.
This is mostly related to sled behavior under nanos. While the sled version used here is old (although that's the latest stable crate released) and known to have some issues with async and thread pools under certain conditions, the idea here was to check that nanos is behaving correctly (not openobserve or sled).
I'll close this issue since I'm unable to provide reproducible code that points to nanos issue and not program issue. I do have some services (unfortunately using this tech stack involving buggy sled) planned to target nanos that fail right now, so I may revisit this topic in the future and provide additional information.
I'll check again now with https://github.com/nanovms/nanos/pull/2024, since I had a couple of services having issues on smp and https://github.com/nanovms/nanos/pull/2024/commits/37053f6e3618e786de36bf08d7ab8e9d754eaabe fixes are relevant
As an example OpenObserve can't boot properly on cpu count > 1 (it keeps all vCpu cores/threads at 100%).
Tested also some internal programs that use similar tech stack (tokio, sled, ..) and the behavior is the same.
Will investigate further and provide more info when/if available.
ops commands and config:

- **create volume**

  ```sh
  $ ops volume create nanos-openobserve-data -s 1G
  ```

- **ops config.json**

  ```json
  {
    "BaseVolumeSz": "128M",
    "Mounts": {
      "nanos-openobserve-data": "data"
    },
    "Dirs": [ "proc" ],
    "Klibs": [ "special_files" ],
    "ManifestPassthrough": {
      "special_files": {
        "disks": {}
      }
    },
    "Env": {
      "__pyroscope__": "",
      "ZO_PROF_PYROSCOPE_SERVER_URL": "http://localhost:4040",
      "ZO_PROF_PYROSCOPE_PROJECT_NAME": "openobserve",
      "__auth__": "",
      "ZO_ROOT_USER_EMAIL": "test@test.io",
      "ZO_ROOT_USER_PASSWORD": "test@test.io",
      "__http__": "",
      "ZO_HTTP_PORT": "5080",
      "ZO_HTTP_ADDR": "",
      "ZO_HTTP_IPV6_ENABLED": "false",
      "__grpc__": "",
      "ZO_GRPC_PORT": "5081",
      "ZO_GRPC_ADDR": "",
      "ZO_GRPC_TIMEOUT": "600",
      "ZO_GRPC_ORG_HEADER_KEY": "openobserve-org-id",
      "ZO_INTERNAL_GRPC_TOKEN": "",
      "__tcp__": "",
      "ZO_TCP_PORT": "5514",
      "ZO_UDP_PORT": "5514",
      "__route__": "",
      "ZO_ROUTE_TIMEOUT": "600",
      "__common__": "",
      "ZO_LOCAL_MODE": "true",
      "ZO_LOCAL_MODE_STORAGE": "disk",
      "ZO_NODE_ROLE": "all",
      "ZO_CLUSTER_NAME": "zo1",
      "ZO_INSTANCE_NAME": "",
      "ZO_DATA_DIR": "./data/openobserve/",
      "ZO_DATA_WAL_DIR": "./data/openobserve/wal/",
      "ZO_DATA_STREAM_DIR": "./data/openobserve/stream/",
      "ZO_BASE_URI": "",
      "ZO_WAL_MEMORY_MODE_ENABLED": "false",
      "ZO_WAL_LINE_MODE_ENABLED": "true",
      "ZO_PARQUET_COMPRESSION": "zstd",
      "ZO_COLUMN_TIMESTAMP": "_timestamp",
      "ZO_WIDENING_SCHEMA_EVOLUTION": "false",
      "ZO_SKIP_SCHEMA_VALIDATION": "false",
      "ZO_FEATURE_PER_THREAD_LOCK": "false",
      "ZO_FEATURE_FULLTEXT_ON_ALL_FIELDS": "false",
      "ZO_UI_ENABLED": "true",
      "ZO_UI_SQL_BASE64_ENABLED": "false",
      "ZO_METRICS_DEDUP_ENABLED": "true",
      "ZO_TRACING_ENABLED": "false",
      "OTEL_OTLP_HTTP_ENDPOINT": "",
      "ZO_TRACING_HEADER_KEY": "Authorization",
      "ZO_TRACING_HEADER_VALUE": "Basic YWRtaW46Q29tcGxleHBhc3MjMTIz",
      "ZO_TELEMETRY": "false",
      "ZO_TELEMETRY_URL": "https://e1.zinclabs.dev",
      "ZO_PROMETHEUS_ENABLED": "false",
      "ZO_PRINT_KEY_CONFIG": "false",
      "ZO_PRINT_KEY_SQL": "false",
      "ZO_USAGE_REPORTING_ENABLED": "false",
      "ZO_USAGE_REPORTING_COMPRESSED_SIZE": "false",
      "ZO_USAGE_ORG": "_meta",
      "ZO_USAGE_BATCH_SIZE": "2000",
      "ZO_DYNAMO_META_STORE_ENABLED": "false",
      "ZO_DYNAMO_FILE_LIST_TABLE": "",
      "__limit__": "",
      "ZO_JSON_LIMIT": "209715200",
      "ZO_PAYLOAD_LIMIT": "209715200",
      "ZO_MAX_FILE_SIZE_ON_DISK": "32",
      "ZO_MAX_FILE_RETENTION_TIME": "600",
      "ZO_FILE_PUSH_INTERVAL": "60",
      "ZO_FILE_MOVE_THREAD_NUM": "0",
      "ZO_QUERY_THREAD_NUM": "0",
      "ZO_INGEST_ALLOWED_UPTO": "5",
      "ZO_METRICS_LEADER_PUSH_INTERVAL": "15",
      "ZO_METRICS_LEADER_ELECTION_INTERVAL": "30",
      "ZO_METRICS_FILE_RETENTION": "daily",
      "ZO_HEARTBEAT_INTERVAL": "30",
      "ZO_COLS_PER_RECORD_LIMIT": "0",
      "ZO_HTTP_WORKER_NUM": "0",
      "ZO_CALCULATE_STATS_INTERVAL": "600",
      "__compact__": "",
      "ZO_COMPACT_ENABLED": "true",
      "ZO_COMPACT_FAKE_MODE": "false",
      "ZO_COMPACT_INTERVAL": "60",
      "ZO_COMPACT_SYNC_TO_DB_INTERVAL": "1800",
      "ZO_COMPACT_MAX_FILE_SIZE": "256",
      "ZO_COMPACT_DATA_RETENTION_DAYS": "3650",
      "ZO_COMPACT_BLOCKED_ORGS": "",
      "__memorycache__": "",
      "ZO_MEMORY_CACHE_ENABLED": "true",
      "ZO_MEMORY_CACHE_CACHE_LATEST_FILES": "false",
      "ZO_MEMORY_CACHE_MAX_SIZE": "0",
      "ZO_MEMORY_CACHE_SKIP_SIZE": "0",
      "ZO_MEMORY_CACHE_RELEASE_SIZE": "0",
      "__log__": "",
      "RUST_LOG": "debug",
      "EVENTS_ENABLED": "false",
      "EVENTS_AUTH": "cm9vdEBleGFtcGxlLmNvbTpUZ0ZzZFpzTUZQdzg2SzRK",
      "EVENTS_EP": "https://api.openobserve.ai/api/debug/events/_json",
      "EVENTS_BATCH_SIZE": "10",
      "__etcd__": "",
      "ZO_ETCD_ADDR": "localhost:2379",
      "ZO_ETCD_PREFIX": "/data/openobserve/etcd_prefix/",
      "ZO_ETCD_CONNECT_TIMEOUT": "5",
      "ZO_ETCD_COMMAND_TIMEOUT": "10",
      "ZO_ETCD_LOCK_WAIT_TIMEOUT": "3600",
      "ZO_ETCD_USER": "",
      "ZO_ETCD_PASSWORD": "",
      "ZO_ETCD_CLIENT_CERT_AUTH": "false",
      "ZO_ETCD_TRUSTED_CA_FILE": "",
      "ZO_ETCD_CERT_FILE": "",
      "ZO_ETCD_KEY_FILE": "",
      "ZO_ETCD_DOMAIN_NAME": "",
      "ZO_ETCD_LOAD_PAGE_SIZE": "1000",
      "__sled__": "",
      "ZO_SLED_DATA_DIR": "/data/openobserve/sled_data/",
      "ZO_SLED_PREFIX": "/data/openobserve/sled_prefix/",
      "__s3__": "",
      "ZO_S3_PROVIDER": "s3",
      "ZO_S3_SERVER_URL": "",
      "ZO_S3_REGION_NAME": "",
      "ZO_S3_ACCESS_KEY": "",
      "ZO_S3_SECRET_KEY": "",
      "ZO_S3_BUCKET_NAME": "",
      "ZO_S3_BUCKET_PREFIX": "",
      "ZO_S3_CONNECT_TIMEOUT": "10",
      "ZO_S3_REQUEST_TIMEOUT": "3600",
      "ZO_S3_FEATURE_FORCE_PATH_STYLE": "false",
      "ZO_S3_FEATURE_HTTP1_ONLY": "false",
      "ZO_S3_FEATURE_HTTP2_ONLY": "false",
      "ZO_S3_ALLOW_INVALID_CERTIFICATES": "false",
      "ZO_S3_SYNC_TO_CACHE_INTERVAL": "600",
      "__prometheus__": "",
      "ZO_PROMETHEUS_HA_CLUSTER": "cluster",
      "ZO_PROMETHEUS_HA_REPLICA": "__replica__",
      "__s3_swift__": "",
      "AWS_EC2_METADATA_DISABLED": "false"
    },
    "Debugflags": [ "reboot_on_exit" ],
    "RunConfig": {
      "Memory": "2G",
      "UDPPorts": [ "5514" ],
      "Ports": [ "5514", "5080", "5081" ]
    },
    "Boot": "/usr/local/src/nanos/output/platform/pc/boot/boot.img",
    "Kernel": "/usr/local/src/nanos/output/platform/pc/bin/kernel.img",
    "KlibDir": "/usr/local/src/nanos/output/klib/bin"
  }
  ```

- **proc folder**

  ```sh
  $ tree proc
  proc
  └── sys
      └── kernel
          ├── osrelease
          └── ostype

  3 directories, 2 files
  ```
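With a config this large it is easy for a port set under `Env` to drift from what `RunConfig` exposes. The sketch below cross-checks the two. It is not something ops does itself, and it assumes that `ZO_HTTP_PORT`, `ZO_GRPC_PORT` and `ZO_TCP_PORT` need an entry in `Ports` while `ZO_UDP_PORT` needs one in `UDPPorts`. The function name `unexposed_ports` is hypothetical.

```python
import json

# Assumed mapping from Env keys to the RunConfig list that should expose them.
TCP_KEYS = ("ZO_HTTP_PORT", "ZO_GRPC_PORT", "ZO_TCP_PORT")
UDP_KEYS = ("ZO_UDP_PORT",)


def unexposed_ports(config):
    """Return (env_key, port) pairs set in Env but missing from RunConfig."""
    env = config.get("Env", {})
    run = config.get("RunConfig", {})
    tcp_exposed = set(run.get("Ports", []))
    udp_exposed = set(run.get("UDPPorts", []))

    missing = []
    for keys, exposed in ((TCP_KEYS, tcp_exposed), (UDP_KEYS, udp_exposed)):
        for key in keys:
            port = env.get(key)
            if port and port not in exposed:  # skip unset or empty values
                missing.append((key, port))
    return missing


if __name__ == "__main__":
    with open("config.json") as fh:
        for key, port in unexposed_ports(json.load(fh)):
            print(f"{key}={port} is not exposed in RunConfig")
```

Run against the config above, this should report nothing, since 5080, 5081 and 5514 are all listed under `Ports` and 5514 under `UDPPorts`.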