wolfman1956 / fabric_timeout

Intermittent fabric related timeouts

Couchdb - intermittent fabric timeout errors #1

Open wolfman1956 opened 4 years ago

wolfman1956 commented 4 years ago

Good day GH community,

As this is my very first issue submission, I will try to be as concise, thorough, and detailed as possible. I apologize in advance for any omissions or improper use of format in my inquiry.

ENVIRONMENT:

Our environment consists of CouchDB running from 2 master servers connected to two 3-node clusters (A = main, B = secondary).

CONTEXT:

When running the following command:

sup crossbar_maintenance init_apps /var/www/html/monster-ui/apps http://our_1st_master01.kazoo.our_domain.ca:8000/v2

...the script will load flawlessly, with no errors, 3 out of 5 times; the other 2 times it hits timeouts but still finishes eventually, taking longer than the usual fully successful load of 5-6 seconds. Depending on the number of timeouts encountered, a run can take up to 1m25s, but as mentioned it does eventually load.
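To quantify the failure rate, it can help to time the load repeatedly. A rough sketch (assumptions: a POSIX shell on the master; the command is the one above, and `|| true` keeps a failed run from aborting the loop):

```shell
# Sketch: time several runs of the init_apps load to see how often the
# slow (timeout) path is hit. Any command can be substituted for the
# argument; failures are swallowed so the loop always completes.
time_runs() {
  for i in 1 2 3 4 5; do
    start=$(date +%s)
    eval "$1" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo "run $i: $((end - start))s"
  done
}
time_runs 'sup crossbar_maintenance init_apps /var/www/html/monster-ui/apps http://our_1st_master01.kazoo.our_domain.ca:8000/v2'
```

A bimodal spread of durations (5-6s vs. over a minute) across runs would confirm the intermittent pattern described above.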

We have looked into most, if not all, timeout-related parameters in the default.ini file and tested them, but we still get the same results: intermittent fabric_timeout errors in the CouchDB log.

Extract from /var/log/couchdb/couchdb.log after a run that displayed the timeout error:

[error] 2020-03-19T20:09:01.224839Z couchdb@ourprodbox.kazoo.xxxxitelecom.ca <0.510.125> 93969851c9 rexi_server: from: couchdb@ourprodbox.kazoo.xxxxitelecom.ca(<14056.18571.128>) mfa: fabric_rpc:update_docs/3 exit:timeout [{fabric_doc_atts,'-receiver_callback/2-fun-1-',1,[{file,"src/fabric_doc_atts.erl"},{line,56}]},{couch_att,write_streamed_attachment,3,[{file,"src/couch_att.erl"},{line,611}]},{couch_db,with_stream,3,[{file,"src/couch_db.erl"},{line,1413}]},{couch_db,'-doc_flush_atts/2-lc$^0/1-0-',2,[{file,"src/couch_db.erl"},{line,1365}]},{couch_db,doc_flush_atts,2,[{file,"src/couch_db.erl"},{line,1365}]},{couch_db,'-update_docs/4-lc$^4/1-4-',2,[{file,"src/couch_db.erl"},{line,1184}]},{couch_db,'-update_docs/4-lc$^3/1-3-',2,[{file,"src/couch_db.erl"},{line,1183}]},{couch_db,update_docs,4,[{file,"src/couch_db.erl"},{line,1183}]}]
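Grouping these errors by the failing mfa can show whether every timeout comes from the same RPC (here, attachment writes via fabric_rpc:update_docs/3) or from several. A sketch, assuming the log path and format shown above:

```shell
# Sketch: tally 'exit:timeout' rexi errors by the mfa that timed out.
# LOG defaults to the path from the extract above; pass another path as $1.
LOG=${1:-/var/log/couchdb/couchdb.log}
if [ -r "$LOG" ]; then
  grep 'exit:timeout' "$LOG" \
    | grep -o 'mfa: [a-z_0-9]*:[a-z_0-9]*/[0-9]*' \
    | sort | uniq -c | sort -rn
fi
```

If every tallied line is fabric_rpc:update_docs/3, the timeouts are specific to clustered document/attachment writes rather than views or reads.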

Config files from group A nodes (I have to paste the entire files, as the issue tracker won't accept file attachments, even with a ".txt" extension):

### - default.ini

; Upgrading CouchDB will overwrite this file.
[vendor]
name = The Apache Software Foundation

[couchdb]
uuid =
database_dir = ./data
view_index_dir = ./data
; util_driver_dir =
; plugin_dir =
os_process_timeout = 50000 ; 50 seconds. for view servers.
max_dbs_open = 15000
delayed_commits = false
; Method used to compress everything that is appended to database and view index files, except
; for attachments (see the attachments section). Available methods are:
;
; none - no compression
; snappy - use google snappy, a very fast compressor/decompressor
; deflate_N - use zlib's deflate, N is the compression level which ranges from 1 (fastest,
;   lowest compression ratio) to 9 (slowest, highest compression ratio)
file_compression = snappy
; Higher values may give better read performance due to less read operations
; and/or more OS page cache hits, but they can also increase overall response
; time for writes when there are many attachment write requests in parallel.
attachment_stream_buffer_size = 4096
; Default security object for databases if not explicitly set
; everyone - same as couchdb 1.0, everyone can read/write
; admin_only - only admins can read/write
; admin_local - sharded dbs on :5984 are read/write for everyone,
;   local dbs on :5986 are read/write for admins only
default_security = admin_local
; btree_chunk_size = 1279
; maintenance_mode = false
; stem_interactive_updates = true
; uri_file =
; The speed of processing the _changes feed with doc_ids filter can be
; influenced directly with this setting - increase for faster processing at the
; expense of more memory usage.
changes_doc_ids_optimization_threshold = 1000
; Maximum document ID length. Can be set to an integer or 'infinity'.
; max_document_id_length = infinity
;
; Limit maximum document size. Requests to create / update documents with a body
; size larger than this will fail with a 413 http error. This limit applies to
; requests which update a single document as well as individual documents from
; a _bulk_docs request. Since there is no canonical size of json encoded data,
; due to variability in what is escaped or how floats are encoded, this limit is
; applied conservatively. For example 1.0e+16 could be encoded as 1e16, so 4 used
; for size calculation instead of 7.
; max_document_size = 4294967296 ; bytes
;
; Maximum attachment size.
; max_attachment_size = infinity
;
; Do not update the least recently used DB cache on reads, only writes
; update_lru_on_read = false
;
; The default storage engine to use when creating databases
; is set as a key into the [couchdb_engines] section.
default_engine = couch
;
; Enable this to only "soft-delete" databases when DELETE /{db} requests are
; made. This will place a .recovery directory in your data directory and
; move deleted databases/shards there instead. You can then manually delete
; these files later, as desired.
; enable_database_recovery = false

[couchdb_engines]
; The keys in this section are the filename extension that
; the specified engine module will use. This is important so
; that couch_server is able to find an existing database without
; having to ask every configured engine.
couch = couch_bt_engine

[cluster]
q=8
n=3
; placement = metro-dc-a:2,metro-dc-b:1

; Supply a comma-delimited list of node names that this node should
; contact in order to join a cluster. If a seedlist is configured the _up
; endpoint will return a 404 until the node has successfully contacted at
; least one of the members of the seedlist and replicated an up-to-date copy
; of the _nodes, _dbs, and _users system databases.
; seedlist = couchdb@node1.example.com,couchdb@node2.example.com

[chttpd]
; These settings affect the main, clustered port (5984 by default).
port = 5984
bind_address = 127.0.0.1
backlog = 512
docroot = ./share/www
socket_options = [{sndbuf, 262144}, {nodelay, true}]
server_options = [{recbuf, undefined}]
require_valid_user = false
; List of headers that will be kept when the header Prefer: return=minimal is included in a request.
; If Server header is left out, Mochiweb will add its own one in.
prefer_minimal = Cache-Control, Content-Length, Content-Range, Content-Type, ETag, Server, Transfer-Encoding, Vary
;
; Limit maximum number of databases when trying to get detailed information using
; _dbs_info in a request
max_db_number_for_dbs_info_req = 100

; authentication handlers
; authentication_handlers = {chttpd_auth, cookie_authentication_handler}, {chttpd_auth, default_authentication_handler}
; uncomment the next line to enable proxy authentication
; authentication_handlers = {chttpd_auth, proxy_authentication_handler}, {chttpd_auth, cookie_authentication_handler}, {chttpd_auth, default_authentication_handler}

; prevent non-admins from accessing /_all_dbs
; admin_only_all_dbs = false

[database_compaction]
; larger buffer sizes can originate smaller files
doc_buffer_size = 524288 ; value in bytes
checkpoint_after = 5242880 ; checkpoint after every N bytes were written

[view_compaction]
; larger buffer sizes can originate smaller files
keyvalue_buffer_size = 2097152 ; value in bytes

[couch_peruser]
; If enabled, couch_peruser ensures that a private per-user database
; exists for each document in _users. These databases are writable only
; by the corresponding user. Databases are in the following form:
; userdb-{hex encoded username}
enable = false
; If set to true and a user is deleted, the respective database gets
; deleted as well.
delete_dbs = false
; Set a default q value for peruser-created databases that is different from
; cluster / q
; q = 1
; prefix for user databases. If you change this after user dbs have been
; created, the existing databases won't get deleted if the associated user
; gets deleted because of the then prefix mismatch.
database_prefix = userdb-

[httpd]
port = 5986
bind_address = 127.0.0.1
authentication_handlers = {couch_httpd_auth, cookie_authentication_handler}, {couch_httpd_auth, default_authentication_handler}
secure_rewrites = true
allow_jsonp = false
; Options for the MochiWeb HTTP server.
; server_options = [{backlog, 128}, {acceptor_pool_size, 16}]
; For more socket options, consult Erlang's module 'inet' man page.
; socket_options = [{recbuf, undefined}, {sndbuf, 262144}, {nodelay, true}]
; socket_options = [{sndbuf, 262144}]
socket_options = [{nodelay, true}]
enable_cors = false
enable_xframe_options = false
; CouchDB can optionally enforce a maximum uri length;
; max_uri_length = 8000
changes_timeout = 60000
; config_whitelist =
; max_uri_length =
; rewrite_limit = 100
; x_forwarded_host = X-Forwarded-Host
; x_forwarded_proto = X-Forwarded-Proto
; x_forwarded_ssl = X-Forwarded-Ssl
; Maximum allowed http request size. Applies to both clustered and local port.
max_http_request_size = 4294967296 ; 4GB

; [httpd_design_handlers]
; _view =

; [ioq]
; concurrency = 10
; ratio = 0.01

[ssl]
port = 6984

; [chttpd_auth]
; authentication_db = _users

; [chttpd_auth_cache]
; max_lifetime = 600000
; max_objects =
; max_size = 104857600

; [mem3]
; nodes_db = _nodes
; shard_cache_size = 25000
; shards_db = _dbs
; sync_concurrency = 10

; [fabric]
all_docs_concurrency = 100
; changes_duration =
shard_timeout_factor = 10
; uuid_prefix_len = 7
request_timeout = infinity
all_docs_timeout = 15000
attachments_timeout = 60000
; view_timeout = 3600000
; partition_view_timeout = 3600000

; [rexi]
; buffer_count = 2000
; server_per_node = true

; [global_changes]
; max_event_delay = 25
max_write_delay = 25000
; update_db = true

; [view_updater]
; min_writer_items = 100
; min_writer_size = 16777216

[couch_httpd_auth]
; WARNING! This only affects the node-local port (5986 by default).
; You probably want the settings under [chttpd].
authentication_db = _users
authentication_redirect = /_utils/session.html
require_valid_user = false
timeout = 600 ; number of seconds before automatic logout
auth_cache_size = 50 ; size is number of cache entries
allow_persistent_cookies = true ; set to false to disallow persistent cookies
iterations = 10 ; iterations for password hashing
; min_iterations = 1
; max_iterations = 1000000000
; password_scheme = pbkdf2
; proxy_use_secret = false
; comma-separated list of public fields, 404 if empty
; public_fields =
; secret =
; users_db_public = false
; cookie_domain = example.com

; CSP (Content Security Policy) Support for _utils
[csp]
enable = true
; header_value = default-src 'self'; img-src 'self'; font-src *; script-src 'self' 'unsafe-eval'; style-src 'self' 'unsafe-inline';

[cors]
credentials = false
; List of origins separated by a comma, * means accept all
; Origins must include the scheme: http://example.com
; You can't set origins: * and credentials = true at the same time.
; origins = *
; List of accepted headers separated by a comma
; headers =
; List of accepted methods
; methods =

; Configuration for a vhost
; [cors:http://example.com]
; credentials = false
; List of origins separated by a comma
; Origins must include the scheme: http://example.com
; You can't set origins: * and credentials = true at the same time.
; origins =
; List of accepted headers separated by a comma
; headers =
; List of accepted methods
; methods =

; Configuration for the design document cache
; [ddoc_cache]
; The maximum size of the cache in bytes
; max_size = 104857600 ; 100MiB
; The period each cache entry should wait before
; automatically refreshing in milliseconds
refresh_timeout = 67000

[x_frame_options]
; Settings same-origin will return X-Frame-Options: SAMEORIGIN.
; If same origin is set, it will ignore the hosts setting
; same_origin = true
; Settings hosts will return X-Frame-Options: ALLOW-FROM https://example.com/
; List of hosts separated by a comma. * means accept all
; hosts =

[native_query_servers]
; erlang query server
; enable_erlang_query_server = false

; Changing reduce_limit to false will disable reduce_limit.
; If you think you're hitting reduce_limit with a "good" reduce function,
; please let us know on the mailing list so we can fine tune the heuristic.
[query_server_config]
; commit_freq = 5
reduce_limit = true
os_process_limit = 100
; os_process_idle_limit = 300
; os_process_soft_limit = 100
; Timeout for how long a response from a busy view group server can take.
; "infinity" is also a valid configuration value.
group_info_timeout = 5000

[mango]
; Set to true to disable the "index all fields" text index, which can lead
; to out of memory issues when users have documents with nested array fields.
; index_all_disabled = false
; Default limit value for mango _find queries.
; default_limit = 25

[indexers]
couch_mrview = true

[uuids]
; Known algorithms:
; random - 128 bits of random awesome
;   All awesome, all the time.
; sequential - monotonically increasing ids with random increments
;   First 26 hex characters are random. Last 6 increment in
;   random amounts until an overflow occurs. On overflow, the
;   random prefix is regenerated and the process starts over.
; utc_random - Time since Jan 1, 1970 UTC with microseconds
;   First 14 characters are the time in hex. Last 18 are random.
; utc_id - Time since Jan 1, 1970 UTC with microseconds, plus utc_id_suffix string
;   First 14 characters are the time in hex. uuids/utc_id_suffix string value is appended to these.
algorithm = sequential
; The utc_id_suffix value will be appended to uuids generated by the utc_id algorithm.
; Replicating instances should have unique utc_id_suffix values to ensure uniqueness of utc_id ids.
utc_id_suffix =

; Maximum number of UUIDs retrievable from /_uuids in a single request
max_count = 1000

[attachments]
compression_level = 8 ; from 1 (lowest, fastest) to 9 (highest, slowest), 0 to disable compression
compressible_types = text/*, application/javascript, application/json, application/xml

[replicator]
; Random jitter applied on replication job startup (milliseconds)
startup_jitter = 5000
; Number of actively running replications
max_jobs = 500
; Scheduling interval in milliseconds. During each reschedule cycle
interval = 60000
; Maximum number of replications to start and stop during rescheduling.
max_churn = 20
; More worker processes can give higher network throughput but can also
; imply more disk and network IO.
worker_processes = 16
; With lower batch sizes checkpoints are done more frequently. Lower batch sizes
; also reduce the total amount of used RAM memory.
worker_batch_size = 500
; Maximum number of HTTP connections per replication.
http_connections = 20
; HTTP connection timeout per replication.
; Even for very fast/reliable networks it might need to be increased if a remote
; database is too busy.
connection_timeout = 60000
; Request timeout
request_timeout = infinity
; If a request fails, the replicator will retry it up to N times.
retries_per_request = 10
; Use checkpoints
; use_checkpoints = true
; Checkpoint interval
; checkpoint_interval = 30000
; Some socket options that might boost performance in some scenarios:
; {nodelay, boolean()}
; {sndbuf, integer()}
; {recbuf, integer()}
; {priority, integer()}
; See the inet Erlang module's man page for the full list of options.
socket_options = [{keepalive, true}, {nodelay, false}]
; Path to a file containing the user's certificate.
; cert_file = /full/path/to/server_cert.pem
; Path to file containing user's private PEM encoded key.
; key_file = /full/path/to/server_key.pem
; String containing the user's password. Only used if the private keyfile is password protected.
; password = somepassword
; Set to true to validate peer certificates.
verify_ssl_certificates = false
; File containing a list of peer trusted certificates (in the PEM format).
; ssl_trusted_certificates_file = /etc/ssl/certs/ca-certificates.crt
; Maximum peer certificate depth (must be set even if certificate validation is off).
ssl_certificate_max_depth = 3
; Maximum document ID length for replication.
; max_document_id_length = 0
; How much time to wait before retrying after a missing doc exception. This
; exception happens if the document was seen in the changes feed, but internal
; replication hasn't caught up yet, and fetching the document's revisions
; fails. This is a common scenario when the source is updated while continuous
; replication is running. The retry period would depend on how quickly internal
; replication is expected to catch up. In general this is an optimisation to
; avoid crashing the whole replication job, which would consume more resources
; and add log noise.
missing_doc_retry_msec = 24000
; Wait this many seconds after startup before attaching changes listeners
; cluster_start_period = 5
; Re-check cluster state at least every cluster_quiet_period seconds
; cluster_quiet_period = 60

; List of replicator client authentication plugins to try. Plugins will be
; tried in order. The first to initialize successfully will be used for that
; particular endpoint (source or target). Normally couch_replicator_auth_noop
; would be used at the end of the list as a "catch-all". It doesn't do anything
; and effectively implements the previous behavior of using basic auth.
; There are currently two plugins available:
; couch_replicator_auth_session - use _session cookie authentication
; couch_replicator_auth_noop - use basic authentication (previous default)
; Currently, the new _session cookie authentication is tried first, before
; falling back to the old basic authentication default:
; auth_plugins = couch_replicator_auth_session,couch_replicator_auth_noop
; To restore the old behaviour, use the following value:
; auth_plugins = couch_replicator_auth_noop

; Force couch_replicator_auth_session plugin to refresh the session
; periodically if max-age is not present in the cookie. This is mostly to
; handle the case where anonymous writes are allowed to the database and a VDU
; function is used to forbid writes based on the authenticated user name. In
; that case this value should be adjusted based on the expected minimum session
; expiry timeout on replication endpoints. If session expiry results in a 401
; or 403 response this setting is not needed.
; session_refresh_interval_sec = 550

[compaction_daemon]
; The delay, in seconds, between each check for which database and view indexes
; need to be compacted.
check_interval = 3600
; If a database or view index file is smaller then this value (in bytes),
; compaction will not happen. Very small files always have a very high
; fragmentation therefore it's not worth to compact them.
min_file_size = 131072
; With lots of databases and/or with lots of design docs in one or more
; databases, the compaction_daemon can create significant CPU load when
; checking whether databases and view indexes need compacting. The
; snooze_period_ms setting ensures a smoother CPU load. Defaults to
; 3000 milliseconds wait. Note that this option was formerly called
; snooze_period, measured in seconds (it is currently still supported).
; snooze_period_ms = 3000

[compactions] ; List of compaction rules for the compaction daemon. ; The daemon compacts databases and their respective view groups when all the ; condition parameters are satisfied. Configuration can be per database or ; global, and it has the following format: ; ; database_name = [ {ParamName, ParamValue}, {ParamName, ParamValue}, ... ] ; _default = [ {ParamName, ParamValue}, {ParamName, ParamValue}, ... ] ; ; Possible parameters: ; ; db_fragmentation - If the ratio (as an integer percentage), of the amount ; of old data (and its supporting metadata) over the database ; file size is equal to or greater then this value, this ; database compaction condition is satisfied. ; This value is computed as: ; ; (file_size - data_size) / file_size 100 ; ; The data_size and file_size values can be obtained when ; querying a database's information URI (GET /dbname/). ; ; view_fragmentation - If the ratio (as an integer percentage), of the amount ; of old data (and its supporting metadata) over the view ; index (view group) file size is equal to or greater then ; this value, then this view index compaction condition is ; satisfied. This value is computed as: ; ; (file_size - data_size) / file_size 100 ; ; The data_size and file_size values can be obtained when ; querying a view group's information URI ; (GET /dbname/_design/groupname/_info). ; ; from and to - The period for which a database (and its view groups) compaction ; is allowed. The value for these parameters must obey the format: ; ; HH:MM - HH:MM (HH in [0..23], MM in [0..59]) ; ; strict_window - If a compaction is still running after the end of the allowed ; period, it will be canceled if this parameter is set to 'true'. ; It defaults to 'false' and it's meaningful only if the period ; parameter is also specified. ; ; * parallel_view_compaction - If set to 'true', the database and its views are ; compacted in parallel. 
This is only useful on ; certain setups, like for example when the database ; and view index directories point to different ; disks. It defaults to 'false'. ; ; Before a compaction is triggered, an estimation of how much free disk space is ; needed is computed. This estimation corresponds to 2 times the data size of ; the database or view index. When there's not enough free disk space to compact ; a particular database or view index, a warning message is logged. ; ; Examples: ; ; 1) [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}] ; The foo database is compacted if its fragmentation is 70% or more. ; Any view index of this database is compacted only if its fragmentation ; is 60% or more. ; ; 2) [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "00:00"}, {to, "04:00"}] ; Similar to the preceding example but a compaction (database or view index) ; is only triggered if the current time is between midnight and 4 AM. ; ; 3) [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "00:00"}, {to, "04:00"}, {strict_window, true}] ; Similar to the preceding example - a compaction (database or view index) ; is only triggered if the current time is between midnight and 4 AM. If at ; 4 AM the database or one of its views is still compacting, the compaction ; process will be canceled. ; ; 4) [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "00:00"}, {to, "04:00"}, {strict_window, true}, {parallel_view_compaction, true}] ; Similar to the preceding example, but a database and its views can be ; compacted in parallel. ; _default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}]

[log] ; Possible log levels: ; debug ; info ; notice ; warning, warn ; error, err ; critical, crit ; alert ; emergency, emerg ; none ; level = info ; ; Set the maximum log message length in bytes that will be ; passed through the writer ; ; max_message_size = 16000 ; ; ; There are three different log writers that can be configured ; to write log messages. The default writes to stderr of the ; Erlang VM which is useful for debugging/development as well ; as a lot of container deployments. ; ; There's also a file writer that works with logrotate and an ; rsyslog writer for deployments that need to have logs sent ; over the network. ; writer = stderr ; ; File Writer Options: ; ; The file writer will check every 30s to see if it needs ; to reopen its file. This is useful for people that configure ; logrotate to move log files periodically. ; ; file = ./couch.log ; Path name to write logs to ; ; Write operations will happen either every write_buffer bytes ; or write_delay milliseconds. These are passed directly to the ; Erlang file module with the write_delay option documented here: ; ; http://erlang.org/doc/man/file.html ; ; write_buffer = 0 ; write_delay = 0 ; ; ; Syslog Writer Options: ; ; The syslog writer options all correspond to their obvious ; counter parts in rsyslog nomenclature. ; ; syslog_host = ; syslog_port = 514 ; syslog_appid = couchdb ; syslog_facility = local2

[stats]
; Stats collection interval in seconds. Default 10 seconds.
interval = 10

### - local.ini

; CouchDB Configuration Settings

; Custom settings should be made in this file. They will override settings
; in default.ini, but unlike changes made to default.ini, this file won't be
; overwritten on server upgrade.

[couchdb]
; max_document_size = 4294967296 ; bytes
os_process_timeout = 30000
uuid = xxxxxxxxxxxxxxxxxxxxxxx

[couch_peruser]
; If enabled, couch_peruser ensures that a private per-user database
; exists for each document in _users. These databases are writable only
; by the corresponding user. Databases are in the following form:
; userdb-{hex encoded username}
; enable = true
; If set to true and a user is deleted, the respective database gets
; deleted as well.
; delete_dbs = true
; Set a default q value for peruser-created databases that is different from
; cluster / q
q = 3

[chttpd]
; port = 5984
bind_address = 0.0.0.0
; Options for the MochiWeb HTTP server.
; server_options = [{backlog, 128}, {acceptor_pool_size, 16}]
; For more socket options, consult Erlang's module 'inet' man page.
; socket_options = [{sndbuf, 262144}, {nodelay, true}]

[httpd]
; port = 5986
bind_address = 0.0.0.0
; NOTE that this only configures the "backend" node-local port, not the
; "frontend" clustered port. You probably don't want to change anything in
; this section.
; Uncomment next line to trigger basic-auth popup on unauthorized requests.
; WWW-Authenticate = Basic realm="administrator"

; Uncomment next line to set the configuration modification whitelist. Only
; whitelisted values may be changed via the /_config URLs. To allow the admin
; to change this value over HTTP, remember to include {httpd,config_whitelist}
; itself. Excluding it from the list would require editing this file to update
; the whitelist.
; config_whitelist = [{httpd,config_whitelist}, {log,level}, {etc,etc}]

[query_servers]
; nodejs = /usr/local/bin/couchjs-node /path/to/couchdb/share/server/main.js

[couch_httpd_auth]
; If you set this to true, you should also uncomment the WWW-Authenticate line
; above. If you don't configure a WWW-Authenticate header, CouchDB will send
; Basic realm="server" in order to prevent you getting logged out.
; require_valid_user = false
secret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

[ssl]
; enable = true
; cert_file = /full/path/to/server_cert.pem
; key_file = /full/path/to/server_key.pem
; password = somepassword
; set to true to validate peer certificates
; verify_ssl_certificates = false
; Set to true to fail if the client does not send a certificate. Only used if verify_ssl_certificates is true.
; fail_if_no_peer_cert = false
; Path to file containing PEM encoded CA certificates (trusted
; certificates used for verifying a peer certificate). May be omitted if
; you do not want to verify the peer.
; cacert_file = /full/path/to/cacertf
; The verification fun (optional) if not specified, the default
; verification fun will be used.
; verify_fun = {Module, VerifyFun}
; maximum peer certificate depth
; ssl_certificate_max_depth = 1
;
; Reject renegotiations that do not live up to RFC 5746.
; secure_renegotiate = true
; The cipher suites that should be supported.
; Can be specified in erlang format "{ecdhe_ecdsa,aes_128_cbc,sha256}"
; or in OpenSSL format "ECDHE-ECDSA-AES128-SHA256".
; ciphers = ["ECDHE-ECDSA-AES128-SHA256", "ECDHE-ECDSA-AES128-SHA"]
; The SSL/TLS versions to support
; tls_versions = [tlsv1, 'tlsv1.1', 'tlsv1.2']

; To enable Virtual Hosts in CouchDB, add a vhost = path directive. All requests to
; the Virtual Host will be redirected to the path. In the example below all requests
; to http://example.com/ are redirected to /database.
; If you run CouchDB on a specific port, include the port number in the vhost:
; example.com:5984 = /database
[vhosts]
; example.com = /database/

; To create an admin account uncomment the '[admins]' section below and add a
; line in the format 'username = password'. When you next start CouchDB, it
; will change the password to a hash (so that your passwords don't linger
; around in plain-text files). You can add more admin accounts with more
; 'username = password' lines. Don't forget to restart CouchDB after
; changing this.
[admins]
; admin = mysecretpassword
admin = -pbkdf2-5c7544ee89e15fa12nopasswdhere5817d97fcf,1dc2cf9a72f283c62f0afbea2b194a59,10

[cluster]
q=3
r=2
w=2
n=3
placement = BellDC1:2,BellDC2:2

### - vm.args

; Licensed under the Apache License, Version 2.0 (the "License"); you may not
; use this file except in compliance with the License. You may obtain a copy of
; the License at
;
; http://www.apache.org/licenses/LICENSE-2.0
;
; Unless required by applicable law or agreed to in writing, software
; distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
; WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
; License for the specific language governing permissions and limitations under
; the License.

; Each node in the system must have a unique name. These are specified through
; the Erlang -name flag, which takes the form nodename@hostname. CouchDB
; recommends the following values for this flag:
;
; 1. If this is a single node, not in a cluster, use:
;    -name couchdb@127.0.0.1
; 2. If DNS is configured for this host, use the FQDN, such as:
;    -name couchdb@my.host.domain.com
; 3. If DNS isn't configured for this host, use IP addresses only, such as:
;    -name couchdb@192.168.0.1
;
; Do not rely on tricks with /etc/hosts or libresolv to handle anything
; other than the above 3 approaches correctly.
;
; Multiple CouchDBs running on the same machine can use couchdb1@, couchdb2@,
; etc.
-name couchdb@ourprodbox.kazoo.xxxxitelecom.ca

; All nodes must share the same magic cookie for distributed Erlang to work.
; Comment out this line if you synchronized the cookies by other means (using
; the ~/.erlang.cookie file, for example).
-setcookie DC3TA7tyKRnocookiehere3q5P

; Tell kernel and SASL not to log anything
-kernel error_logger silent
-sasl sasl_error_logger false

; Use kernel poll functionality if supported by emulator
+K true

; Start a pool of asynchronous IO threads
+A 16

; Comment this line out to enable the interactive Erlang shell on startup
+Bd -noinput

; Force use of the smp scheduler, fixes #1296
-smp enable

; Set maximum SSL session lifetime to reap terminated replication readers
-ssl session_lifetime 600

-kernel inet_dist_listen_min 9100
-kernel inet_dist_listen_max 9100

QUESTIONS:

1. How can I pinpoint the source of the fabric timeouts?
2. Which parameter(s) in the config file(s) need to be adjusted?
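For reference, the stack trace above dies inside attachment streaming (couch_att:write_streamed_attachment), so the fabric write/attachment timeouts seem like the first knobs to experiment with. Below is a sketch of a local.ini override; the parameter names are the ones that appear in the default.ini posted above, and the values are placeholders, not recommendations. Note that in the posted default.ini the `[fabric]` header itself is still commented out (`; [fabric]`) while the settings under it are not, so those overrides may not be landing in the section intended.

```ini
; Illustrative local.ini sketch - values are placeholders to experiment with,
; not recommendations. The section header must be uncommented to take effect.
[fabric]
request_timeout = 60000
all_docs_timeout = 15000
attachments_timeout = 120000
```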

wolfman1956 commented 4 years ago

Forgot to include other parameter info:

prlimit:

RESOURCE    DESCRIPTION                          SOFT       HARD       UNITS
AS          address space limit                  unlimited  unlimited  bytes
CORE        max core file size                   0          unlimited  blocks
CPU         CPU time                             unlimited  unlimited  seconds
DATA        max data size                        unlimited  unlimited  bytes
FSIZE       max file size                        unlimited  unlimited  blocks
LOCKS       max number of file locks held        unlimited  unlimited
MEMLOCK     max locked-in-memory address space   65536      65536      bytes
MSGQUEUE    max bytes in POSIX mqueues           819200     819200     bytes
NICE        max nice prio allowed to raise       0          0
NOFILE      max number of open files             65535      65535
NPROC       max number of processes              65535      65535
RSS         max resident set size                unlimited  unlimited  pages
RTPRIO      max real-time priority               0          0
RTTIME      timeout for real-time tasks          unlimited  unlimited  microsecs
SIGPENDING  max number of pending signals        31052      31052
STACK       max stack size                       8388608    unlimited  bytes

wolfman1956 commented 4 years ago

Not sure why the system says I closed this 3 days ago; I simply added more information and never closed the ticket.

wolfman1956 commented 4 years ago

Good morning folks... just wondering if anyone has had a chance to review this case, and whether any additional information may be required?

Thanks to all who'll look into this and comment :-)

wolfman1956 commented 4 years ago

Bump... I have corrected a few items in the initial post.