Open kmuthukk opened 5 years ago
Another user reported similar error on slack channel:
I keep getting this error while trying to create a cluster or single instance; I'm using the latest version on Fedora 29:
```
2019-04-25 14:27:14,995 INFO: performing post-bootstrap initialization ... I0425 18:23:31.637509 10997 mem_tracker.cc:241] MemTracker: hard memory limit is 26.637360 GB
2019-04-25 14:27:14,995 INFO: I0425 18:23:31.637601 10997 mem_tracker.cc:243] MemTracker: soft memory limit is 22.641756 GB
2019-04-25 14:27:14,995 INFO: I0425 18:23:31.637888 10998 async_initializer.cc:77] Starting to init ybclient
2019-04-25 14:27:14,995 INFO: I0425 18:23:31.643432 10998 async_initializer.cc:83] Successfully built ybclient
2019-04-25 14:27:14,995 INFO: I0425 18:27:14.577764 11004 tablet_rpc.cc:318] Already present (yb/consensus/retryable_requests.cc:238): Failed Write(tablet: 00000000000000000000000000000000, num_ops: 1, num_attempts: 3, txn: 00000000-0000-0000-0000-000000000000) to tablet 00000000000000000000000000000000 on tablet server { uuid: d5b965c227eb4f8d960d7cdf956cb442 private: [host: "127.0.0.1" port: 7100] cloud_info: placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" after 3 attempt(s): Duplicate request
2019-04-25 14:27:14,995 INFO: E0425 18:27:14.578102 10997 elog.c:436] Stack trace for a FATAL log message
2019-04-25 14:27:14,995 INFO: @ 0x7f70b0f85bd1 YBCLogImpl
2019-04-25 14:27:14,995 INFO: @ 0x92aafc errfinish
2019-04-25 14:27:14,996 INFO: @ 0x95348c HandleYBStatus
2019-04-25 14:27:14,996 INFO: @ 0x697d35 YBCExecuteInsertIndex
2019-04-25 14:27:14,996 INFO: @ 0x5370d9 ybcininsert 2019-04-25
```
They are on regular disk as well (not SSDs).
In order to speed up initdb when creating a cluster, we bulk-insert into the system catalog tables. On non-SSD disks, each bulk insert takes longer to complete, so there is a higher chance that the write RPC times out and triggers a retry.
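To make the failure mode concrete, here is a minimal sketch (hypothetical names, not YB's actual classes) of how a client-side retry after a slow write surfaces as a duplicate: the first attempt is applied on the server but the acknowledgment misses the deadline, so the client retries with the same request ID and the server's retryable-requests tracking rejects it.

```python
import uuid

class TabletServer:
    """Toy server that remembers which request IDs it has applied."""
    def __init__(self):
        self.applied = set()

    def write(self, request_id, row):
        if request_id in self.applied:
            # The first attempt actually succeeded; this retry is a duplicate.
            return "AlreadyPresent: Duplicate request"
        self.applied.add(request_id)
        return "OK"

def write_with_retry(server, row, attempts=3, first_attempt_times_out=True):
    """Client retries with the SAME request id so the server can deduplicate."""
    request_id = str(uuid.uuid4())
    result = None
    for attempt in range(attempts):
        result = server.write(request_id, row)
        if attempt == 0 and first_attempt_times_out:
            # Slow disk: the write was applied, but the ack arrived after the
            # RPC deadline, so the client saw a timeout and retries.
            continue
        return result
    return result

server = TabletServer()
print(write_with_retry(server, ("k", "v")))  # AlreadyPresent: Duplicate request
```

The point is that the "Duplicate request" status actually proves the original write succeeded; the failure is in how that status is reported back to the application.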
@robertpang - perhaps default timeouts are too aggressive?
Yes. We should increase the timeout for slower disks.
I seem to be getting this error in Kubernetes using io1 disks on EC2 instances. I currently get it quite regularly from my processes that sync data from MongoDB into YugaByte.
Jira Link: DB-1055 [Based on issue reported on yugabyte.com/slack channel.]
Setup: RF=1, 16-core machine, Storage: HDD (not SSD).
With 20 concurrent connections, the user gets "Already present: Duplicate request, XX000".
iostat showed high await latencies.
YugaByte is not optimized for HDDs; we require SSDs. Notwithstanding that, the error we surface to the end user here is cryptic. YB may internally retry an operation when the system is slow. When it detects that the underlying operation has already been executed, we should return the result of the previously executed operation (such as the number of rows INSERTed) and report SUCCESS instead of this "Duplicate request" error (which is a status code that should be handled internally). If it is not feasible to obtain the result of the previous execution, we should return a plain timeout error instead. The latter is what would happen if an operation against any DB genuinely timed out: in that case the client application cannot be sure whether the operation was executed or not.