opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

POC: Upload segments to remote segment store #2034

Open · sachinpkale opened this issue 2 years ago

sachinpkale commented 2 years ago

Before finalizing the design for the remote storage options, we want to perform some POCs. These POCs will help us understand the feasibility of some of the considerations mentioned in the feature proposal.

This is the first POC in the series and will focus on uploading segment files to the remote storage.

Goals

  1. Identify segment files to upload
  2. Identify code flow to upload the segments

    • Segments need to be uploaded to the remote store after creation on the primary.
    • Add the segment upload logic in the code such that upload failures have the same impact as a flush operation failure.
  3. Identify success criteria and failures that need to be handled

    • How do we know that a file was uploaded successfully and that its contents are the same as the copy on local disk?
    • Identify transient and permanent failures while uploading the segment.
    • Add retry logic for transient errors.
  4. Sync vs async pattern of segment upload and impact on performance

    • Run performance tests to understand the impact on critical APIs.
    • Impact of sync vs async on performance, consistency, etc.
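For the "retry logic for transient errors" goal, one common shape is retry with exponential backoff, where transient failures are retried and the last error is propagated once attempts are exhausted. This is a hypothetical helper, not code from the OpenSearch repo:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical retry helper for transient upload failures (not an
// OpenSearch class). Assumes maxAttempts >= 1.
public class UploadRetry {
    // Retries the upload up to maxAttempts times, doubling the backoff
    // after each transient failure; rethrows the last error if all fail.
    public static <T> T withRetry(Callable<T> upload, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return upload.call();
            } catch (IOException e) {   // treat IOException as transient here
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2;       // exponential backoff
                }
            }
        }
        throw last;                     // permanent failure: propagate to caller
    }
}
```

Classifying which exceptions are truly transient (throttling, timeouts) vs permanent (access denied, missing bucket) is the part the POC needs to answer per store.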
sachinpkale commented 2 years ago

Identify segment files to upload

What are the files that constitute a segment and need to be uploaded to the remote storage?
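In Lucene, a commit consists of a `segments_N` commit file plus the per-segment files it references (e.g. `_0.si`, `_0.cfs`, `_0.cfe` for compound-file segments). A minimal sketch of classifying file names by Lucene's standard naming conventions, for illustration only:

```java
// Illustrative classifier based on Lucene's file-naming conventions:
// commit files are segments_N; per-segment files start with an underscore
// followed by the segment's base-36 name, e.g. _0.si, _4a.cfs.
public class SegmentFileClassifier {
    public static boolean isCommitFile(String name) {
        return name.startsWith("segments_");
    }

    public static boolean isPerSegmentFile(String name) {
        return name.startsWith("_") && name.contains(".");
    }

    // Lock files like write.lock are local-only and never need upload.
    public static boolean isLockFile(String name) {
        return name.endsWith(".lock");
    }
}
```

In practice the authoritative list would come from Lucene's commit metadata rather than name patterns; the sketch only shows the categories involved.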
sachinpkale commented 2 years ago

Initial commit: https://github.com/sachinpkale/OpenSearch/commit/befeb345ea66485dc6cb7d8a577cc8796d1d93ee

This commit does the following:

  1. In the flush flow, after the Lucene commit is called and before the old translog is deleted, newly created segments are uploaded to S3.
  2. Keeps track of uploaded segment files so that only newly created files are uploaded.
  3. Failures are propagated to the original flush method.
  4. The upload is a synchronous call.
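The ordering described above (Lucene commit, then upload, then translog deletion, with upload failures failing the flush) can be sketched as follows. `RemoteStore` and the surrounding plumbing are stand-ins for illustration, not real OpenSearch classes:

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch of the flush hook; names are placeholders.
public class FlushWithUpload {
    interface RemoteStore {
        void upload(String segmentFile) throws IOException;
    }

    // Mirrors the commit ordering: Lucene commit -> upload -> trim translog.
    static void flush(List<String> newSegmentFiles, RemoteStore store,
                      Runnable luceneCommit, Runnable trimTranslog) throws IOException {
        luceneCommit.run();                   // 1. Lucene commit
        for (String file : newSegmentFiles) { // 2. upload newly created segments;
            store.upload(file);               //    an IOException here fails the
        }                                     //    flush, like a commit failure
        trimTranslog.run();                   // 3. only now delete old translog
    }
}
```

Uploading before the translog is trimmed is what preserves durability: if the upload fails, the translog still covers the un-uploaded operations.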

What is missing:

  1. The S3 bucket name is hard-coded. It needs to be abstracted out to config.
  2. An in-memory hash map is used to keep track of already uploaded files. If the OpenSearch process restarts, all the segments will be uploaded again.
  3. Segment merges are not handled.
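The in-memory tracking from point 2 might look like the sketch below; it illustrates exactly the stated limitation that the set is lost on restart, forcing a full re-upload:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative in-memory tracker (not the actual POC code). Because the
// state lives only in memory, a process restart forgets every uploaded
// file and all segments would be re-uploaded.
public class UploadedFileTracker {
    private final Set<String> uploaded = ConcurrentHashMap.newKeySet();

    // Returns true if the file still needs uploading, and marks it done.
    public boolean markIfNew(String fileName) {
        return uploaded.add(fileName); // add() returns false for known names
    }
}
```

A durable fix would rehydrate this set from a remote listing or a metadata file at startup instead of starting empty.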

Next steps:

  1. Upload only from primary
  2. Correctness check for uploaded files using checksum
  3. Performance tests
reta commented 2 years ago

@sachinpkale What happens during merges (e.g. force merge)? The segments should be replaced in the segment store, right?

sachinpkale commented 2 years ago

What happens during merges (e.g. force merge)?

Yes, the new segment created by the merge will be added, and the original segments used in the merge will be marked for deletion. We may not delete them immediately.
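The "mark for deletion, delete later" bookkeeping described here can be sketched as a tombstone set that a later sweep drains. This is a hypothetical illustration, not the POC's implementation:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical deferred-delete bookkeeping for merged-away segments:
// files are tombstoned when a merge replaces them, and physically
// removed from the remote store only by a later sweep.
public class MergeTombstones {
    private final Set<String> pendingDelete = new HashSet<>();

    public void markForDeletion(Set<String> mergedAwayFiles) {
        pendingDelete.addAll(mergedAwayFiles); // do not delete immediately
    }

    // A later sweep actually issues the remote deletes.
    public void sweep(Consumer<String> remoteDelete) {
        pendingDelete.forEach(remoteDelete);
        pendingDelete.clear();
    }
}
```

Deferring the delete keeps older commit points restorable until it is certain nothing still references the merged-away segments.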

sachinpkale commented 2 years ago

I have updated the "What is missing" part of the initial-commit comment to include segment merges.

sachinpkale commented 2 years ago

The current change uploads from the primary as well as the replica. The upload should happen only from the primary. Added this to next steps.

sachinpkale commented 2 years ago

@CEHENKLE Please create a feature branch feature/durability-enhancements, which will be used to run the full test suite.

CEHENKLE commented 2 years ago

Done. https://github.com/opensearch-project/OpenSearch/tree/feature/durability-enhancements

sachinpkale commented 2 years ago

Pushed a commit to upload segments only from the primary: commit

Next Steps:

  1. Correctness check for uploaded files using checksum
  2. Performance tests
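For the "correctness check using checksum" step, one possible approach is to compare a checksum of the local bytes against one computed over the bytes read back from (or returned by) the remote store. The sketch below uses a plain CRC32 as a stand-in; Lucene files also carry their own footer checksum that could be verified instead:

```java
import java.util.zip.CRC32;

// Illustrative checksum comparison (assumption: CRC32 over full file
// bytes; the actual POC may use Lucene's built-in footer checksum or
// the object store's checksum headers instead).
public class ChecksumVerifier {
    public static long crc32(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    public static boolean matches(byte[] local, byte[] remote) {
        return crc32(local) == crc32(remote);
    }
}
```

Verifying against a checksum the remote store computes server-side avoids a full read-back of the uploaded object.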
sachinpkale commented 2 years ago

Pushed new commit:

  1. Do not delete old segments_N files. Thinking of using them to keep track of the sequence numbers of the uploaded commits.
  2. Changed the remote path to index_UUID/shard_id/primary_term/
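The new remote layout can be expressed as a small key-prefix builder; the parameter names here are placeholders for whatever the shard actually reports:

```java
// Builds the remote key prefix in the index_UUID/shard_id/primary_term/
// layout mentioned above (illustrative helper, not OpenSearch code).
public class RemotePath {
    public static String prefix(String indexUUID, int shardId, long primaryTerm) {
        return indexUUID + "/" + shardId + "/" + primaryTerm + "/";
    }
}
```

Keying by primary term means a new primary writes under a fresh prefix, so files from a deposed primary can never be confused with the current primary's uploads.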
sachinpkale commented 2 years ago

Pushed new commit:

  1. Upload a metadata file, max_checkpoint, along with the segment files. Currently, it keeps track of the max local processed checkpoint of the index that has been successfully uploaded to the remote store.
  2. Added sample log statements to understand the format of some of the index files.
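A minimal sketch of what such a max_checkpoint metadata file could contain, assuming a simple key=value text format (the actual format in the commit may differ):

```java
// Illustrative serialization of the max_checkpoint metadata file: a
// single value recording the highest local processed checkpoint whose
// segments are known to be safely in the remote store. The key=value
// format is an assumption for illustration.
public class MaxCheckpointFile {
    public static String serialize(long maxCheckpoint) {
        return "max_checkpoint=" + maxCheckpoint;
    }

    public static long parse(String contents) {
        return Long.parseLong(contents.substring("max_checkpoint=".length()).trim());
    }
}
```

On recovery, this value tells a node which operations are already durable remotely and which must be replayed from the translog.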