rkruze opened 2 years ago
I don't understand how this makes sense. I may move this to a discussion later, but first I want to think it through.
It's fine to upload a TB. If you care about data archival, why wouldn't you care about historical data? I'm not sure this is worth the complexity overhead in the codebase.
I'm moving this out of Shadow Indexing GA until I understand the impact.
I don't think we should do this. Reasons:
Who is this for, and what problem do they have today?
When you enable shadow indexing, Redpanda goes back and uploads all the data already in the cluster to S3/GCS. This can have a significant performance impact on the cluster. By default, we should upload only data written to the cluster after shadow indexing is enabled, unless a parameter is set telling Redpanda to also upload all previous data to S3/GCS.
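The proposed default can be sketched as a segment-selection rule. This is a minimal illustration, not Redpanda's implementation: the `Segment` model, the `enabled_at_offset` boundary (the log's next offset at the moment shadow indexing was turned on), and the `upload_existing_data` opt-in flag are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    # Hypothetical model of a local log segment: an inclusive offset range.
    base_offset: int
    last_offset: int


def segments_to_upload(
    segments: List[Segment],
    enabled_at_offset: int,
    upload_existing_data: bool = False,
) -> List[Segment]:
    """Return the segments eligible for upload to object storage.

    enabled_at_offset: hypothetical; the first offset written after
    shadow indexing was enabled.
    upload_existing_data: hypothetical opt-in flag; when True, all
    historical data is backfilled (the behavior proposed as non-default).
    """
    if upload_existing_data:
        return list(segments)
    # Keep every segment holding at least one record written after enablement.
    # A segment straddling the boundary is kept so that no new data is skipped.
    return [s for s in segments if s.last_offset >= enabled_at_offset]


if __name__ == "__main__":
    log = [Segment(0, 999), Segment(1000, 1999), Segment(2000, 2499)]
    new_only = segments_to_upload(log, enabled_at_offset=1500)
    print([s.base_offset for s in new_only])  # [1000, 2000]
    everything = segments_to_upload(log, enabled_at_offset=1500, upload_existing_data=True)
    print(len(everything))  # 3
```

Keeping the straddling segment means a small amount of pre-enablement data is still uploaded; the alternative (splitting segments at the boundary) avoids that but adds the kind of complexity the comments above question.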
What are the success criteria?
When shadow indexing is enabled, only new data is uploaded to S3/GCS.
Why is solving this problem impactful?
If we upload all data by default, we could degrade cluster performance, because Redpanda would have to read a large amount of existing data from the data volume.
JIRA Link: CORE-783