sjoshi10 opened 10 months ago
@asias do we have a concurrency problem with load_and_stream?
We load and process only 16 sstables per shard at a time.
502 GB / (24 cores × 2 threads) ≈ 10 GB per shard
@sjoshi10 you can try moving fewer sstables at a time and then running load and stream. Let it finish, then repeat until all files are processed.
@asias is there an option to do that? I basically have 10 TB of data from a snapshot in the upload directory. The snapshot was created from another cluster.
You can do it with a simple script:
for each batch of files: copy the batch into the Scylla upload directory, run load and stream, and wait for it to finish before starting the next batch.
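A minimal sketch of that loop in Python, assuming hypothetical snapshot/upload paths, keyspace/table names, and batch size, and assuming `nodetool refresh --load-and-stream` is what triggers load and stream on this setup. Note it batches by sstable name prefix so that each sstable's component files move together (a requirement that comes up later in this thread):

```python
#!/usr/bin/env python3
# Hypothetical sketch of the batching loop described above. SRC, UPLOAD,
# the keyspace/table names, and BATCH are assumptions -- adjust for your setup.
import shutil
import subprocess
from pathlib import Path

SRC = Path("/mnt/snapshot/ks/tbl")                          # snapshot files
UPLOAD = Path("/var/lib/scylla/data/ks/tbl-<uuid>/upload")  # table's upload dir
BATCH = 64  # sstables per iteration; keep small enough to avoid memory pressure

# Batch by sstable, not by individual file: all component files of one
# sstable (Data.db, Index.db, CompressionInfo.db, ...) must travel together.
prefixes = sorted({"-".join(f.name.split("-")[:3]) for f in SRC.iterdir()})
for i in range(0, len(prefixes), BATCH):
    for prefix in prefixes[i:i + BATCH]:
        for f in SRC.glob(prefix + "-*"):
            shutil.copy2(f, UPLOAD / f.name)
    # Load and stream this batch and wait for it to finish.
    subprocess.run(["nodetool", "refresh", "--load-and-stream", "ks", "tbl"],
                   check=True)
```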
Alright, I'll give this a try and will let you know if I run into the same issue.
When I try to load some of the files instead of all of them, I get errors like this:
md-195655-big-CompressionInfo.db: file not found)
This file exists in the directory I'm copying from. Is there a better way to copy files? It seems like some files depend on each other.
But @asias we want to fix the bug too, not just work around it.
I remember that we were worried about having to give up some optimizations, like the one I introduced that sorts sstables by their first token. To keep it, we can do something like this: read the minimum amount from disk needed to find each sstable's first token (off the top of my head, we can just skip to the end of the Summary file and retrieve the first key), sort the file names by first token, and then open the sstables incrementally while respecting the desired target concurrency, as in the sketch below.
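A rough sketch of that shape, where `read_first_token` is a placeholder for the cheap Summary-file read described above (the helper, the token type, and the concurrency limit are assumptions, not Scylla APIs):

```python
# Sketch: learn each sstable's first token cheaply, sort by it, then open
# and process sstables with bounded concurrency instead of all at once.
import asyncio
from pathlib import Path

CONCURRENCY = 16  # matches the per-shard limit mentioned in this thread

def read_first_token(sstable: Path) -> bytes:
    """Placeholder for reading the first key from the end of the Summary
    component without fully opening the sstable."""
    raise NotImplementedError

async def process(sstable: Path, sem: asyncio.Semaphore) -> None:
    async with sem:  # at most CONCURRENCY sstables are open at a time
        ...          # open the sstable and load-and-stream it

async def load_and_stream_all(sstables: list[Path]) -> None:
    # Keep the sort-by-first-token optimization without opening everything:
    ordered = sorted(sstables, key=read_first_token)
    sem = asyncio.Semaphore(CONCURRENCY)
    await asyncio.gather(*(process(s, sem) for s in ordered))
```

This keeps the sorted processing order while bounding how many sstables are open, and consuming memory, at any point in time.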
We're talking about bad_alloc here, not optimizations.
> We're talking about bad_alloc here, not optimizations.
I was of course talking about bad_alloc too, and I think you will agree it's better to fix the problem without losing a good existing optimization. Limiting the number of sstables load and stream works with at any point in time has to be done exactly as I suggested above to avoid losing the optimization, so I'm leaving this as an instruction for whoever gets to work on it.
> But @asias we want to fix the bug too, not just work around it.
Of course. The workaround also helps us understand the problem, in addition to helping the user move forward instead of waiting a long time for a fix.
ping @asias for next steps here.
> When I try to load some of the files instead of all of them, I get errors like this:
> md-195655-big-CompressionInfo.db: file not found)
> This file exists in the directory I'm copying from. Is there a better way to copy files? It seems like some files depend on each other.
Yes. There are multiple component files for a given sstable, and we need to copy all of them together. E.g., you can run `cp md-195655-big* my_dst_dir` to copy that sstable's components.
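For scripted copies, a pre-flight check can catch missing components before running refresh. A hypothetical sketch, assuming the big-format naming above and that each sstable's `TOC.txt` lists its component names one per line:

```python
# Hypothetical pre-flight check: verify that every component listed in each
# sstable's TOC.txt made it into the destination directory.
from pathlib import Path

def missing_components(directory: Path) -> list[str]:
    missing = []
    for toc in directory.glob("*-TOC.txt"):
        prefix = toc.name[:-len("TOC.txt")]        # e.g. "md-195655-big-"
        for component in toc.read_text().split():
            if not (directory / (prefix + component)).exists():
                missing.append(prefix + component)
    return missing

# Usage: missing_components(Path("my_dst_dir")) should return [] before refresh.
```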
> ping @asias for next steps here.
We need a reproducer, and there are few details in the report. I suspect it is a generic issue with loading too many sstables into memory from the upload directory. The sstable loading happens before load and stream runs; load and stream itself only processes 16 sstables at a time, so I do not think that part causes too much memory pressure.
Running into issues while trying to load one of the tables. I've retried multiple times, dropping the table each time, and I still get the same error. Not sure what is causing it. The other table works fine, but this one table is giving us an issue.
This is the error I'm getting: