JubilantJerry opened this issue 2 months ago
Could you remove the binlog files and start again? Maybe your binlog is corrupted.
Removing the binlog files (indirectly, by removing the tables from manticore.json) allowed my index to be usable again. But the first instance of the OOM was not even during replaying binlog files, and my server was not low on memory at that time. If this is not a true instance of running out of memory, where might the error originate from?
Logs from the first OOM:
[Tue Jul 9 06:50:19.187 2024] [283560] rt: table pile_all: diskchunk 65839(10376), segments 32 saved in 0.473534 (0.695903) sec, RAM saved/new 68210823/134525801 ratio 0.336450 (soft limit 45157611, conf limit 134217728)
[Tue Jul 9 06:50:19.517 2024] [283560] rt: table pile_all: diskchunk 65840(10377), segments 32 saved in 0.450257 (0.685808) sec, RAM saved/new 66431477/135477629 ratio 0.333333 (soft limit 44739197, conf limit 134217728)
[Tue Jul 9 06:50:19.837 2024] [283560] rt: table pile_all: diskchunk 65841(10378), segments 32 saved in 0.452416 (0.621195) sec, RAM saved/new 68094324/134264291 ratio 0.336503 (soft limit 45164696, conf limit 134217728)
[Tue Jul 9 06:50:20.161 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: disk chunk /var/lib/manticore/pile_all/pile_all.65842: prealloc failed: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65842.spa': Cannot allocate memory (length=3380208)
[Tue Jul 9 06:50:20.347 2024] [283612] WARNING: rt: table pile_all failed to save disk chunk /var/lib/manticore/pile_all/pile_all.65844: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65844.spidx.tmp.pgmvalues': Cannot allocate memory (length=4)
[Tue Jul 9 06:50:20.465 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65844.spidx.tmp.pgmvalues': Cannot allocate memory (length=4)
[Tue Jul 9 06:50:20.686 2024] [283499] WARNING: last message repeated 1 times
[Tue Jul 9 06:50:20.686 2024] [283499] WARNING: rt: table pile_all failed to save disk chunk /var/lib/manticore/pile_all/pile_all.65845: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65845.spidx.tmp.pgmvalues': Cannot allocate memory (length=256000)
[Tue Jul 9 06:50:20.771 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65845.spidx.tmp.pgmvalues': Cannot allocate memory (length=256000)
[Tue Jul 9 06:50:21.445 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: disk chunk /var/lib/manticore/pile_all/pile_all.65846: prealloc failed: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65846.spa': Cannot allocate memory (length=3380208)
[Tue Jul 9 06:50:21.805 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: disk chunk /var/lib/manticore/pile_all/pile_all.65847: prealloc failed: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65847.spa': Cannot allocate memory (length=3380208)
[Tue Jul 9 06:50:22.128 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: disk chunk /var/lib/manticore/pile_all/pile_all.65848: prealloc failed: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65848.spi': Cannot allocate memory (length=1488808)
[Tue Jul 9 06:50:22.490 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: disk chunk /var/lib/manticore/pile_all/pile_all.65849: prealloc failed: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65849.spa': Cannot allocate memory (length=3380208)
[Tue Jul 9 06:50:22.884 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: disk chunk /var/lib/manticore/pile_all/pile_all.65850: prealloc failed: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65850.spi': Cannot allocate memory (length=1520852)
[Tue Jul 9 06:50:23.286 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: disk chunk /var/lib/manticore/pile_all/pile_all.65851: prealloc failed: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65851.spa': Cannot allocate memory (length=3380208)
[Tue Jul 9 06:50:23.744 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: disk chunk /var/lib/manticore/pile_all/pile_all.65853: prealloc failed: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65853.spa': Cannot allocate memory (length=3380208)
[Tue Jul 9 06:50:24.321 2024] [283606] WARNING: last message repeated 1 times
[Tue Jul 9 06:50:24.321 2024] [283606] WARNING: rt: table pile_all failed to save disk chunk /var/lib/manticore/pile_all/pile_all.65855: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65855.spidx.tmp.pgmvalues': Cannot allocate memory (length=256000)
[Tue Jul 9 06:50:24.591 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65855.spidx.tmp.pgmvalues': Cannot allocate memory (length=256000)
[Tue Jul 9 06:50:25.079 2024] [283560] WARNING: last message repeated 1 times
[Tue Jul 9 06:50:25.079 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: disk chunk /var/lib/manticore/pile_all/pile_all.65856: prealloc failed: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65856.spa': Cannot allocate memory (length=3380208)
[Tue Jul 9 06:50:25.242 2024] [283536] WARNING: rt common merge: table pile_all: failed to prealloc
[Tue Jul 9 06:50:25.588 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: disk chunk /var/lib/manticore/pile_all/pile_all.65857: prealloc failed: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65857.spa': Cannot allocate memory (length=3380208)
[Tue Jul 9 06:50:25.588 2024] [283590] rt: table pile_all: optimized progressive chunk(s) 25879 ( left 10378 ) in 1d 3.6h
[Tue Jul 9 06:50:26.152 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: disk chunk /var/lib/manticore/pile_all/pile_all.65858: prealloc failed: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65858.spa': Cannot allocate memory (length=3380208)
[Tue Jul 9 06:50:26.579 2024] [283511] WARNING: rt: table pile_all failed to load disk chunk after RAM save: disk chunk /var/lib/manticore/pile_all/pile_all.65859: prealloc failed: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65859.spa': Cannot allocate memory (length=3380208)
@JubilantJerry Can you please show docker inspect of your container when it's running, and free -m?
Currently (after I've deleted the binlogs and resolved the "out of memory" errors that prevented Manticore from starting), free -m gives:
               total        used        free      shared  buff/cache   available
Mem:          773845      533623       98422        4314      141799       89854
Swap:           2047        2024          23
I don't have the exact numbers from when the restarts were still happening, but there was significantly more free memory then than even this. Even with the above numbers, I wouldn't say the system is under memory pressure.
I installed Manticore directly with sudo apt install; it doesn't run in a Docker container.
Please show output of:
sudo cat /proc/`sudo cat /var/run/manticore/searchd.pid`/limits
Here is the output:
Limit                     Soft Limit  Hard Limit  Units
Max cpu time              unlimited   unlimited   seconds
Max file size             unlimited   unlimited   bytes
Max data size             unlimited   unlimited   bytes
Max stack size            83886080    83886080    bytes
Max core file size        unlimited   unlimited   bytes
Max resident set          unlimited   unlimited   bytes
Max processes             3095075     3095075     processes
Max open files            524288      524288      files
Max locked memory         unlimited   unlimited   bytes
Max address space         unlimited   unlimited   bytes
Max file locks            unlimited   unlimited   locks
Max pending signals       3095075     3095075     signals
Max msgqueue size         819200      819200      bytes
Max nice priority         0           0
Max realtime priority     0           0
Max realtime timeout      unlimited   unlimited   us
@JubilantJerry thanks. We've discussed this with the dev team and it looks really strange, especially Cannot allocate memory (length=4). Is it possible for you to share your data files with us so we can reproduce it locally? Here's how you can do it: https://manual.manticoresearch.com/Reporting_bugs#Uploading-your-data
It would be best if you could first reproduce it with a smaller data size. We'd also need instructions on how to reproduce it.
Alternatively you can:
1) check whether the machine is really out of memory when it happens (i.e. whether free in free -m is close to total)
2) look at the RAM saved/new ratio 0.33.. lines in the searchd log
Unfortunately, I'm not sure how to reproduce this. The workload is indeed entirely writes; this happens while building a 1.6TB index with 1.2 billion rows. We build up this index with many INSERT calls in many threads, which takes about 2 days. We've done this about 10 times, and only the most recent time did we encounter this OOM.
I do have another machine to try things on, but even if I reproduce it there, I don't think we can upload 1.6TB to S3. I suspect the problem fundamentally requires a large scale, for example if the root cause is some kind of integer or data structure overflow. Our script had already finished 85% of the inserts without issues, and nothing special happened around the time of the OOM.
It's worth noting that in the past we made all the (non-text) attributes columnar with no_fast_fetch and didn't get the OOM. In the run with the OOM, we stopped using columnar and enabled index_field_lengths. If it's possible to share useful information even without uploading 1.6TB to S3, I can try running the whole process again on my other machine and see whether it happens consistently with the non-columnar design of the index.
Can you elaborate more on "INSERT calls in many threads"? Is it a thread pool with a fixed number of threads, e.g. you have 10 write threads and therefore there can't be more than 10 inserts trying to write to Manticore at once? Or does it work like you prepare a batch and send it to the background no matter how many batches are already being uploaded?
Specifically, it's a call to map in Python's multiprocessing.Pool. It's synchronous; our pool size was 128. Each worker in the pool performs a bulk insert with a batch of 100 rows. So there can be 128 simultaneous batch insert operations, but these all run to completion before new insert operations are made.
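For illustration, here is a minimal sketch of that pattern (the (id, content) schema, connection details, and toy data are hypothetical; Manticore speaks the MySQL protocol on port 9306, so the pymysql client is assumed):

# Minimal sketch of the insert workload described above.
# Hypothetical schema and data; only the table name pile_all is from this thread.
import pymysql
from multiprocessing import Pool

BATCH = 100      # rows per bulk INSERT, as described
POOL_SIZE = 128  # worker processes, as described

def insert_batch(rows):
    # One short-lived connection per batch; a real loader would more likely
    # keep one connection per worker process.
    conn = pymysql.connect(host="127.0.0.1", port=9306, autocommit=True)
    try:
        with conn.cursor() as cur:
            values = ",".join(
                cur.mogrify("(%s, %s)", (doc_id, text)) for doc_id, text in rows
            )
            cur.execute("INSERT INTO pile_all (id, content) VALUES " + values)
    finally:
        conn.close()

if __name__ == "__main__":
    rows = [(i, f"document {i}") for i in range(1, 100_001)]  # toy data
    batches = [rows[i:i + BATCH] for i in range(0, len(rows), BATCH)]
    with Pool(POOL_SIZE) as pool:
        # map() is synchronous: the 128 workers run all batches to completion
        # before the call returns, matching the behaviour described above.
        pool.map(insert_batch, batches)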
Might be related to https://github.com/manticoresoftware/dev/issues/141 (implementing a new mode for posting data into the RT index).
@JubilantJerry can you share your full searchd log?
I think I have a way to consistently reproduce the problem now, though still not a fully self-contained repro. Today I observed that trying to resume the insert queries after the point of the last crash results in another "OOM".
My full searchd.log is 19GB, but I was able to bring it down to 90MB with this command:
grep --text -v -e "empty binlog /var/lib/manticore/binlog" -e "binlog: replaying log" /var/log/manticore/searchd.log > /var/log/manticore/searchd2.log
Somehow, the file contains binary data too. Here are the contents: https://drive.google.com/file/d/1_WiBkoPZ-J7MBd-Afsx0HyKI8RRebNvj/view?usp=sharing
The total_tokens number is definitely many orders of magnitude larger than the actual count.
I will need to drop the index and revert to the columnar design by tomorrow to stay on schedule with my project. If there's any more information that might be useful to provide before I do that, please let me know.
The total_tokens number is definitely many orders of magnitude larger than the actual count.
Right. We'll discuss what the limits there are and the chance of an overflow.
Another kind of edge case is:
[Tue Jul 9 06:50:25.588 2024] [283590] rt: table pile_all: optimized progressive chunk(s) 25879 ( left 10378 ) in 1d 3.6h
I'm afraid 25K disk chunks is not something we've ever tested, nor do most users deal with such a large number. Additionally, given that merging them took 27 hours and there are still another 10K chunks left, which would probably take another half a day to complete, it might make sense to significantly increase rt_mem_limit to reduce the load from merging. Even though segment merging in the RAM chunk may then take longer, which will slow down the inserts, it may still be beneficial overall, since the maximum number of accumulated chunks will be lower, thereby reducing the load from merging them. This issue might also be related to the memory allocation problem somehow.
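For reference, rt_mem_limit can be adjusted on a live RT table via SQL; a sketch (the 8G value is only an illustration, not a specific recommendation from this thread):

-- Raise the RAM chunk limit so flushes to disk chunks happen less often
ALTER TABLE pile_all rt_mem_limit='8G';
-- Verify the new setting
SHOW TABLE pile_all SETTINGS;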
I also see that the first mem allocation issue in the provided searchd log was at:
[Tue Jul 9 06:50:20.161 2024] [283560] WARNING: rt: table pile_all failed to load disk chunk after RAM save: disk chunk /var/lib/manticore/pile_all/pile_all.65842: prealloc failed: failed to mmap file '/var/lib/manticore/pile_all/pile_all.65842.spa': Cannot allocate memory (length=3380208)
i.e. in the same minute when the merging of the 25K disk chunks completed. Since then there have been many memory allocation issues, and with at least 10K disk chunks still left to merge, I'm wondering whether this points to a strong correlation between the memory issue and the high number of disk chunks.
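One thing that might be worth checking in this context (my assumption, not something established in this thread): on Linux, mmap() fails with "Cannot allocate memory" (ENOMEM) not only when RAM is exhausted but also when the process exceeds the vm.max_map_count limit on memory mappings, and tens of thousands of disk chunks, each mapping several files, could plausibly reach it. A quick check while searchd is running:

cat /proc/sys/vm/max_map_count   # system-wide cap per process; default is often 65530
sudo wc -l /proc/$(sudo cat /var/run/manticore/searchd.pid)/maps   # mappings searchd holds now
# If the mapping count is near the cap, it can be raised, e.g.:
sudo sysctl -w vm.max_map_count=1048576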
[Tue Jul 9 10:07:28.976 2024] [1742768] starting daemon version '6.3.0 1811a9efb@24052209 (columnar 2.3.0 88a01c3@24052206) (secondary 2.3.0 88a01c3@24052206) (knn 2.3.0 88a01c3@24052206)' ...
We released 6.3.2 with 10+ bug fixes. I'm not sure it will help, but I recommend upgrading.
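If the daemon was installed via APT as described above, the upgrade is a one-liner; a sketch assuming the official Manticore repository is already configured and the package is named manticore:

sudo apt update && sudo apt install --only-upgrade manticore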
| ram_bytes | 187992513896 |
...
| disk_mapped | 222924089630 |
| disk_mapped_cached | 187988262912 |
...
| ram_bytes | 40305267560 |
...
| disk_mapped | 40301977120 |
| disk_mapped_cached | 40304652288 |
ram_bytes: 187992513896 + 40305267560 = 212 GB
disk_mapped: 222924089630 + 40301977120 = 245 GB
disk_mapped_cached: 187988262912 + 40304652288 = 213 GB
So the RAM requirements then were 245GB with 212GB being resident in RAM.
When reproducing the issue, I would first test the 25K disk chunks hypothesis: set rt_mem_limit to the minimum value (8MB, if I'm not mistaken).
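For example (a sketch; whether 8M is really the minimum should be double-checked against the docs):

-- Force frequent disk-chunk flushes so the chunk count grows quickly
ALTER TABLE pile_all rt_mem_limit='8M';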
Bug Description:
Manticore crashes during indexing with a Fatal: out of memory error, even though Manticore is only using 35GB and the system has more than 500GB free out of 770GB total. Even the virtual memory column shown in top (which should be irrelevant) is only 280GB. The system is not actually under any memory pressure. Manticore reports the error on startup while trying to allocate only 5.6MB when replaying binlogs, and enters a loop of repeated restarts. I've since disabled the offending index from manticore.json, let Manticore start up (which made it delete the binlog files), then added the table back into manticore.json.
Logs:
Contents of /proc/[pid]/limits:
Manticore Search Version:
6.3.0
Operating System Version:
Ubuntu 22.04.3 LTS
Have you tried the latest development version?
No
Internal Checklist:
To be completed by the assignee. Check off tasks that have been completed or are not applicable.