Open barzan-hayati opened 1 month ago
@barzan-hayati please refer this doc to export the whole Milvus logs for investigation. For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs. Also if would be very helpful if you could attach the etcd backup for investigation. Check this: https://github.com/milvus-io/birdwatcher for details about how to backup etcd with birdwatcher
/assign @barzan-hayati
Because the log file is too large to upload here, I have selected a portion that contains the relevant error.
I haven't installed Go yet, so I was unable to log etcd through Birdwatcher.
seems that you just don't have enough memory [2024/09/29 17:59:05.843 +00:00] [WARN] [delegator/delegator_data.go:472] ["worker failed to load segments"] [traceID=1a316b2e60e620141d3a6c576f755d0e] [collectionID=452785967535053061] [channel=by-dev-rootcoord-dml_1_452785967535053061v0] [replicaID=452785967859761153] [workID=3] [segments="[452785967535289170]"] [error="memory limit exceeded[predict=3.9962037e+11][limit=3.041807e+11]"]
it seems you have 300GB memory(Weird number but you try to load 400GB data)
a weird thing is that any of each segment should not take 100GB or more memory.
you need to use birdwatcher to check if there are large segment(more than 50GB in size)Then you will need to upgrade to 2.4.12 and there is a compaction help to reduce the maximum size of segments
seems that you just don't have enough memory [2024/09/29 17:59:05.843 +00:00] [WARN] [delegator/delegator_data.go:472] ["worker failed to load segments"] [traceID=1a316b2e60e620141d3a6c576f755d0e] [collectionID=452785967535053061] [channel=by-dev-rootcoord-dml_1_452785967535053061v0] [replicaID=452785967859761153] [workID=3] [segments="[452785967535289170]"] [error="memory limit exceeded[predict=3.9962037e+11][limit=3.041807e+11]"]
Milvus uses about half of the available memory, and I currently have 140 GB of free memory. Why should Milvus require additional memory during search operations? The collection has already been created, and indexing has been completed.
Please note that Milvus occasionally goes down after 8 hours and sometimes after just 4 hours.
I conducted another test, and when I encountered that error, I had to restart the Milvus Docker instance. It took approximately 3 minutes and 40 seconds to load the collection into RAM, after which I was able to perform searches again.
it seems you have 300GB memory(Weird number but you try to load 400GB data)
Why 400 G?
it seems you have 300GB memory(Weird number but you try to load 400GB data)
When I load the collection, it uses approximately 140 GB of RAM.
3.9962037e+11][limit=3.041807e+11
based on the log it said 400G is requirement but your pod limit is 300GB.
This is my suggestion
based on the log it said 400G is requirement but your pod limit is 300GB.
Let me present a scenario that demonstrates that memory usage remains relatively constant. I monitor the memory usage and log the details.
I conduct searches with various consistency levels and batch sizes. At the beginning of my program, the available memory is approximately 143 GB.
2024/09/30 13:46:26.659 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 100.000 total 283.290 available 143.899 percent 49.2 used 136.981 free 57.547 active 212.856 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.379 available_virtual_memory 50.796
2024/09/30 13:46:36.670 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 85.700 total 283.290 available 143.907 percent 49.2 used 136.973 free 57.555 active 212.850 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.379 available_virtual_memory 50.799
2024/09/30 13:46:46.682 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 85.200 total 283.290 available 143.902 percent 49.2 used 136.978 free 57.550 active 212.854 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.379 available_virtual_memory 50.797
2024/09/30 13:46:56.694 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 85.700 total 283.290 available 143.904 percent 49.2 used 136.977 free 57.551 active 212.854 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.797
2024/09/30 13:47:06.706 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 86.300 total 283.290 available 143.902 percent 49.2 used 136.979 free 57.549 active 212.854 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.797
2024/09/30 13:47:16.718 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 85.500 total 283.290 available 143.903 percent 49.2 used 136.978 free 57.550 active 212.854 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.797
2024/09/30 13:47:26.728 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 85.600 total 283.290 available 143.907 percent 49.2 used 136.974 free 57.554 active 212.853 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.798
2024/09/30 13:47:36.738 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 86.700 total 283.290 available 143.908 percent 49.2 used 136.972 free 57.555 active 212.851 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.799
2024/09/30 13:47:46.750 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 85.900 total 283.290 available 143.915 percent 49.2 used 136.966 free 57.562 active 212.846 inactive 6.359 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.801
2024/09/30 13:47:56.762 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 79.600 total 283.290 available 143.906 percent 49.2 used 136.974 free 57.553 active 212.850 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.798
2024/09/30 13:48:06.773 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 0.500 total 283.290 available 143.916 percent 49.2 used 136.964 free 57.564 active 212.837 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.802
2024/09/30 13:48:16.784 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 0.700 total 283.290 available 143.929 percent 49.2 used 136.951 free 57.576 active 212.831 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.806
After some time, as I mentioned, a search request made at 15:03:12 did not complete,
2024/09/30 15:03:12.264 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:03:22.272 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:03:32.285 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:03:42.296 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:03:52.308 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:04:02.320 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:04:12.336 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:04:22.349 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:04:32.364 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:04:42.379 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:04:52.395 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:05:02.410 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:05:12.426 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:05:22.442 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:05:32.472 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:05:42.486 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:05:52.501 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:06:02.518 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:06:12.532 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:06:22.548 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:06:32.561 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
2024/09/30 15:06:42.577 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
indicating that the Docker container had restarted (as confirmed by the Docker container uptime). I then checked the memory usage at that point. According to the log, the available memory before that time was around 143 GB, but after the Docker container restart, the available memory increased to 221 GB, then to 272 GB. Once the collection was fully loaded, the available memory returned to approximately 143 GB.
2024/09/30 15:02:01.973 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 80.900 total 283.290 available 143.891 percent 49.2 used 136.990 free 57.524 active 212.882 inactive 6.367 buffers 2.040 cached 86.736 shared 0.028 slab 5.383 available_virtual_memory 50.793
2024/09/30 15:02:11.985 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 79.500 total 283.290 available 143.893 percent 49.2 used 136.987 free 57.526 active 212.880 inactive 6.367 buffers 2.040 cached 86.737 shared 0.028 slab 5.383 available_virtual_memory 50.794
2024/09/30 15:02:21.998 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 79.900 total 283.290 available 143.897 percent 49.2 used 136.983 free 57.530 active 212.876 inactive 6.367 buffers 2.040 cached 86.737 shared 0.028 slab 5.383 available_virtual_memory 50.795
2024/09/30 15:02:32.010 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 78.500 total 283.290 available 143.890 percent 49.2 used 136.990 free 57.523 active 212.878 inactive 6.367 buffers 2.040 cached 86.737 shared 0.028 slab 5.383 available_virtual_memory 50.793
2024/09/30 15:02:42.022 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 80.600 total 283.290 available 143.892 percent 49.2 used 136.989 free 57.525 active 212.875 inactive 6.367 buffers 2.040 cached 86.737 shared 0.028 slab 5.383 available_virtual_memory 50.793
2024/09/30 15:02:52.034 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 46.200 total 283.290 available 221.237 percent 21.9 used 59.644 free 134.870 active 135.530 inactive 6.367 buffers 2.040 cached 86.737 shared 0.028 slab 5.382 available_virtual_memory 78.098
2024/09/30 15:03:02.046 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 2.800 total 283.290 available 272.184 percent 3.9 used 8.696 free 185.825 active 84.930 inactive 6.360 buffers 2.040 cached 86.729 shared 0.028 slab 5.358 available_virtual_memory 96.080
2024/09/30 15:03:12.056 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 57.600 total 283.290 available 216.895 percent 23.4 used 63.985 free 130.532 active 140.056 inactive 6.363 buffers 2.040 cached 86.732 shared 0.028 slab 5.365 available_virtual_memory 76.563
2024/09/30 15:03:22.067 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 33.100 total 283.290 available 204.029 percent 28.0 used 76.852 free 117.655 active 152.924 inactive 6.374 buffers 2.040 cached 86.744 shared 0.028 slab 5.366 available_virtual_memory 72.021
2024/09/30 15:03:32.078 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.200 total 283.290 available 198.230 percent 30.0 used 82.651 free 111.850 active 158.705 inactive 6.380 buffers 2.040 cached 86.750 shared 0.028 slab 5.366 available_virtual_memory 69.974
2024/09/30 15:03:42.089 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.500 total 283.290 available 192.940 percent 31.9 used 87.940 free 106.554 active 163.984 inactive 6.386 buffers 2.040 cached 86.755 shared 0.028 slab 5.367 available_virtual_memory 68.107
2024/09/30 15:03:52.101 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 44.400 total 283.290 available 161.248 percent 43.1 used 119.632 free 74.858 active 195.619 inactive 6.391 buffers 2.040 cached 86.760 shared 0.028 slab 5.368 available_virtual_memory 56.920
2024/09/30 15:04:02.108 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.500 total 283.290 available 155.949 percent 45.0 used 124.931 free 69.554 active 200.900 inactive 6.396 buffers 2.040 cached 86.765 shared 0.028 slab 5.369 available_virtual_memory 55.049
2024/09/30 15:04:12.116 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.400 total 283.290 available 150.179 percent 47.0 used 130.701 free 63.779 active 206.665 inactive 6.400 buffers 2.040 cached 86.770 shared 0.028 slab 5.369 available_virtual_memory 53.012
2024/09/30 15:04:22.124 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.400 total 283.290 available 144.853 percent 48.9 used 136.027 free 58.448 active 211.974 inactive 6.405 buffers 2.040 cached 86.775 shared 0.028 slab 5.369 available_virtual_memory 51.133
2024/09/30 15:04:32.135 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.500 total 283.290 available 139.952 percent 50.6 used 140.929 free 53.542 active 216.870 inactive 6.409 buffers 2.040 cached 86.779 shared 0.028 slab 5.369 available_virtual_memory 49.402
2024/09/30 15:04:42.144 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 29.300 total 283.290 available 135.010 percent 52.3 used 145.869 free 48.596 active 221.795 inactive 6.414 buffers 2.040 cached 86.785 shared 0.028 slab 5.370 available_virtual_memory 47.658
2024/09/30 15:04:52.156 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 40.600 total 283.290 available 136.097 percent 52.0 used 144.783 free 49.678 active 220.683 inactive 6.419 buffers 2.040 cached 86.790 shared 0.028 slab 5.371 available_virtual_memory 48.041
2024/09/30 15:05:02.168 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 37.700 total 283.290 available 133.251 percent 53.0 used 147.629 free 46.826 active 223.525 inactive 6.424 buffers 2.040 cached 86.795 shared 0.028 slab 5.372 available_virtual_memory 47.037
2024/09/30 15:05:12.176 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 89.000 total 283.290 available 149.217 percent 47.3 used 131.664 free 62.787 active 207.548 inactive 6.429 buffers 2.040 cached 86.799 shared 0.028 slab 5.373 available_virtual_memory 52.674
2024/09/30 15:05:22.187 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 32.600 total 283.290 available 148.834 percent 47.5 used 132.046 free 62.399 active 207.931 inactive 6.434 buffers 2.040 cached 86.804 shared 0.028 slab 5.374 available_virtual_memory 52.537
2024/09/30 15:05:32.196 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 54.400 total 283.290 available 150.256 percent 47.0 used 130.624 free 63.816 active 206.511 inactive 6.439 buffers 2.040 cached 86.810 shared 0.028 slab 5.375 available_virtual_memory 53.040
2024/09/30 15:05:42.205 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 79.000 total 283.290 available 169.188 percent 40.3 used 111.692 free 82.744 active 187.592 inactive 6.443 buffers 2.040 cached 86.814 shared 0.028 slab 5.375 available_virtual_memory 59.724
2024/09/30 15:05:52.216 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 64.200 total 283.290 available 155.908 percent 45.0 used 124.973 free 69.457 active 200.870 inactive 6.448 buffers 2.040 cached 86.820 shared 0.028 slab 5.376 available_virtual_memory 55.034
2024/09/30 15:06:02.224 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 29.000 total 283.290 available 138.815 percent 51.0 used 142.065 free 52.365 active 217.930 inactive 6.449 buffers 2.040 cached 86.820 shared 0.028 slab 5.376 available_virtual_memory 49.001
2024/09/30 15:06:12.232 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 22.900 total 283.290 available 132.824 percent 53.1 used 148.056 free 46.373 active 223.928 inactive 6.449 buffers 2.040 cached 86.821 shared 0.028 slab 5.376 available_virtual_memory 46.886
2024/09/30 15:06:22.240 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 22.700 total 283.290 available 126.778 percent 55.2 used 154.102 free 40.327 active 229.964 inactive 6.450 buffers 2.040 cached 86.821 shared 0.028 slab 5.376 available_virtual_memory 44.752
2024/09/30 15:06:32.251 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 29.400 total 283.290 available 130.444 percent 54.0 used 150.436 free 43.992 active 226.291 inactive 6.451 buffers 2.040 cached 86.822 shared 0.028 slab 5.376 available_virtual_memory 46.046
2024/09/30 15:06:42.260 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 34.000 total 283.290 available 135.483 percent 52.2 used 145.397 free 49.030 active 221.250 inactive 6.451 buffers 2.040 cached 86.822 shared 0.028 slab 5.376 available_virtual_memory 47.824
2024/09/30 15:06:52.272 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.400 total 283.290 available 144.153 percent 49.1 used 136.727 free 57.699 active 212.572 inactive 6.452 buffers 2.040 cached 86.824 shared 0.028 slab 5.376 available_virtual_memory 50.885
2024/09/30 15:07:02.280 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 79.700 total 283.290 available 144.350 percent 49.0 used 136.531 free 57.895 active 212.398 inactive 6.453 buffers 2.040 cached 86.824 shared 0.028 slab 5.376 available_virtual_memory 50.955
2024/09/30 15:07:12.292 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 79.300 total 283.290 available 144.145 percent 49.1 used 136.736 free 57.690 active 212.607 inactive 6.453 buffers 2.040 cached 86.824 shared 0.028 slab 5.375 available_virtual_memory 50.882
2024/09/30 15:07:22.301 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 78.700 total 283.290 available 144.009 percent 49.2 used 136.872 free 57.554 active 212.742 inactive 6.453 buffers 2.040 cached 86.824 shared 0.028 slab 5.375 available_virtual_memory 50.834
2024/09/30 15:07:32.313 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 78.600 total 283.290 available 144.003 percent 49.2 used 136.877 free 57.549 active 212.752 inactive 6.453 buffers 2.040 cached 86.824 shared 0.028 slab 5.375 available_virtual_memory 50.832
This log file clearly shows that memory usage never utilizes the entire RAM. I also attempted to install Birdwatcher to examine the segment information.
based on the log it said 400G is requirement but your pod limit is 300GB.
Let me present a scenario that demonstrates that memory usage remains relatively constant. I monitor the memory usage and log the details.
I conduct searches with various consistency levels and batch sizes. At the beginning of my program, the available memory is approximately 143 GB.
2024/09/30 13:46:26.659 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 100.000 total 283.290 available 143.899 percent 49.2 used 136.981 free 57.547 active 212.856 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.379 available_virtual_memory 50.796 2024/09/30 13:46:36.670 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 85.700 total 283.290 available 143.907 percent 49.2 used 136.973 free 57.555 active 212.850 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.379 available_virtual_memory 50.799 2024/09/30 13:46:46.682 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 85.200 total 283.290 available 143.902 percent 49.2 used 136.978 free 57.550 active 212.854 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.379 available_virtual_memory 50.797 2024/09/30 13:46:56.694 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 85.700 total 283.290 available 143.904 percent 49.2 used 136.977 free 57.551 active 212.854 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.797 2024/09/30 13:47:06.706 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 86.300 total 283.290 available 143.902 percent 49.2 used 136.979 free 57.549 active 212.854 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.797 2024/09/30 13:47:16.718 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 85.500 total 283.290 available 143.903 percent 49.2 used 136.978 free 57.550 active 212.854 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.797 2024/09/30 13:47:26.728 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 85.600 total 283.290 available 143.907 percent 49.2 used 136.974 free 57.554 active 212.853 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.798 2024/09/30 13:47:36.738 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 86.700 total 283.290 available 143.908 percent 49.2 used 136.972 free 57.555 active 212.851 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.799 2024/09/30 13:47:46.750 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 85.900 total 283.290 available 143.915 percent 49.2 used 136.966 free 57.562 active 212.846 inactive 6.359 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.801 2024/09/30 13:47:56.762 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 79.600 total 283.290 available 143.906 percent 49.2 used 136.974 free 57.553 active 212.850 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.798 2024/09/30 13:48:06.773 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 0.500 total 283.290 available 143.916 percent 49.2 used 136.964 free 57.564 active 212.837 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.802 2024/09/30 13:48:16.784 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 0.700 total 283.290 available 143.929 percent 49.2 used 136.951 free 57.576 active 212.831 inactive 6.358 buffers 2.038 cached 86.724 shared 0.028 slab 5.380 available_virtual_memory 50.806
After some time, as I mentioned, a search request made at 15:03:12 did not complete,
2024/09/30 15:03:12.264 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:03:22.272 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:03:32.285 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:03:42.296 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:03:52.308 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:04:02.320 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:04:12.336 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:04:22.349 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:04:32.364 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:04:42.379 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:04:52.395 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:05:02.410 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:05:12.426 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:05:22.442 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:05:32.472 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:05:42.486 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:05:52.501 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:06:02.518 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:06:12.532 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:06:22.548 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:06:32.561 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search 2024/09/30 15:06:42.577 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(694): male_female_student has been down in collection_search
indicating that the Docker container had restarted (as confirmed by the Docker container uptime). I then checked the memory usage at that point. According to the log, the available memory before that time was around 143 GB, but after the Docker container restart, the available memory increased to 221 GB, then to 272 GB. Once the collection was fully loaded, the available memory returned to approximately 143 GB.
2024/09/30 15:02:01.973 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 80.900 total 283.290 available 143.891 percent 49.2 used 136.990 free 57.524 active 212.882 inactive 6.367 buffers 2.040 cached 86.736 shared 0.028 slab 5.383 available_virtual_memory 50.793 2024/09/30 15:02:11.985 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 79.500 total 283.290 available 143.893 percent 49.2 used 136.987 free 57.526 active 212.880 inactive 6.367 buffers 2.040 cached 86.737 shared 0.028 slab 5.383 available_virtual_memory 50.794 2024/09/30 15:02:21.998 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 79.900 total 283.290 available 143.897 percent 49.2 used 136.983 free 57.530 active 212.876 inactive 6.367 buffers 2.040 cached 86.737 shared 0.028 slab 5.383 available_virtual_memory 50.795 2024/09/30 15:02:32.010 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 78.500 total 283.290 available 143.890 percent 49.2 used 136.990 free 57.523 active 212.878 inactive 6.367 buffers 2.040 cached 86.737 shared 0.028 slab 5.383 available_virtual_memory 50.793 2024/09/30 15:02:42.022 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 80.600 total 283.290 available 143.892 percent 49.2 used 136.989 free 57.525 active 212.875 inactive 6.367 buffers 2.040 cached 86.737 shared 0.028 slab 5.383 available_virtual_memory 50.793 2024/09/30 15:02:52.034 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 46.200 total 283.290 available 221.237 percent 21.9 used 59.644 free 134.870 active 135.530 inactive 6.367 buffers 2.040 cached 86.737 shared 0.028 slab 5.382 available_virtual_memory 78.098 2024/09/30 15:03:02.046 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 2.800 total 283.290 available 272.184 percent 3.9 used 8.696 free 185.825 active 84.930 inactive 6.360 buffers 2.040 cached 86.729 shared 0.028 slab 5.358 available_virtual_memory 96.080 2024/09/30 15:03:12.056 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 57.600 total 283.290 available 216.895 percent 23.4 used 63.985 free 130.532 active 140.056 inactive 6.363 buffers 2.040 cached 86.732 shared 0.028 slab 5.365 available_virtual_memory 76.563 2024/09/30 15:03:22.067 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 33.100 total 283.290 available 204.029 percent 28.0 used 76.852 free 117.655 active 152.924 inactive 6.374 buffers 2.040 cached 86.744 shared 0.028 slab 5.366 available_virtual_memory 72.021 2024/09/30 15:03:32.078 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.200 total 283.290 available 198.230 percent 30.0 used 82.651 free 111.850 active 158.705 inactive 6.380 buffers 2.040 cached 86.750 shared 0.028 slab 5.366 available_virtual_memory 69.974 2024/09/30 15:03:42.089 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.500 total 283.290 available 192.940 percent 31.9 used 87.940 free 106.554 active 163.984 inactive 6.386 buffers 2.040 cached 86.755 shared 0.028 slab 5.367 available_virtual_memory 68.107 2024/09/30 15:03:52.101 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 44.400 total 283.290 available 161.248 percent 43.1 used 119.632 free 74.858 active 195.619 inactive 6.391 buffers 2.040 cached 86.760 shared 0.028 slab 5.368 available_virtual_memory 56.920 2024/09/30 15:04:02.108 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.500 total 283.290 available 155.949 percent 45.0 used 124.931 free 69.554 active 200.900 inactive 6.396 buffers 2.040 cached 86.765 shared 0.028 slab 5.369 available_virtual_memory 55.049 2024/09/30 15:04:12.116 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.400 total 283.290 available 150.179 percent 47.0 used 130.701 free 63.779 active 206.665 inactive 6.400 buffers 2.040 cached 86.770 shared 0.028 slab 5.369 available_virtual_memory 53.012 2024/09/30 15:04:22.124 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.400 total 283.290 available 144.853 percent 48.9 used 136.027 free 58.448 active 211.974 inactive 6.405 buffers 2.040 cached 86.775 shared 0.028 slab 5.369 available_virtual_memory 51.133 2024/09/30 15:04:32.135 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.500 total 283.290 available 139.952 percent 50.6 used 140.929 free 53.542 active 216.870 inactive 6.409 buffers 2.040 cached 86.779 shared 0.028 slab 5.369 available_virtual_memory 49.402 2024/09/30 15:04:42.144 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 29.300 total 283.290 available 135.010 percent 52.3 used 145.869 free 48.596 active 221.795 inactive 6.414 buffers 2.040 cached 86.785 shared 0.028 slab 5.370 available_virtual_memory 47.658 2024/09/30 15:04:52.156 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 40.600 total 283.290 available 136.097 percent 52.0 used 144.783 free 49.678 active 220.683 inactive 6.419 buffers 2.040 cached 86.790 shared 0.028 slab 5.371 available_virtual_memory 48.041 2024/09/30 15:05:02.168 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 37.700 total 283.290 available 133.251 percent 53.0 used 147.629 free 46.826 active 223.525 inactive 6.424 buffers 2.040 cached 86.795 shared 0.028 slab 5.372 available_virtual_memory 47.037 2024/09/30 15:05:12.176 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 89.000 total 283.290 available 149.217 percent 47.3 used 131.664 free 62.787 active 207.548 inactive 6.429 buffers 2.040 cached 86.799 shared 0.028 slab 5.373 available_virtual_memory 52.674 2024/09/30 15:05:22.187 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 32.600 total 283.290 available 148.834 percent 47.5 used 132.046 free 62.399 active 207.931 inactive 6.434 buffers 2.040 cached 86.804 shared 0.028 slab 5.374 available_virtual_memory 52.537 2024/09/30 15:05:32.196 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 54.400 total 283.290 available 150.256 percent 47.0 used 130.624 free 63.816 active 206.511 inactive 6.439 buffers 2.040 cached 86.810 shared 0.028 slab 5.375 available_virtual_memory 53.040 2024/09/30 15:05:42.205 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 79.000 total 283.290 available 169.188 percent 40.3 used 111.692 free 82.744 active 187.592 inactive 6.443 buffers 2.040 cached 86.814 shared 0.028 slab 5.375 available_virtual_memory 59.724 2024/09/30 15:05:52.216 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 64.200 total 283.290 available 155.908 percent 45.0 used 124.973 free 69.457 active 200.870 inactive 6.448 buffers 2.040 cached 86.820 shared 0.028 slab 5.376 available_virtual_memory 55.034 2024/09/30 15:06:02.224 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 29.000 total 283.290 available 138.815 percent 51.0 used 142.065 free 52.365 active 217.930 inactive 6.449 buffers 2.040 cached 86.820 shared 0.028 slab 5.376 available_virtual_memory 49.001 2024/09/30 15:06:12.232 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 22.900 total 283.290 available 132.824 percent 53.1 used 148.056 free 46.373 active 223.928 inactive 6.449 buffers 2.040 cached 86.821 shared 0.028 slab 5.376 available_virtual_memory 46.886 2024/09/30 15:06:22.240 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 22.700 total 283.290 available 126.778 percent 55.2 used 154.102 free 40.327 active 229.964 inactive 6.450 buffers 2.040 cached 86.821 shared 0.028 slab 5.376 available_virtual_memory 44.752 2024/09/30 15:06:32.251 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 29.400 total 283.290 available 130.444 percent 54.0 used 150.436 free 43.992 active 226.291 inactive 6.451 buffers 2.040 cached 86.822 shared 0.028 slab 5.376 available_virtual_memory 46.046 2024/09/30 15:06:42.260 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 34.000 total 283.290 available 135.483 percent 52.2 used 145.397 free 49.030 active 221.250 inactive 6.451 buffers 2.040 cached 86.822 shared 0.028 slab 5.376 available_virtual_memory 47.824 2024/09/30 15:06:52.272 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 28.400 total 283.290 available 144.153 percent 49.1 used 136.727 free 57.699 active 212.572 inactive 6.452 buffers 2.040 cached 86.824 shared 0.028 slab 5.376 available_virtual_memory 50.885 2024/09/30 15:07:02.280 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 79.700 total 283.290 available 144.350 percent 49.0 used 136.531 free 57.895 active 212.398 inactive 6.453 buffers 2.040 cached 86.824 shared 0.028 slab 5.376 available_virtual_memory 50.955 2024/09/30 15:07:12.292 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 79.300 total 283.290 available 144.145 percent 49.1 used 136.736 free 57.690 active 212.607 inactive 6.453 buffers 2.040 cached 86.824 shared 0.028 slab 5.375 available_virtual_memory 50.882 2024/09/30 15:07:22.301 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 78.700 total 283.290 available 144.009 percent 49.2 used 136.872 free 57.554 active 212.742 inactive 6.453 buffers 2.040 cached 86.824 shared 0.028 slab 5.375 available_virtual_memory 50.834 2024/09/30 15:07:32.313 [INFO] [__main__] cpu_memory.check_cpu_memory_usage(21): cpu_percentage 78.600 total 283.290 available 144.003 percent 49.2 used 136.877 free 57.549 active 212.752 inactive 6.453 buffers 2.040 cached 86.824 shared 0.028 slab 5.375 available_virtual_memory 50.832
This log file clearly shows that memory usage never utilizes the entire RAM. I also attempted to install Birdwatcher to examine the segment information.
what error did you get? is there any pod restart? any error message here?
I would suggest to upgrade to 2.4.12 and retry
what error did you get?
I forgot to include the error log: here is the error log from that time.
2024/09/30 15:03:12.264 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:03:22.272 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:03:32.284 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:03:42.296 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:03:52.308 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:04:02.320 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:04:12.336 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:04:22.349 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:04:32.364 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:04:42.379 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:04:52.395 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:05:02.409 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:05:12.426 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:05:22.441 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:05:32.472 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:05:42.486 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:05:52.501 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:06:02.518 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:06:12.532 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:06:22.547 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:06:32.561 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
2024/09/30 15:06:42.577 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
- what is nq, topk and concurrency?
I am looking for various nq
values, which could be one from the set [1, 2, 4, 8, 16, 32, 64]
with a topk
value of 50
for all searches. I don't have any knowledge about the concurrency
value, but a search is initiated each time.
- Did you calcuclate how much memory it needs?
I utilized Milvus Sizing Tool to calculate the necessary RAM and hard disk space, and it appears that I have sufficient memory and storage.
- if there any pod down, can you check there are some etcd session timeout log?
I don't utilize clusters or Kubernetes, and I think that pods are not applicable to my situation.
However, I'm attempting to check segment size using Birdwatcher, but it was unable to connect. I've also created an issue regarding this failed to connect to etcd: context deadline exceeded.
what error did you get?
I forgot to include the error log: here is the error log from that time.
2024/09/30 15:03:12.264 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:03:22.272 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:03:32.284 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:03:42.296 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:03:52.308 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:04:02.320 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:04:12.336 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:04:22.349 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:04:32.364 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:04:42.379 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:04:52.395 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:05:02.409 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:05:12.426 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:05:22.441 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:05:32.472 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:05:42.486 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:05:52.501 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:06:02.518 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:06:12.532 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:06:22.547 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:06:32.561 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])> 2024/09/30 15:06:42.577 [INFO] [sqlite_db] insert_data_into_partitions_key.collection_search(693): Error: <MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=452785967535053061])>
- what is nq, topk and concurrency?
I am looking for various
nq
values, which could be one from the set[1, 2, 4, 8, 16, 32, 64]
with atopk
value of50
for all searches. I don't have any knowledge about theconcurrency
value, but a search is initiated each time.
- Did you calcuclate how much memory it needs?
I utilized Milvus Sizing Tool to calculate the necessary RAM and hard disk space, and it appears that I have sufficient memory and storage.
- if there any pod down, can you check there are some etcd session timeout log?
I don't utilize clusters or Kubernetes, and I think that pods are not applicable to my situation.
However, I'm attempting to check segment size using Birdwatcher, but it was unable to connect. I've also created an issue regarding this failed to connect to etcd: context deadline exceeded.
So you are running on a standalone node with 283.29 GB memory?
is it a docker deployment or a docker compose deployment?
If it's docker compose you should be able to connect with birdwatcher and see segment info.
It's weird that your memory usage is 150GB GB(From the log you offered) but the estimation is more than 300GB.
Here is some guess of what happened:
If you can list S3 directory then we can probably do some further analysis.
My recommendation is:
So you are running on a standalone node with 283.29 GB memory?
Yes.
is it a docker deployment or a docker compose deployment?
I am using this docker compose file.
If it's docker compose you should be able to connect with birdwatcher and see segment info.
But I could not connect in birwatcher.
Your collection has many deletes
I don't perform any delete operations on the Milvus collection; I only add data to it at the moment.
Your segment is huge (more than 50GB)
I have no information on that. I was unable to check the segment size using Birdwatcher or obtain query segment information via get_query_segment_info()
function.
If you can list S3 directory then we can probably do some further analysis.
Where can I find the S3
directory? I couldn't locate it.
upgrade to 2.4.12
Currently, I am using this version.
It's weird that your memory usage is 150GB GB(From the log you offered) but the estimation is more than 300GB.
It's strange. I have 10 files, each containing 5 million rows with 512 dimensions. When I try to add them all at once, I run into issues. However, if I add them one by one, they can be successfully added to the collection. After adding each set of 5 million records, I perform indexing before attempting to add the next one.
g before attempting to add the next one.
So you are using import and try to import 10 files, each with 5million 512 dim data?
We need segment distribution info, probably the reason you can not connect to etcd is you need to export 2379 port of etcd to your host machine or your laptop.
So you are using import and try to import 10 files, each with 5million 512 dim data?
Thats right.
We need segment distribution info, probably the reason you can not connect to etcd is you need to export 2379 port of etcd to your host machine or your laptop.
let me chack it.
we need more evidence on how large is segment, even if you are trying to import multiple large files milvus should split all the files into small segments
"When I try to add them all at once, I run into issues. However, if I add them one by one, they can be successfully added to the collection" Another guess might be since you import too much file at the same time, index build can not catch up. And we you try to load multiple segment without index, the resource estimation maybe higher than actual usage.
try to change queryNode.segcore.interimIndex.enableIndex = false see if it helps
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.
Is there an existing issue for this?
Environment
Current Behavior
I added 50 million vectors to a collection. Subsequently, I wanted to conduct a benchmark to evaluate various consistency levels and different batch sizes of vectors to assess the improvements gained from using a partition key. The vector batch sizes chosen were [1, 2, 4, 8, 16, 32, 64].
However, after some time (duration unknown), I encountered this error:
Expected Behavior
No response
Steps To Reproduce