Is your feature request related to a problem?
With the introduction of the Lucene-compatible loading layer in `NativeMemoryLoadStrategy`, `IndexLoadStrategy.load()` takes care of loading the graph file into the native memory cache using `IndexInput`.

The synchronized block that manages cache sizing creates a premature bottleneck during memory load, especially in the case of concurrent segment search, where multiple threads are forced to serialize on graph load operations.
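The bottleneck above can be illustrated with a minimal sketch. The class and method names here (`GraphCacheSketch`, `readGraphFile`) are hypothetical stand-ins, not the actual plugin code; the point is that the expensive file read sits inside the same lock that guards cache accounting, so concurrent search threads queue behind each other:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the current pattern: the slow graph-file read
// happens inside the synchronized method that also guards cache sizing,
// so concurrent segment-search threads serialize on the load.
public class GraphCacheSketch {
    private final Map<String, long[]> cache = new HashMap<>();
    private long usedBytes = 0;
    private final long capacityBytes;

    public GraphCacheSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Stand-in for the expensive part: reading the graph via IndexInput
    // (or first downloading it for remote store / searchable snapshots).
    private long[] readGraphFile(String key, int sizeWords) {
        long[] graph = new long[sizeWords];
        for (int i = 0; i < sizeWords; i++) {
            graph[i] = i;
        }
        return graph;
    }

    public synchronized long[] load(String key, int sizeWords) {
        long[] cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        long needed = sizeWords * 8L;
        if (usedBytes + needed > capacityBytes) {
            throw new IllegalStateException("cache full");
        }
        // Problem: the slow read is performed while holding the lock.
        long[] graph = readGraphFile(key, sizeWords);
        usedBytes += needed;
        cache.put(key, graph);
        return graph;
    }
}
```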
What solution would you like?
An ideal solution here would ensure that the preload into memory and any preliminary operations (for example, downloading segments in the case of remote store or searchable snapshots, or checksumming) can be performed outside of the synchronized block to allow for better parallelism.

A suggested approach would be to add a new API, `ensureLoadable`, to `NativeMemoryEntryContext`, which will make sure that the graph file is accessible and ready to be loaded into memory once space is available.
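The proposed split could look roughly like the sketch below. This is an illustration only, assuming a hypothetical `EntryContextSketch` class; the real `ensureLoadable` signature on `NativeMemoryEntryContext` would be decided in the implementation. The idea is that the preliminary work (download, checksum, opening the file) runs lock-free, and the synchronized section shrinks to reserving cache space and installing the already-prepared entry:

```java
// Hypothetical sketch of the proposed approach: ensureLoadable() performs
// the preliminary work (download, checksum) outside any lock; load() is
// then cheap enough to run inside the short synchronized cache section.
public class EntryContextSketch {
    private volatile long[] prepared; // graph data made ready outside the lock
    private final String key;

    public EntryContextSketch(String key) {
        this.key = key;
    }

    // Called by each thread BEFORE entering the synchronized cache block:
    // verifies the graph file is accessible and ready to load.
    public void ensureLoadable() {
        long[] graph = new long[4];
        for (int i = 0; i < graph.length; i++) {
            graph[i] = i; // stand-in for segment download + checksum
        }
        prepared = graph;
    }

    public boolean isLoadable() {
        return prepared != null;
    }

    // Called INSIDE the synchronized block: only installs the prepared
    // data, keeping the critical section short.
    public long[] load() {
        if (prepared == null) {
            throw new IllegalStateException("ensureLoadable() was not called");
        }
        return prepared;
    }
}
```

With this shape, concurrent segment-search threads overlap the expensive preparation work and contend only briefly for cache-sizing bookkeeping.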
What alternatives have you considered?
N/A
Do you have any additional context?
N/A