hdbeukel / james-core

Core module of the JAMES framework
Apache License 2.0
6 stars 5 forks source link

Optimize subset solution and neighbourhoods #30

Open hdbeukel opened 8 years ago

hdbeukel commented 8 years ago

Consider the following changes:

  1. Avoid separate storage of set with all IDs.

  2. Provide private inner class RandomRetrievalSet in SubsetSolution which combines a Map and ArrayList so that a random item can be picked in constant (instead of linear) time. The map maps contained items to the index at which they occur in the list and its key set serves as the modelled set. Items occur in the list in no particular order and the order may change when manipulating the collection. The map and list are kept in sync upon modifications. Constant time removals are preserved by swapping the removed and last item in the list after which the last item is removed. This avoids shifting the remaining items and is allowed as no order is to be maintained in the list (the list purely serves as a mechanism for fast random sampling). Use this utility class to store the selected/unselected IDs and add methods to SubsetSolution for picking a random ID from one of these two groups (in constant time).

  3. Update neighbourhoods to benefit from the constant time random selection of IDs to add/remove/swap. Can only be used if no IDs are fixed in the set of selected/unselected IDs.

  4. Neighbourhoods: avoid copy of selected/unselected IDs if there are fixed IDs but one of both sets does not contain any fixed IDs while only the other one does. If a copy is needed, consider creating an ArrayList instead of Set so that a random non-fixed ID can then be selected in constant time (after the copy).

  5. Generalize SetUtilities to deal with any kind of collection. Retain but deprecate SetUtilities by delegating to a more general utility class CollectionUtils. Provide fast implementations for random access lists (implementing both List and RandomAccess) in addition to the slower, general fallback mechanism for other collections as currently implemented in SetUtilities. These extended utilities can be used in the subset neighbourhoods.

hdbeukel commented 8 years ago

Document the fact that specifying fixed IDs in the updated SingleSwapNeighbourhood and SinglePerturbationNeighbourhood will increase the complexity for generated random moves from O(1) to O(n) in the worst case, where n is the total number of items from which a subset is selected.