There are functions and structures which should be optimized, if possible. Profiling shows that the most time consuming functions are related to the HashMap implementation:
Surprisingly, lookup and lookupUnsafe (which are evaluated almost the same number of times) need the same amount of time and memory. Nevertheless, if we substitute lookup for lookupUnsafe in the library code, number of (==) evaluations with respect to HashMap keys (automaton states) is greatly increased, so it seems that workload of HashMap functions is primarily related to the HashMap structure itself.
Other functions, which summarily consume a lot of resources, are from the Graph module:
Another optimization which comes to mind is to change the algorithm of (path, value) insertion. We can take the number of ingoing edges into account, i.e. there's no need to clone states with only one in-transition (root state, for example), we can just modify them. But if we modify the set of transitions outgoing from the particular state, hash of the state will also change (unless we fundamentally change the method of hash computation), so perhaps its better to leave this idea for another time.
Structures
There are functions and structures which should be optimized, if possible. Profiling shows that the most time consuming functions are related to the HashMap implementation:
COST CENTRE MODULE %time %alloc
insertUnsafe Data.DAWG.HashMap 7.8 10.0 deleteUnsafe Data.DAWG.HashMap 7.5 8.0 lookup Data.DAWG.HashMap 3.7 0.2 lookupUnsafe Data.DAWG.HashMap 3.7 0.3
Surprisingly, lookup and lookupUnsafe (which are evaluated almost the same number of times) need the same amount of time and memory. Nevertheless, if we substitute lookup for lookupUnsafe in the library code, number of (==) evaluations with respect to HashMap keys (automaton states) is greatly increased, so it seems that workload of HashMap functions is primarily related to the HashMap structure itself.
Other functions, which summarily consume a lot of resources, are from the Graph module:
decIngo Data.DAWG.Graph 5.9 11.3 remNode.nodeMap' Data.DAWG.Graph 4.8 6.0 remNode.ingoMap' Data.DAWG.Graph 4.0 6.0 newNode.nodeMap' Data.DAWG.Graph 3.6 6.9 newNode.ingoMap' Data.DAWG.Graph 3.2 6.9 newNode.(...) Data.DAWG.Graph 3.0 2.7 decIngo.k Data.DAWG.Graph 2.7 0.5 nodeBy Data.DAWG.Graph 2.6 0.3 newNode Data.DAWG.Graph 2.5 3.0 ...
Algorithms
Another optimization which comes to mind is to change the algorithm of (path, value) insertion. We can take the number of ingoing edges into account, i.e. there's no need to clone states with only one in-transition (root state, for example), we can just modify them. But if we modify the set of transitions outgoing from the particular state, hash of the state will also change (unless we fundamentally change the method of hash computation), so perhaps its better to leave this idea for another time.