sebfisch opened 14 years ago
Maybe the stream-fusion package can improve efficiency simply by importing the list functions used in the matcher's implementation from Data.List.Stream instead of Data.List.
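The experiment amounts to a one-line import change. The sketch below assumes list code is written against a qualified import, so that only the import line changes. It also assumes that Data.List.Stream exports the same names as Data.List, which is how the stream-fusion package was designed. The function colonLineChars is hypothetical, a list pipeline of the kind fusion is meant to turn into a single loop without intermediate lists; it is not code from the matcher.

```haskell
-- To run the stream-fusion experiment, swap this import for
--   import qualified Data.List.Stream as L
-- (from the stream-fusion package); the call sites below stay unchanged.
import qualified Data.List as L

-- Hypothetical pipeline: total length of all lines that contain a colon.
-- Each stage would normally allocate an intermediate list; with fusion,
-- the map/filter/fold stages can in principle be combined into one loop.
colonLineChars :: String -> Int
colonLineChars =
  L.foldl' (+) 0 . L.map L.length . L.filter (L.elem ':') . L.lines

main :: IO ()
main = getContents >>= print . colonLineChars
```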
Using Data.List.Stream did not improve performance when searching for :.{4000}: in all Haskell files. Using Data.Stream made performance worse.
Check whether the new implementation of the vector package speeds up our algorithm.
It may be worthwhile to repeat the tests with the new implementation and simple regexps to measure raw search speed.
Maybe it is a good idea to provide an interface to both strict and lazy bytestrings.
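One way to offer both interfaces without duplicating the matcher is a small class over "things that can be consumed one character at a time". The following is a minimal sketch under that assumption. The class Input, its method foldInput and the stand-in countChar are hypothetical names, not part of the existing implementation. Each instance only uses the foldl' that already exists in base and in the bytestring package's Char8 modules.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.List (foldl')
import qualified Data.ByteString.Char8 as BS
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical class: any input the matcher can consume left to right,
-- one Char at a time, threading its state through a strict left fold.
class Input s where
  foldInput :: (a -> Char -> a) -> a -> s -> a

-- Plain lazy String (the current representation).
instance Input [Char] where
  foldInput = foldl'

-- Strict bytestring: the whole input is one contiguous buffer.
instance Input BS.ByteString where
  foldInput = BS.foldl'

-- Lazy bytestring: folded chunk by chunk, so input can still be streamed.
instance Input BL.ByteString where
  foldInput = BL.foldl'

-- Stand-in for the matcher: any step function over characters works here.
countChar :: Input s => Char -> s -> Int
countChar c = foldInput (\n x -> if x == c then n + 1 else n) 0

main :: IO ()
main = do
  print (countChar ':' "a:b:c")
  print (countChar ':' (BS.pack "a:b:c"))
  print (countChar ':' (BL.pack "a:b:c"))
```

A matcher written against foldInput would then get String, strict and lazy bytestring support from the same code, with each representation using its own specialised fold.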
The current implementation uses lazy strings and thus requires only constant space: the input can be streamed. I expect the same behaviour from an implementation using lazy bytestrings. For small inputs (like the lines of a text file) it may be faster to give up streaming and read the input into a strict bytestring instead.
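The trade-off between streaming and reading everything up front can be illustrated with a line-oriented driver. This is a sketch with hypothetical names (countMatchingLinesLazy, countMatchingLinesStrict). The predicate argument stands in for the regex matcher, and BL.elem / BS.elem are used only as placeholder predicates. The lazy variant reads its input chunk by chunk, so lines can be matched and discarded as they arrive; the strict variant needs the whole input in memory, but every line is a cheap slice of one buffer.

```haskell
import qualified Data.ByteString.Char8 as BS
import qualified Data.ByteString.Lazy.Char8 as BL

-- Lazy variant: lines are produced on demand from lazily read chunks,
-- so memory use stays roughly constant as long as lines are short.
countMatchingLinesLazy :: (BL.ByteString -> Bool) -> BL.ByteString -> Int
countMatchingLinesLazy matches = length . filter matches . BL.lines

-- Strict variant: the whole input is already in memory (no streaming),
-- and each line shares the underlying buffer instead of being copied.
countMatchingLinesStrict :: (BS.ByteString -> Bool) -> BS.ByteString -> Int
countMatchingLinesStrict matches = length . filter matches . BS.lines

main :: IO ()
main = do
  -- BL.getContents reads stdin lazily, one chunk at a time.
  input <- BL.getContents
  print (countMatchingLinesLazy (BL.elem ':') input)
  -- The strict alternative would read everything first:
  --   input <- BS.getContents
  --   print (countMatchingLinesStrict (BS.elem ':') input)
```

Which variant is faster for a given workload would have to be measured; the sketch only shows where the two approaches differ.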
It is also worth trying to replace lazy strings with enumerators.
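To make the enumerator idea concrete without depending on a particular library, here is a hand-rolled, simplified version. It is not the API of the enumerator or iteratee packages, and it omits early termination. A producer pushes chunks of input into a monadic step function and threads an accumulator through them. For a matcher, that accumulator would be its state, carried across chunk boundaries. The names enumHandle, enumChunks and countColons are hypothetical.

```haskell
import Control.Monad (foldM)
import qualified Data.ByteString.Char8 as BS
import Data.Functor.Identity (Identity (..))
import System.IO (Handle, stdin)

-- Producer: read fixed-size chunks from a handle and feed each one to the
-- step function until end of input (hGetSome returns an empty chunk).
enumHandle :: Int -> Handle -> (b -> BS.ByteString -> IO b) -> b -> IO b
enumHandle chunkSize h step = go
  where
    go acc = do
      chunk <- BS.hGetSome h chunkSize
      if BS.null chunk
        then return acc
        else step acc chunk >>= go

-- The same protocol over an in-memory list of chunks (useful for testing).
enumChunks :: Monad m => [BS.ByteString] -> (b -> BS.ByteString -> m b) -> b -> m b
enumChunks chunks step acc0 = foldM step acc0 chunks

-- Stand-in consumer: its state (a count) survives chunk boundaries,
-- just as a matcher's state would. The result is forced to avoid thunks.
countColons :: Monad m => Int -> BS.ByteString -> m Int
countColons n chunk = return $! n + BS.count ':' chunk

main :: IO ()
main = enumHandle 32768 stdin countColons 0 >>= print
```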
Have any of these ideas been tried yet?
Apart from stream fusion, I also tried Data.Text, without benefit. However, I did not profile to see whether things could be improved; I only changed the implementation to use the different interface. All this was done before new versions of Data.Text and GHC came out.
Add support for more efficient representations of strings as provided by the text and/or bytestring packages.
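The cheapest form of support is a thin adapter that converts Text or bytestring input to String and hands it to the existing matcher. The sketch below assumes the matcher can be viewed as a function String -> Bool. The type synonym Matcher and the adapters matchText and matchLazyByteString are hypothetical, while T.unpack and BL.unpack are the real conversion functions of the text and bytestring packages.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import qualified Data.Text as T

-- Stand-in for the existing String-based matcher (hypothetical shape).
type Matcher = String -> Bool

-- Adapter for strict Text: unpack to String, then run the matcher.
matchText :: Matcher -> T.Text -> Bool
matchText m = m . T.unpack

-- Adapter for lazy bytestrings: the String is produced chunk by chunk.
matchLazyByteString :: Matcher -> BL.ByteString -> Bool
matchLazyByteString m = m . BL.unpack

main :: IO ()
main = do
  print (matchText (elem ':') (T.pack "a:b"))
  print (matchLazyByteString (elem ':') (BL.pack "abc"))
```

Adapters like these provide the interface right away, but they still process a String internally, so by themselves they would not make matching faster. A real efficiency gain would need the matcher to consume Text or bytestring input natively.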