sebfisch / haskell-regexp

Regular Expression Matching in Haskell
http://sebfisch.github.com/haskell-regexp/
Other
35 stars 5 forks source link

more efficient text representation #9

Open sebfisch opened 14 years ago

sebfisch commented 14 years ago

Add support for more efficient representations of strings as provided by the text and/or bytestring packages.

sebfisch commented 14 years ago

maybe stream-fusion can be used to improve efficiency simply by importing Data.List.Stream instead of using Data.List functions in the implementation of the matcher.

sebfisch commented 14 years ago

Using Data.List.Stream did not improve performance when searching for :.{4000}: in all Haskell files. Using Data.Stream made performance worse.

sebfisch commented 14 years ago

Check whether the new implementation of the vector package speeds up our algorithm.

sebfisch commented 14 years ago

it may be worthwhile to repeat the tests with the new implementation and simple regexps to measure raw search speed.

sebfisch commented 14 years ago

Maybe it is a good idea to provide an interface to both strict and lazy bytestrings.

The current implementation uses lazy strings and, thus, requires only constant space: the input can be streamed. I expect the same behaviour from an implementation using lazy bytestrings. For small input (like lines of a text file) it may be faster to not support streaming and read the input into a strict bytestring instead.

sebfisch commented 14 years ago

It is also worth trying to replace lazy strings with enumerators.

cartazio commented 13 years ago

have some of these ideas been experimented with as yet?

sebfisch commented 13 years ago

Apart from stream fusion, I also tried Data.Text without benefit. However, I did not profile to see whether things can be improved - only changed the implementation to use the different interface. All this was done before new versions of Data.Text and GHC came out.