Closed damhiya closed 3 years ago
I just removed restrictKeys
from the code as its use causes a dependency problem while the performance gain is not that much.
Time complexity of restrictKeys
is O(n+m)
while time complexity of repeated use of findWithDefault
is O(min(n,W)*m)
.
In Hopcroft's algorithm, n is the number of all states and m is the number of states in each equivalence class.
Hi @damhiya. I am sorry I missed this until now. I am hoping the docs can be further clarified (a fault of the situation before, to be clear), and maybe a few other things, and then this should be ready to go.
BTW checkout https://github.com/haskell/containers/pull/616 which I need finish at some point. This filtering the empty sets indeed comes up a lot, and I hope to eventually be able to use that here.
Notice1. While the original Hopcroft's Algorithm has only one kinds of accepting states, alex's DFA allows multiple kinds of accepting states since alex is a lexer. However, there is no documentation about this difference in the alex repository. So I didn't really get about correctness of Hopcroft's algorithm that has multiple kinds of accepting states.
Notice2.
I expected to dividing P
into Q
and R
will make significant performance improvement, but I could not see any difference in test. Indeed, the changes excepting filtering null cases from xs
has almost no effect to performance. I only tested GHC's lexer, so it might require other test cases to observe the effect of new implementation.
The documented time complexity bound is almost certainly not tight. I would bet dollars to doughnuts that restrictKeys is not worse for big O.
On Sat, Jan 30, 2021, 9:45 PM damhiya notifications@github.com wrote:
@damhiya commented on this pull request.
In src/DFAMin.hs https://github.com/simonmar/alex/pull/176#discussion_r567354641:
- xs = filter (not . IS.null)
- . map (\m -> IS.unions [IM.findWithDefault IS.empty s m | s <- IS.toList a])
- $ IM.elems bigmap
Let me describe why I discouraged to use restrictKeys
Time complexity of restrictKeys is O(n+m) while time complexity of repeated use of findWithDefault is O(min(n,W)*m). In Hopcroft's algorithm, n is the number of all states and m is the number of states in each equivalence class.
n is number of all states that is pretty big. m is the number of states in each equivalence class. That means, in the end of the algorithm, m might be 1 or small number if the input DFA was already almost optimized. Then the effectual time complexity would be O(n) for restrictKeys and O(W*m) for findWithDefault. So, restrictKeys might be slower then the repeated findWithDefault.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/simonmar/alex/pull/176#discussion_r567354641, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7N24MFHMU3MLZXU7JDS4S75LANCNFSM4V3N6BRA .
IM.restrictKeys x (IS.singleton y)
have to work in O(min(n,W))
to beats competing implementation in every possible inputs.
I'm not sure that's the case.
The documented time complexity bound is almost certainly not tight. I would bet dollars to doughnuts that restrictKeys is not worse for big O. … On Sat, Jan 30, 2021, 9:45 PM damhiya @.> wrote: @*.** commented on this pull request. ------------------------------ In src/DFAMin.hs <#176 (comment)>: > + xs = filter (not . IS.null) + . map (\m -> IS.unions [IM.findWithDefault IS.empty s m | s <- IS.toList a]) + $ IM.elems bigmap Let me describe why I discouraged to use restrictKeys Time complexity of restrictKeys is O(n+m) while time complexity of repeated use of findWithDefault is O(min(n,W)m). In Hopcroft's algorithm, n is the number of all states and m is the number of states in each equivalence class. n is number of all states that is pretty big. m is the number of states in each equivalence class. That means, in the end of the algorithm, m might be 1 or small number if the input DFA was already almost optimized. Then the effectual time complexity would be O(n) for restrictKeys and O(Wm) for findWithDefault. So, restrictKeys might be slower then the repeated findWithDefault. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#176 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7N24MFHMU3MLZXU7JDS4S75LANCNFSM4V3N6BRA .
I'll check. If it doesn't, that can and will be fixed.
On Sat, Jan 30, 2021, 10:10 PM damhiya notifications@github.com wrote:
IM.restrictKeys x (IS.singleton y) have to work in O(min(n,W)) to beats competing implementation in every possible inputs. I'm not sure that's the case.
The documented time complexity bound is almost certainly not tight. I would bet dollars to doughnuts that restrictKeys is not worse for big O. … <#m-2272180050457183973> On Sat, Jan 30, 2021, 9:45 PM damhiya @.*> wrote: @.* commented on this pull request. ------------------------------ In src/DFAMin.hs <#176 (comment) https://github.com/simonmar/alex/pull/176#discussion_r567354641>: > + xs = filter (not . IS.null) + . map (\m -> IS.unions [IM.findWithDefault IS.empty s m | s <- IS.toList a]) + $ IM.elems bigmap Let me describe why I discouraged to use restrictKeys Time complexity of restrictKeys is O(n+m) while time complexity of repeated use of findWithDefault is O(min(n,W)m). In Hopcroft's algorithm, n is the number of all states and m is the number of states in each equivalence class. n is number of all states that is pretty big. m is the number of states in each equivalence class. That means, in the end of the algorithm, m might be 1 or small number if the input DFA was already almost optimized. Then the effectual time complexity would be O(n) for restrictKeys and O(W*m) for findWithDefault. So, restrictKeys might be slower then the repeated findWithDefault. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#176 (comment) https://github.com/simonmar/alex/pull/176#discussion_r567354641>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7N24MFHMU3MLZXU7JDS4S75LANCNFSM4V3N6BRA .
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/simonmar/alex/pull/176#issuecomment-770317861, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7LYPVTYVS3QAW63VNDS4TC4FANCNFSM4V3N6BRA .
I believe that documentation is wrong too. Great! The real bounds are unfortunately not very beautiful or simple, but in practice restrictKeys should be better.
On Sat, Jan 30, 2021, 10:40 PM John Ericson notifications@github.com wrote:
@Ericson2314 commented on this pull request.
In src/DFAMin.hs https://github.com/simonmar/alex/pull/176#discussion_r567359496:
- xs = filter (not . IS.null)
- . map (\m -> IS.unions [IM.findWithDefault IS.empty s m | s <- IS.toList a])
- $ IM.elems bigmap
I don't think that's right? Haddock says restrictKeys is O(m*log(n/m + 1)), m <= n, and findWithDefault is O(log n). So I that means O(log n) vs O(n log n) when n >> m. Intuitively this makes sense. With the repeated lookups, we have to traverse through the same intermediate nodes again and again. But with restrictKeys, we do everything in one pass.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/simonmar/alex/pull/176#discussion_r567359496, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7PXUTYXH6L2CF3O6C3S4TGJLANCNFSM4V3N6BRA .
Oh I replied before seeing the out-of-thread conversation, but yes I agree with @treeowl, see https://github.com/simonmar/alex/pull/176#discussion_r567359496. I think It boils down to repeated lookups intermediate nodes too many times, but restrict keys just doing once (well yes, returns to information from them using stack, but that has great locality).
Right, but the IntMap one is better than those bounds would suggest.
On Sat, Jan 30, 2021, 10:57 PM damhiya notifications@github.com wrote:
@damhiya commented on this pull request.
In src/DFAMin.hs https://github.com/simonmar/alex/pull/176#discussion_r567361072:
- xs = filter (not . IS.null)
- . map (\m -> IS.unions [IM.findWithDefault IS.empty s m | s <- IS.toList a])
- $ IM.elems bigmap
@Ericson2314 https://github.com/Ericson2314 You have to see IntMap.restrictKeys, not Map.restrictKeys. They have different big-O
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/simonmar/alex/pull/176#discussion_r567361072, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7JD7WOYJP6SOENC2A3S4TILPANCNFSM4V3N6BRA .
Yes, there's definitely a shortcut in the code when either map is a singleton, and in fact that applies recursively too.
On Sat, Jan 30, 2021, 10:10 PM damhiya notifications@github.com wrote:
IM.restrictKeys x (IS.singleton y) have to work in O(min(n,W)) to beats competing implementation in every possible inputs. I'm not sure that's the case.
The documented time complexity bound is almost certainly not tight. I would bet dollars to doughnuts that restrictKeys is not worse for big O. … <#m-2272180050457183973> On Sat, Jan 30, 2021, 9:45 PM damhiya @.*> wrote: @.* commented on this pull request. ------------------------------ In src/DFAMin.hs <#176 (comment) https://github.com/simonmar/alex/pull/176#discussion_r567354641>: > + xs = filter (not . IS.null) + . map (\m -> IS.unions [IM.findWithDefault IS.empty s m | s <- IS.toList a]) + $ IM.elems bigmap Let me describe why I discouraged to use restrictKeys Time complexity of restrictKeys is O(n+m) while time complexity of repeated use of findWithDefault is O(min(n,W)m). In Hopcroft's algorithm, n is the number of all states and m is the number of states in each equivalence class. n is number of all states that is pretty big. m is the number of states in each equivalence class. That means, in the end of the algorithm, m might be 1 or small number if the input DFA was already almost optimized. Then the effectual time complexity would be O(n) for restrictKeys and O(W*m) for findWithDefault. So, restrictKeys might be slower then the repeated findWithDefault. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#176 (comment) https://github.com/simonmar/alex/pull/176#discussion_r567354641>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7N24MFHMU3MLZXU7JDS4S75LANCNFSM4V3N6BRA .
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/simonmar/alex/pull/176#issuecomment-770317861, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7LYPVTYVS3QAW63VNDS4TC4FANCNFSM4V3N6BRA .
Oh I see. I was looking at the regular map bounds, oops.
Hmm, It seems travis ci does not work
Please be aware travis-ci.org will be shutting down in several weeks, with all accounts migrating to travis-ci.com. Please stay tuned here for more information.
Hmm, It seems travis ci does not work
For now it works, but is very slow. I would like to switch to something else but don't have the perms.
OK, looks like that's everything. Thanks again!
Hi This pr introduces some optimizations
IntSet
instead of[SNum]
while constructingbigmap
.IM.restrictKeys
for calculatingxs
.xs
. It seems most performance improvements came from this.R
andQ
instead ofP
andQ
. following pseudo code describes what it means.-- observation : Q is always subset of P -- let R = P \ Q. then following algorithm is the equivalent of the Hopcroft's Algorithm R := {{all nonaccepting states}}; Q := {{all accepting states}}; while (Q is not empty) do choose a set A from Q remove A from Q and add it to R for each c in ∑ do let X be the set of states for which a transition on c leads to a state in A for each set Y in R for which X ∩ Y is nonempty and Y \ X is nonempty do replace Y in R by the greater of the two sets X ∩ Y and X \ Y add the smaller of the two sets to Q end; for each set Y in Q for which X ∩ Y is nonempty and Y \ X is nonempty do replace Y in Q by the two sets X ∩ Y and Y \ X end; end; end;