Open: dowobeha opened 3 years ago
Do you have enough memory on your machine? Sometimes hundreds of GBs are necessary for compilation of large networks.
You could try to implement the priority union from first principles instead of using the built-in operation. Then minimization gets done in every step and is (sometimes) faster with smaller intermediate results, like so:
regex Grammar | [Guesser .o. ~Grammar.l] ;
Sometimes it's also effective to add some seemingly redundant factorizations to the calculation, since they can yield smaller intermediate transducers, especially in the composition step. I've had some success with this one when other priority unions took too long or ate up memory:
regex Grammar | [Guesser .o. ~[Grammar.l & Guesser.l]] ;
I would probably try this second one first.
Both should produce the same result as:
regex Grammar .p. Guesser ;
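To see why the three expressions should agree, here is a minimal sketch in Python that models a transducer as a finite set of (upper, lower) string pairs. This is only a set-level illustration of the semantics, not how foma computes anything, and it says nothing about memory use or speed. The function names and the toy data are made up for the example.

```python
# Toy model: a "transducer" is a finite set of (upper, lower) string pairs.
# All names and data below are hypothetical, chosen only for illustration.

def lower_side(relation):
    """Lower-side language of a relation (models the `.l` operator)."""
    return {lower for _, lower in relation}

def priority_union_lower(grammar, guesser):
    """Models `Grammar | [Guesser .o. ~Grammar.l]`."""
    covered = lower_side(grammar)
    return grammar | {(u, l) for (u, l) in guesser if l not in covered}

def priority_union_factored(grammar, guesser):
    """Models `Grammar | [Guesser .o. ~[Grammar.l & Guesser.l]]`."""
    overlap = lower_side(grammar) & lower_side(guesser)
    return grammar | {(u, l) for (u, l) in guesser if l not in overlap}

# Toy data: the grammar analyzes "cats" and "dog"; the guesser also has an
# analysis for "cats", which the priority union should discard.
grammar = {("cat+N+Pl", "cats"), ("dog+N+Sg", "dog")}
guesser = {("cats+Guess", "cats"), ("blarg+Guess", "blarg")}

result = priority_union_lower(grammar, guesser)

# Both formulations keep the same pairs: filtering Guesser by
# "lower string not in Grammar.l" and by "lower string not in
# Grammar.l & Guesser.l" remove exactly the same Guesser pairs.
assert result == priority_union_factored(grammar, guesser)
print(sorted(result))
```

The extra intersection with Guesser.l does not change the result, because a pair taken from Guesser always has its lower string in Guesser.l. It only changes the size of the language being complemented.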
Thanks for the tips. I believe the machine I tried on has 128 GB of RAM.
If the earlier tip fails, sometimes the best way to solve this kind of memory issue is to break the priority union down piecewise by word length. Here's a macro I've used that does the calculation separately for words of length <5, 5, 6, 7, 8, 9, 10, and >10 and unions the results together. Of course, if your language has a different distribution of word lengths, you might want to adapt the below to suit it better.
def PiecewisePriority(Gram, Guess) [[Gram .o. ?^<5] | [Guess .o. ?^<5 .o. ~[?^<5 .o. Guess.l .o. Gram.l]]] |
[[Gram .o. ?^5] | [Guess .o. ?^5 .o. ~[?^5 .o. Guess.l .o. Gram.l]]] |
[[Gram .o. ?^6] | [Guess .o. ?^6 .o. ~[?^6 .o. Guess.l .o. Gram.l]]] |
[[Gram .o. ?^7] | [Guess .o. ?^7 .o. ~[?^7 .o. Guess.l .o. Gram.l]]] |
[[Gram .o. ?^8] | [Guess .o. ?^8 .o. ~[?^8 .o. Guess.l .o. Gram.l]]] |
[[Gram .o. ?^9] | [Guess .o. ?^9 .o. ~[?^9 .o. Guess.l .o. Gram.l]]] |
[[Gram .o. ?^10] | [Guess .o. ?^10 .o. ~[?^10 .o. Guess.l .o. Gram.l]]] |
[[Gram .o. ?^>10] | [Guess .o. ?^>10 .o. ~[?^>10 .o. Guess.l .o. Gram.l]]] ;
With that defined, you can then just issue:
regex PiecewisePriority(FullLexicalToSurfaceGrammar, GuessToSurfaceGrammar);
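As a sanity check on the idea, here is a rough Python sketch of the same piecewise split, again using finite sets of (upper, lower) pairs as a stand-in for transducers. It assumes length is counted in characters of the lower string, whereas foma counts symbols (a multicharacter symbol counts as one). The function names and the toy data are invented for the example.

```python
# Toy model: a "transducer" is a finite set of (upper, lower) string pairs.
# All names and data below are hypothetical, chosen only for illustration.

def lower_side(relation):
    """Lower-side language of a relation (models the `.l` operator)."""
    return {lower for _, lower in relation}

def restrict_lower(relation, keep_length):
    """Models `Rel .o. ?^N`: keep pairs whose lower string has a kept length."""
    return {(u, l) for (u, l) in relation if keep_length(len(l))}

def priority_union_lower(gram, guess):
    """Direct lower-side priority union, for comparison."""
    covered = lower_side(gram)
    return gram | {(u, l) for (u, l) in guess if l not in covered}

def piecewise_priority(gram, guess, buckets):
    """Compute the priority union one length bucket at a time, then union."""
    result = set()
    for keep_length in buckets:
        gram_part = restrict_lower(gram, keep_length)
        guess_part = restrict_lower(guess, keep_length)
        # Models `?^N .o. Guess.l .o. Gram.l` for this bucket.
        overlap = lower_side(gram_part) & lower_side(guess_part)
        result |= gram_part
        result |= {(u, l) for (u, l) in guess_part if l not in overlap}
    return result

# Length buckets matching the macro: <5, exactly 5 through 10, and >10.
buckets = (
    [lambda n: n < 5]
    + [lambda n, k=k: n == k for k in range(5, 11)]
    + [lambda n: n > 10]
)

gram = {("cat+N+Pl", "cats"), ("hippopotamus+N+Sg", "hippopotamus")}
guess = {
    ("cats+Guess", "cats"),
    ("blarg+Guess", "blarg"),
    ("hippopotamus+Guess", "hippopotamus"),
    ("dog+Guess", "dog"),
}

piecewise = piecewise_priority(gram, guess, buckets)
assert piecewise == priority_union_lower(gram, guess)
print(sorted(piecewise))
```

The split is safe because the buckets partition the lower-side strings by length, so each pair lands in exactly one bucket and a Grammar entry can only block Guesser entries from the same bucket.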
The following causes a segfault. Note that in this case FullLexicalToSurfaceGrammar is rather large, so I suspect that a memory issue may be implicated.