segfault when processing larger text files

When processing text files longer than about 4,500 lines, markov-idris may segfault. The failure appears to be deterministic: as far as I can tell, a given input file either always segfaults or never does.
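Because the failure looks deterministic, the exact line count at which it starts could be pinned down by bisecting on prefixes of a failing file. The following is only a rough sketch, not something from the repository: it assumes that if the first n lines crash the program, any longer prefix does too, and that the program takes the input path as its last argument. The function names and the command are hypothetical.

```python
import os
import subprocess
import tempfile
from typing import Callable, Sequence


def smallest_failing_prefix(total_lines: int, fails: Callable[[int], bool]) -> int:
    """Return the smallest n in 1..total_lines for which fails(n) is True.

    Assumes fails(total_lines) is True and that failure is monotonic:
    if the first n lines fail, every longer prefix fails as well.
    """
    lo, hi = 1, total_lines
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid        # mid fails, so the threshold is at or below mid
        else:
            lo = mid + 1    # mid succeeds, so the threshold is above mid
    return lo


def crashes_on_prefix(command: Sequence[str], lines: Sequence[str], n: int) -> bool:
    """Run the command on a temp file holding the first n lines.

    Returns True if the process was killed by a signal (on POSIX a
    negative return code, e.g. -11 for SIGSEGV).
    """
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.writelines(lines[:n])
        path = f.name
    try:
        result = subprocess.run([*command, path], capture_output=True)
        return result.returncode < 0
    finally:
        os.unlink(path)
```

With the lines of a failing file loaded via readlines(), a call such as smallest_failing_prefix(len(lines), lambda n: crashes_on_prefix(["./markov"], lines, n)) would report the shortest crashing prefix, where ./markov stands in for however the binary is actually invoked.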

The two probable culprits are Markov.Analyze.cleanup, which removes extraneous punctuation from the source text, and Markov.Analyze.buildMarkovMap, which constructs the Markov map in memory. Reading the file as a string doesn't seem to be the problem, even for files much longer than 4,500 lines.

