juxtin / markov-idris

A simple sentence generator based on Markov chains.
GNU General Public License v3.0

segfault when processing larger text files #1

Open juxtin opened 9 years ago

juxtin commented 9 years ago

When processing text files longer than about 4,500 lines, markov-idris may segfault. The failure is deterministic: as far as I can tell, a given input file either always segfaults or never does.

The two probable culprits here are the Markov.Analyze.cleanup function (which removes extraneous punctuation from the source text) and Markov.Analyze.buildMarkovMap, which constructs the Markov map in memory. Reading the file as a string doesn't seem to be a problem, even for files much larger than 4,500 lines.
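For readers unfamiliar with the two stages, here is a rough Python sketch of what they do conceptually. It is not a translation of the Idris source. The names `cleanup` and `build_markov_map` only mirror the functions mentioned above, and treating every ASCII punctuation character as "extraneous" and using single-word keys are both assumptions made for illustration.

```python
import string
from collections import defaultdict


def cleanup(text):
    # Assumption: strip all ASCII punctuation. The real cleanup function
    # may keep some characters (for example sentence terminators).
    return text.translate(str.maketrans("", "", string.punctuation))


def build_markov_map(text):
    # Map each word to the list of words observed immediately after it.
    # Assumption: first-order chain with whitespace-separated words.
    words = text.split()
    markov_map = defaultdict(list)
    for current, following in zip(words, words[1:]):
        markov_map[current].append(following)
    return dict(markov_map)


if __name__ == "__main__":
    print(build_markov_map(cleanup("the cat, the dog.")))
```

Both stages are plain loops in this sketch, so memory use grows with the input but call depth does not. Whether the same is true of the Idris implementation is not established here.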

juxtin commented 9 years ago

This gist contains a section of Pride and Prejudice that causes a segfault on my machine.

When I delete the last line, it suddenly works.
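Since one line makes the difference, the exact threshold for any input can be found by bisecting on the number of leading lines. Below is a minimal Python sketch of that idea. It assumes the failure is monotonic (if the first n lines crash, so does any longer prefix), which matches what I see but is not proven. The helper names are made up, and the command passed to `segfaults` is a placeholder, since the actual way to invoke markov-idris is not shown in this thread.

```python
import os
import signal
import subprocess
import tempfile


def segfaults(cmd, text):
    """Run cmd with a temp file containing text as its last argument.

    Returns True if the process was killed by SIGSEGV (POSIX only).
    cmd is a hypothetical invocation, e.g. ["./markov"].
    """
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(text)
        path = f.name
    try:
        result = subprocess.run(
            [*cmd, path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    finally:
        os.unlink(path)
    # On POSIX, a negative return code -N means "terminated by signal N".
    return result.returncode == -signal.SIGSEGV


def smallest_failing_prefix(lines, crashes):
    """Return the smallest n for which crashes(lines[:n]) is True.

    Returns None if the full input does not crash. Assumes the empty
    prefix passes and that crashing is monotonic in prefix length.
    """
    if not crashes(lines):
        return None
    lo, hi = 0, len(lines)  # invariant: lines[:lo] passes, lines[:hi] crashes
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(lines[:mid]):
            hi = mid
        else:
            lo = mid
    return hi
```

Usage would look something like `smallest_failing_prefix(lines, lambda ls: segfaults(["./markov"], "".join(ls)))`, with `lines` read via `readlines()` and `./markov` replaced by the real binary and arguments.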