I created a 2.7 GB corpus file for Turkish, but it seems text2ngram can't handle such a big file. Could the program be optimized to handle large files?
On my system [1], the second iteration never finishes:
for i in 1 2 3; do text2ngram -n $i -l -f sqlite -o database_aa.db mytext.filtered; done
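For now I am considering a workaround: splitting the corpus into smaller chunks with coreutils split and running text2ngram on each chunk into the same database. This is only an untested sketch; it assumes text2ngram merges counts when the output database already exists, which I have not verified:

# Split the corpus into ~100 MB pieces named chunk_aa, chunk_ab, ...
split -b 100m mytext.filtered chunk_
# Feed each chunk through the same loop as above, reusing the output db
# (assumption: repeated runs accumulate counts rather than overwrite them)
for f in chunk_*; do
    for i in 1 2 3; do
        text2ngram -n $i -l -f sqlite -o database_aa.db "$f"
    done
done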
By the way, thanks for the open source alternative to XT9 and the good documentation on how to use it :) I have already started testing it with a small corpus [2].
[1] 5950HQ + 16 GB RAM
[2] https://pbs.twimg.com/media/DY_ftChXUAAQP3t.jpg:large