ndvbd closed this issue 5 years ago
That should never happen. I tested the library with ngrams that are not indexed by the data structure, e.g., when computing the perplexity score of a text. Can you run more tests? Are you reading the queries from a file? You can email me at the address in my profile if you prefer. In any case, I will run some sanity checks and try to reproduce that behaviour myself.
Also, try recompiling the code with -DCMAKE_BUILD_TYPE=Release and running the tests again. You might see this exception being raised: https://github.com/jermp/tongrams/blob/master/vectors/sorted_array.hpp#L111
@jermp is it possible to return -1 when an ngram is not found in the data structure?
Yes, I will implement this. So far, the trie that stores the count assumes ngrams are always found.
Done. See also test_data/queries.not_found for some examples of strings that must not be found after indexing test_data.
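To illustrate the "return -1 when not found" behaviour discussed above, here is a minimal sketch of a lookup over a sorted array that reports a miss with a sentinel instead of assuming the key is present. This is not the tongrams implementation: `SortedLevel`, `kNotFound` and `lookup` are hypothetical names chosen for the example, and a plain `std::vector` stands in for the library's compressed sequences.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for "ngram not found"; not a tongrams identifier.
constexpr int64_t kNotFound = -1;

// Hypothetical stand-in for one level of a count trie:
// sorted ids with a parallel array of counts.
struct SortedLevel {
    std::vector<uint64_t> ids;     // sorted in ascending order
    std::vector<uint64_t> counts;  // counts[i] belongs to ids[i]

    // Returns the count stored for `id`, or kNotFound if `id` is absent.
    int64_t lookup(uint64_t id) const {
        // Preconditions, checked only when assertions are enabled.
        assert(ids.size() == counts.size());
        assert(std::is_sorted(ids.begin(), ids.end()));

        // Binary search for the first element not less than `id`.
        auto it = std::lower_bound(ids.begin(), ids.end(), id);

        // A miss is either "past the end" or "landed on a different id".
        if (it == ids.end() || *it != id) return kNotFound;

        return static_cast<int64_t>(counts[it - ids.begin()]);
    }
};
```

A caller would then test the result against `kNotFound` before using it as a count. The point of the sketch is that the miss is checked explicitly after the search, rather than dereferencing whatever position the search happens to return.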
I am trying to use Tongrams and eventually to write a Python wrapper for it. For now, I created an Eclipse CDT project from the CMake files using:
cmake -G "Eclipse CDT4 - Unix Makefiles" ./
I created the data structure (pef_trie) from the test set. Now, when I try to look up an ngram that is not in the data structure, I get a segmentation fault: