mikhaildubov / AST-text-analysis

Statistical Natural Language Processing with Annotated Suffix Trees
http://cs.hse.ru/vitext/
MIT License

How to store/get an id for a string? #2

Open pombredanne opened 8 years ago

pombredanne commented 8 years ago

@mikhaildubov I am just starting to play with this code. How would I be able to store some ID for a string in the AST?

mikhaildubov commented 8 years ago

@pombredanne Sorry, I missed your comment. Could you please elaborate a bit? Why would you need to store IDs for your strings? The only operation this implementation supports so far is computing the relevance score of a keyphrase against the AST built for a set of strings. That score is computed with respect to the whole set of strings (texts); you cannot compute the relevance of a keyphrase to one particular string in the set unless you build a separate AST for that string.

Please note that I am going to rework this code a bit in the near future and will probably add some new functionality as well. So you are welcome to make feature requests if you have any!

pombredanne commented 8 years ago

@mikhaildubov I was mostly interested in your generalized suffix tree construction, to evaluate it for multiple-pattern search, and not so much in the scoring for now. The ID of a string can be seen as the unique terminator added to each string: https://github.com/mikhaildubov/AST-text-analysis/blob/2b8eff7e430f32fd87408401012eb315b767a2ba/east/asts/utils.py#L25
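
To make the idea concrete, here is a minimal sketch of what storing a string ID could look like. It is not this repository's code: it uses a naive, uncompressed suffix trie (quadratic in the string length) instead of a real suffix tree, and it keeps a set of integer IDs on each node rather than using unique terminator characters. The names `Node`, `build_generalized_suffix_trie` and `find_string_ids` are hypothetical and only serve the illustration.

```python
class Node:
    """One trie node: outgoing edges plus the IDs of the strings passing through it."""

    def __init__(self):
        self.children = {}  # char -> Node
        self.ids = set()    # IDs of every string that has a substring ending here


def build_generalized_suffix_trie(strings):
    """Insert every suffix of every string, tagging visited nodes with the string's ID.

    Assumption: the ID of a string is simply its index in `strings`.
    """
    root = Node()
    for string_id, s in enumerate(strings):
        root.ids.add(string_id)  # the empty pattern matches every string
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, Node())
                node.ids.add(string_id)
    return root


def find_string_ids(root, pattern):
    """Return the IDs of all strings that contain `pattern` as a substring."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return set()
    return set(node.ids)


strings = ["banana", "bandana", "cabana"]
trie = build_generalized_suffix_trie(strings)
for pattern in ["ana", "band", "nan", "xyz"]:
    print(pattern, sorted(find_string_ids(trie, pattern)))
```

Multiple-pattern search is then one walk from the root per pattern, and the ID set at the node where the walk ends says which of the input strings contain that pattern. Keeping the IDs on every node trades memory for lookup speed; with unique terminators the same information would instead have to be collected from the leaves below the matched node.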