jamesturk / jellyfish

🪼 a python library for doing approximate and phonetic matching of strings.
https://jamesturk.github.io/jellyfish/
MIT License
2.04k stars 157 forks source link

metric_traversal generator #118

Closed dmevdok closed 5 years ago

dmevdok commented 5 years ago

algorithm metric_traversal first sorts sample of strings by the given key then yields the first string then yields the nearest string to the previous yielded via given metric, while the sample is not empty

motivation when doing text classification, clustering and so on, it's useful to prepare the data, trying to embed it to the least dimensional space. metric_traversal can be treated as dimension reduction algorithm, as it "projects" texts on the 1D space (with index as a coordinate), trying to put similar texts near to each other.

jamesturk commented 5 years ago

thanks but I don't think this will be appropriate for inclusion in Jellyfish