rcap107 opened 6 months ago
Imports in EmbDI are a mess, mostly because of the data preprocessing package. I should rewrite it so that they aren’t an issue anymore.
The problematic imports are `similarity` and `datasketch`.
`datasketch` is available on conda, but not from the main repository. `similarity`, instead, is a random pip package that provides the Levenshtein distance function. `datasketch` is used to work with a MinHash encoder, which is also implemented by [dirtycat](https://dirty-cat.github.io/stable/generated/dirty_cat.MinHashEncoder.html#dirty_cat.MinHashEncoder), so maybe it should be reimplemented that way.
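As a rough illustration of how small these two dependencies are, here is a minimal stdlib-only sketch. It assumes the preprocessing code only needs a plain Levenshtein distance and a character-n-gram MinHash for estimating string similarity; the issue does not say which functions are actually called. All names below (`levenshtein_distance`, `char_ngrams`, `minhash_signature`, `estimated_jaccard`) are hypothetical and are not the API of EmbDI, `similarity`, `datasketch`, or dirty_cat.

```python
"""Hypothetical stdlib-only stand-ins for the Levenshtein and MinHash helpers.

Not EmbDI's, similarity's, datasketch's or dirty_cat's API; illustrative only.
"""
import hashlib


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning `a` into `b` (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution, free if equal
            ))
        previous = current
    return previous[-1]


def char_ngrams(text: str, n: int = 3) -> set:
    """Overlapping character n-grams; the whole string if shorter than n."""
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def _seeded_hash(token: str, seed: int) -> int:
    """64-bit hash of `token`, varied by `seed` through blake2b's salt.
    Unlike built-in hash(), this is stable across Python processes."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens: set, num_perm: int = 128) -> list:
    """For each of `num_perm` seeded hash functions, keep the minimum
    hash value over all tokens (tokens must be non-empty)."""
    return [min(_seeded_hash(t, seed) for t in tokens) for seed in range(num_perm)]


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of agreeing positions estimates the Jaccard similarity
    of the two token sets the signatures were built from."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

For example, `levenshtein_distance("kitten", "sitting")` returns 3, and `estimated_jaccard(minhash_signature(char_ngrams("embedding")), minhash_signature(char_ngrams("embeddings")))` should land close to the exact trigram Jaccard similarity of 7/8. Seeded `blake2b` is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would make signatures irreproducible between runs; this is a swapped-in choice, not how `datasketch` generates its permutations.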
Missing packages with sources: