oborchers / Fast_Sentence_Embeddings

Compute Sentence Embeddings Fast!
GNU General Public License v3.0

Add Features to Sentencevectors #57

Open oborchers opened 2 years ago

oborchers commented 2 years ago

- [ ] Sentencevectors:

  Global:
  - [ ] Remove normalized vector files and replace with NN

  ANN: --> (Annoy, with option for Google ScaNN?)
  - [ ] Only construct index when calling most_similar method
  - [ ] Logging of index speed
  - [ ] Save and load of index
  - [ ] Assert that index and vectors are of equal size
  - [ ] Parameters must be tunable afterwards
  - [ ] Method to reconstruct index
  - [ ] How does the index saving comply with SaveLoad?
  - [ ] Write unittests?

  Brute:
  - [ ] Keep access to default method
  - [ ] Make ANN search the default?! --> Results?
  - [ ] Throw warning for large datasets for vector norm init
  - [ ] Maybe throw warning if the embedding + normalization exceeds RAM size

  Other:
  - [ ] L2 Distance
  - [ ] L1 Distance
  - [ ] Correlation (Power Score Correlation?)
  - [ ] Lookup functionality (via defaultdict)
  - [ ] Get vector: Not really memory friendly
  - [ ] Show which words are in vocabulary
  - [ ] Assess empty vectors (via EPS sum)
  - [ ] Z-Score Transformation from Power-Means Embedding? --> Benefit?
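To make a few of the items above concrete, here is a minimal sketch of a lazily built index with a size assertion, build-time logging, a rebuild method with a tunable metric, L1/L2 distances, and an EPS-sum check for empty vectors. Everything in it is an assumption for illustration: `LazyBruteIndex`, `EPS`, `empty_rows`, and `rebuild` are hypothetical names and not part of fse, and the search is plain brute force over Python lists so the sketch runs without dependencies. A real implementation would likely use NumPy and delegate the search to an ANN library such as Annoy.

```python
import logging
import math
import time

logger = logging.getLogger(__name__)

EPS = 1e-8  # hypothetical threshold: a vector whose absolute sum is <= EPS counts as empty
METRICS = ("cosine", "l1", "l2")


class LazyBruteIndex:
    """Hypothetical stand-in for an ANN index; not part of the fse API."""

    def __init__(self, vectors, metric="cosine"):
        self.vectors = [[float(x) for x in v] for v in vectors]
        self.metric = metric
        self._index = None  # nothing is built until the first query

    def empty_rows(self):
        """Return the positions of vectors whose absolute sum is <= EPS."""
        return [i for i, v in enumerate(self.vectors)
                if sum(abs(x) for x in v) <= EPS]

    def rebuild(self, metric=None):
        """(Re)construct the index, optionally switching the metric afterwards."""
        if metric is not None:
            self.metric = metric
        if self.metric not in METRICS:
            raise ValueError(f"unknown metric: {self.metric!r}")
        start = time.perf_counter()
        if self.metric == "cosine":
            # Normalize once at build time; empty vectors are left untouched.
            index = []
            for v in self.vectors:
                norm = math.sqrt(sum(x * x for x in v))
                index.append([x / norm for x in v] if norm > EPS else list(v))
        else:
            index = [list(v) for v in self.vectors]
        self._index = index
        # Index and vectors must be of equal size.
        assert len(self._index) == len(self.vectors)
        logger.info("built %s index over %d vectors in %.6f s",
                    self.metric, len(index), time.perf_counter() - start)

    def _distance(self, a, b):
        if self.metric == "l1":
            return sum(abs(x - y) for x, y in zip(a, b))
        if self.metric == "l2":
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        # Cosine distance; both inputs are already normalized.
        return 1.0 - sum(x * y for x, y in zip(a, b))

    def most_similar(self, query, topn=3):
        """Return up to topn (position, distance) pairs, closest first."""
        if self._index is None:
            self.rebuild()  # only construct the index on the first query
        q = [float(x) for x in query]
        if self.metric == "cosine":
            norm = math.sqrt(sum(x * x for x in q))
            if norm > EPS:
                q = [x / norm for x in q]
        scored = [(i, self._distance(q, v)) for i, v in enumerate(self._index)]
        scored.sort(key=lambda pair: pair[1])
        return scored[:topn]
```

Deferring the build to `most_similar` keeps construction cheap for users who never query, and `rebuild(metric=...)` covers both the "tunable afterwards" and "reconstruct index" items in one place. Saving and loading the index is left out here because it depends on how the index file would interact with SaveLoad, which the list above still treats as an open question.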