relatio-nlp / relatio

code base for constructing narrative statements from text
MIT License

Dimension reduction of embeddings before clustering #81

Closed. shimkoji closed this 2 years ago

shimkoji commented 2 years ago

This PR adds the code for issue #75.

It adds an option (weight_by_frequency) that controls whether duplicate entities are kept for dimension reduction and clustering.

If weight_by_frequency is true, dimension reduction and clustering are run on all entities, duplicates included, so that clustering is weighted by each entity's frequency. If weight_by_frequency is false, both steps are run on the unique entities only.
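To make the two modes concrete, here is a minimal sketch using scikit-learn. It is not the code in this PR: the function name `reduce_and_cluster`, its parameters, and the `vectors` lookup are hypothetical and chosen only for illustration.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def reduce_and_cluster(entities, vectors, n_components=2, n_clusters=2,
                       weight_by_frequency=True, seed=0):
    """Hypothetical helper, not relatio's API.

    entities: list of entity strings, duplicates allowed.
    vectors:  dict mapping each entity to a 1-D embedding (assumed given).
    Returns a dict mapping each unique entity to a cluster label.
    """
    counts = Counter(entities)
    if weight_by_frequency:
        # Keep every occurrence, so frequent entities pull PCA axes
        # and cluster centroids toward themselves.
        rows = list(entities)
    else:
        # One row per distinct entity: every entity counts once.
        rows = sorted(counts)

    X = np.vstack([vectors[e] for e in rows])
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(reduced)

    # Duplicates share an embedding, hence a label, so a dict is enough.
    return {e: int(label) for e, label in zip(rows, km.labels_)}


# Toy data: "a" and "b" lie close together, "c" is far away.
toy_vectors = {
    "a": np.zeros(5),
    "b": np.array([0.1, 0.0, 0.0, 0.0, 0.0]),
    "c": np.full(5, 5.0),
}
toy_entities = ["a", "a", "a", "b", "c", "c"]
labels_weighted = reduce_and_cluster(toy_entities, toy_vectors, weight_by_frequency=True)
labels_unique = reduce_and_cluster(toy_entities, toy_vectors, weight_by_frequency=False)
```

For the clustering step alone, duplicating rows should give the same objective as fitting on unique rows with integer frequencies passed as `sample_weight` to `KMeans.fit`, which avoids materialising the duplicates. Whether the PR does it that way is not stated here.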

I also added two further PCA options: incremental PCA and weighted PCA.
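As an illustration of what a frequency-weighted PCA can look like, here is a NumPy-only sketch. It assumes "weighted" means weighting each unique embedding by its count, which is one common reading and may differ from the variant used in this PR; the function `weighted_pca` is hypothetical.

```python
import numpy as np


def weighted_pca(X, weights, n_components):
    """PCA on rows of X, each row weighted by a non-negative weight.

    Hypothetical helper for illustration only.
    Returns (projected rows, components, explained variances).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalise weights to sum to 1

    mean = w @ X                         # weighted mean of the rows
    Xc = X - mean                        # centre on the weighted mean
    cov = (Xc * w[:, None]).T @ Xc       # weighted covariance matrix

    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order].T                # rows are principal axes
    return Xc @ components.T, components, eigvals[order]


# With integer counts as weights, the result should match PCA on the
# duplicated rows (up to the sign of each component).
rng = np.random.default_rng(0)
unique_rows = rng.normal(size=(4, 3))
counts = np.array([3, 1, 2, 1])
duplicated_rows = np.repeat(unique_rows, counts, axis=0)

_, comps_w, var_w = weighted_pca(unique_rows, counts, n_components=2)
_, comps_d, var_d = weighted_pca(duplicated_rows, np.ones(len(duplicated_rows)), n_components=2)
```

The point of the weighted form is that it works on the unique rows only, so the cost does not grow with the number of duplicates. For the incremental variant, scikit-learn provides `sklearn.decomposition.IncrementalPCA`, which fits in mini-batches; whether this PR relies on it is not stated here.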