Plans to add BLEURT metric?

stanfordnlp / string2string

String-to-String Algorithms for Natural Language Processing

MIT License


Plans to add BLEURT metric? #2

Closed. ogencoglu closed this issue 1 year ago

ogencoglu commented 1 year ago

Would be a great addition.

Thibault Sellam, Dipanjan Das, and Ankur P. Parikh. 2020. BLEURT: Learning Robust Metrics for Text Generation.

ogencoglu commented 1 year ago

MoverScore too

Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, and Steffen Eger. 2019. MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance.

suzgunmirac commented 1 year ago

Hi @ogencoglu, thank you very much for your question.

Yes, we are actively working on incorporating wrappers for BLEURT, MoverScore, METEOR, and various other automatic string-based evaluation metrics in the near future. Please stay tuned for further updates!

PS. We also warmly welcome our community members to contribute and expand the library based on their expertise and interests; so, please feel free to make your own valuable additions.

ogencoglu commented 1 year ago

Thank you for the swift reply, @suzgunmirac! Appreciated!

ogencoglu commented 1 year ago

Continuing the metrics discussion.

I wonder whether descriptive metrics would enhance this library, for instance something like the TextDescriptives library. These metrics are not necessarily string2string: they are things like readability, coherence, and perplexity, calculated from a single doc. But I guess a readability vector could be computed per doc and then compared across two docs as a string2string metric.

An example use case would be semantic search with faiss or any approximate nearest neighbor method: it returns similar docs, and the returned docs can then be re-ranked by ease of readability or by how similar their readability is to the query doc's.
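To make the re-ranking idea concrete, here is a rough sketch in plain Python. It assumes a deliberately crude two-feature "readability vector" (average words per sentence and average characters per word). The function names `readability_vector`, `readability_distance`, and `rerank_by_readability` are hypothetical and belong to neither string2string nor TextDescriptives. A real setup would likely replace the features with proper readability scores.

```python
import math
import re


def readability_vector(text):
    """Crude readability proxy: (avg words per sentence, avg chars per word).

    Hypothetical helper for illustration only; not a standard readability score.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return (0.0, 0.0)
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return (avg_sentence_len, avg_word_len)


def readability_distance(a, b):
    """Euclidean distance between the readability vectors of two strings."""
    return math.dist(readability_vector(a), readability_vector(b))


def rerank_by_readability(query, candidates):
    """Sort candidates so those closest to the query's readability come first."""
    return sorted(candidates, key=lambda doc: readability_distance(query, doc))


# Candidates would normally come from a nearest-neighbor search; here they are hard-coded.
query = "The cat sat. The dog ran."
candidates = [
    "Notwithstanding considerable methodological heterogeneity, longitudinal "
    "investigations consistently demonstrate substantial improvements.",
    "A bird sang. The sun rose.",
]
reranked = rerank_by_readability(query, candidates)
```

In this toy example the short, simple candidate is ranked first because its sentence and word lengths are closest to the query's. The two features are on different scales, so a real version would probably normalize them before taking the distance.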

This is not a suggestion but just an idea that I wanted to document here, in case you find it relevant.

suzgunmirac commented 1 year ago

Thank you very much for suggesting the use of descriptive metrics in our library, as well as sharing a reference to the TextDescriptives library! TextDescriptives seems like a wonderful resource! We are indeed planning to incorporate additional metrics and string measures such as perplexity into our metrics module in the near future.