machinelearningZH / simply-simplify-language

Use machine learning to make your institutional communication more understandable and inclusive.
MIT License

Simply simplify language

Use LLMs to simplify your institutional communication. Get rid of «Behördendeutsch».


Contents

- [Usage](#usage)
- [Project information](#project-information)
- [What does the app do?](#what-does-the-app-do)
- [What does it cost?](#what-does-it-cost)
- [Our language guidelines](#our-language-guidelines)
- [A couple of findings](#a-couple-of-findings)
- [How does the understandability score work?](#how-does-the-understandability-score-work)
- [What does the score mean?](#what-does-the-score-mean)
- [Outlook](#outlook)
- [Project team](#project-team)
- [Contributing](#feedback-and-contributing)
- [Miscellaneous](#miscellaneous)

Usage

Run the app locally

Run the app in the cloud

Run the app in a GitHub Codespace

[!Note] The app logs user interactions to a file named app.log on your local computer or virtual machine. If you do not want analytics, simply comment out the logging function call in the code.
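As a rough illustration of what this kind of file logging can look like, here is a minimal sketch using Python's standard `logging` module. The helper names `get_app_logger` and `log_event` and the logged fields are hypothetical and not taken from the app's code; only the file name app.log comes from the note above.

```python
import logging


def get_app_logger(path: str = "app.log", name: str = "simplify_app") -> logging.Logger:
    """Return a logger that appends to `path` (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s\t%(message)s"))
        logger.addHandler(handler)
    logger.propagate = False  # keep entries out of the root logger
    return logger


def log_event(logger: logging.Logger, action: str, n_words: int) -> None:
    """Write one tab-separated analytics line (illustrative fields only)."""
    logger.info("%s\t%d", action, n_words)
```

With a structure like this, turning analytics off means commenting out the single `log_event(...)` call at the place where the app handles a user action.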

Project information

Institutional communication is often overly complicated and hard to understand. This particularly affects citizens who do not speak German as their first language or who struggle with complex texts for other reasons. Clear and simple communication is essential to ensure everyone can participate in public processes and access services equally.

For many years, the cantonal administration of Zurich has gone to great lengths to make communication more inclusive and accessible. With the increasing volume of content, we wanted to explore the potential of AI to assist in this effort. In autumn 2023, we launched a pilot project. This app is one of the results. The code in this repository represents a snapshot of our ongoing efforts.

We developed this app following our communication guidelines. However, we believe it can be easily adapted for use by other public institutions.

What does the app do?

In English, «Einfache Sprache» is roughly equivalent to «Plain English», while «Leichte Sprache» has similarities to «Easy English».

[!Important] At the risk of stating the obvious: By using the app you send data to a third-party provider (OpenAI, Anthropic, or Mistral AI in the current state of the app). Therefore, only use strictly non-sensitive data. Again, stating the obvious: LLMs make errors. They regularly hallucinate, make things up, and get things wrong. They often do so in subtle, non-obvious ways that may be hard to detect. This app is meant to be used as an assistive system. It only yields a draft that you must always double- and triple-check.

At the time of writing, many users in our administration have used the app extensively with hundreds of texts over several months. The results are very promising. With the prototype app, our experts have saved time, improved their output, and made public communication more inclusive.

[!Note] This app is optimized for Swiss German («Swiss High German», not dialect). Some rules in the prompts steer the models toward this. The app is also set up to use the Swiss ss rather than the German ß. The understandability index assumes the Swiss ss for the common word scoring, and we replace ß with ss in the results.
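The ß-to-ss replacement mentioned in the note can be sketched in a few lines of Python. The function name `to_swiss_spelling` is hypothetical; the app's own implementation may differ.

```python
def to_swiss_spelling(text: str) -> str:
    """Replace the German eszett with the Swiss double s."""
    # The capital form (U+1E9E) is rare but handled for completeness.
    return text.replace("ß", "ss").replace("ẞ", "SS")


print(to_swiss_spelling("Die Straße ist groß."))  # Die Strasse ist gross.
```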

What does it cost?

Usage is inexpensive. You only pay OpenAI & Co. for the tokens that you use. For example, translating 100 separate «Normseiten» (standard pages of 250 German words each) to Einfache Sprache or Leichte Sprache costs roughly between 0.5 CHF with Claude Haiku and a little over 25 CHF with Claude Opus (as of June 2024), depending on the model's token price. The hardware requirements to run the app are modest too. As mentioned above, a small VM for a couple of francs per month will suffice.
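To run a similar back-of-the-envelope estimate for your own volumes, a sketch like the following may help. Only the 100 pages of 250 words come from the text above. The tokens-per-word ratio, the prompt overhead per call, and the prices in the usage example are placeholder assumptions, so look up your provider's current price list before relying on the result.

```python
def estimate_cost(
    pages: int,
    words_per_page: int,
    tokens_per_word: float,
    prompt_tokens_per_call: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate the cost of simplifying `pages` texts, one API call per page.

    Assumes the simplified output is about as long as the input text.
    Prices are per million tokens; the result is in the same currency.
    """
    text_tokens = pages * words_per_page * tokens_per_word
    input_tokens = text_tokens + pages * prompt_tokens_per_call
    output_tokens = text_tokens
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Placeholder values for illustration only, not real model prices.
cost = estimate_cost(
    pages=100,
    words_per_page=250,
    tokens_per_word=2.0,          # assumption: German text, rough ratio
    prompt_tokens_per_call=1000,  # assumption: rules and instructions in the prompt
    input_price_per_mtok=1.0,     # placeholder price
    output_price_per_mtok=5.0,    # placeholder price
)
print(f"Estimated cost: {cost:.2f}")
```

Because the prompt with the language rules is sent with every call, the per-call overhead can matter more than the text itself for short inputs.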

Our language guidelines

You can find the rules that are currently included in the prompts in utils_prompts.py. Have a look and change them according to your needs and your organization's communication guidelines.

We derived the current rules in the prompts mainly from these language guidelines of ours:

A couple of findings

How does the understandability score work?

We have published the ZIX understandability index as a pip-installable package. You can find it here.

[!Note] The index is slightly adjusted to Swiss German. Specifically, we use ss instead of ß in our vocabulary lists. In practice this should not make a big difference. For High German text that actually contains ß, the index will likely underestimate the understandability slightly, by around 0.1.

What does the score mean?

Outlook

These are a couple of areas that we are actively working on:

Project team

This project is a collaborative effort of these people of the cantonal administration of Zurich:

A special thanks goes to Government Councillor Jacqueline Fehr, who came up with the idea and initiated and supported the project.

Feedback and contributing

We are interested to hear from you. Please share your feedback and let us know how you use the app in your institution. You can write us an email or share your ideas by opening an issue or a pull request.

Please note that we use Ruff for linting and code formatting with default settings.

Miscellaneous