ml5js / ml5-library

Friendly machine learning for the web! 🤖
https://ml5js.org

word2vec example model includes slurs and other offensive language #1238

Open bomanimc opened 2 years ago

bomanimc commented 2 years ago

It's been brought to our attention that word2vec produced a racial slur in a student project. The issue comes from the inclusion of slurs in the word2vec model files we use for the word2vec examples (see here: https://github.com/ml5js/ml5-library/tree/main/examples/p5js/Word2Vec/Word2Vec_Interactive/data). These example word2vec model files were merged into the ml5.js project years before many of us began working on the project, but we are now aware that they can produce racial slurs! We need to address the issue quickly.

Overall, we're thinking about addressing the issue in two steps: 1) a short-term change to address the immediate harm posed by our word2vec model and then 2) a longer-term decision about how we want to change this model. We're opening a discussion here to consider the best way to approach each step, starting with the short-term change. Your input would be much appreciated!

bomanimc commented 2 years ago

Short-term Change Proposal: Release a new update to ml5.js that disables the word2vec function, so that calling it instead prints a message to the console explaining that we've removed the model while we revise it to address issues with slurs/hate speech. We could also add a notice to the top of the word2vec documentation page explaining that we've temporarily and intentionally made the word2vec function stop working while we revise it to avoid these issues. Later, once the issues are resolved, we could consider adding word2vec back to the library.

Pro: This change more strictly reduces the potential for harm by breaking word2vec in projects that point to the latest release of ml5.js (ml5@latest). For example, a project that uses our problematic word2vec model files (perhaps because the person started by copying our examples) might stop working before it produces hate speech.

Con: This change is fairly abrupt and might unexpectedly cause people's important projects to error (for example, if someone has an installation running that's referencing the latest version of ml5.js and uses word2vec, it'd stop working unexpectedly).
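
As a rough illustration of the short-term proposal above, here is a minimal sketch of what a disabled entry point could look like. The function name `disabledWord2Vec`, the message text, and the shape of the returned object are all hypothetical; this is not the actual ml5.js source, only one way the "print a message instead of loading the model" behaviour might be expressed.

```javascript
// Hypothetical stand-in for the word2vec entry point (not the real ml5.js code).
// Assumption: callers pass a model path and an optional error-first callback.
const WORD2VEC_DISABLED_MESSAGE =
  "word2vec is temporarily disabled while the example model is revised " +
  "to address slurs and hate speech. See the word2vec documentation page.";

function disabledWord2Vec(modelPath, callback) {
  // Explain in the console why nothing is being loaded.
  console.warn(WORD2VEC_DISABLED_MESSAGE);

  // If the caller supplied a callback, report the situation as an error.
  if (typeof callback === "function") {
    callback(new Error(WORD2VEC_DISABLED_MESSAGE));
  }

  // Return an inert marker object; modelPath is never fetched.
  return { disabled: true, message: WORD2VEC_DISABLED_MESSAGE };
}
```

A project calling this stub would see the console message and receive no model, which is the abrupt behaviour described in the Con above.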

ElvinD commented 2 years ago

Is the tool racist or are people racist...? Shame it's disabled!

c-dacanay commented 2 years ago

Update: Hi everyone! We decided to meet with Allison Parrish (poet, teacher and resident word vector expert) to discuss ways we might move forward with word2vec and ml5.js. There were a few wonderful insights:

Allison works with generated words extensively and stressed the importance of control over vocabulary, especially in live and educational contexts. She offered some solutions for how we might alter word2vec to make it safer for community use.
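
One way to read "control over vocabulary" is to restrict a word-vector file to an explicit allowlist before it is ever loaded. The sketch below is only an illustration under that assumption, not a solution proposed in the meeting: the function name `filterVocabulary` is hypothetical, and it assumes the vectors are available as a plain object mapping each word to an array of numbers.

```javascript
// Hypothetical helper: keep only the words that appear on an allowlist.
// Assumption: `vectors` is a plain object of the form { word: [numbers...] }.
function filterVocabulary(vectors, allowedWords) {
  // Compare case-insensitively so "Cat" and "cat" are treated alike.
  const allowed = new Set(allowedWords.map((word) => word.toLowerCase()));
  const filtered = {};
  for (const [word, vector] of Object.entries(vectors)) {
    if (allowed.has(word.toLowerCase())) {
      filtered[word] = vector;
    }
  }
  return filtered;
}

// Toy data for illustration only; real model files are far larger.
const toyVectors = { cat: [0.1, 0.9], dog: [0.2, 0.8], unwanted: [0.5, 0.5] };
const safeVectors = filterVocabulary(toyVectors, ["cat", "dog"]);
console.log(Object.keys(safeVectors)); // only "cat" and "dog" remain
```

An allowlist is used here rather than a blocklist because a blocklist can only remove terms someone has already thought to list, while an allowlist guarantees that nothing outside the chosen vocabulary can appear in the output.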

If we maintain word2vec:

If we deprecate word2vec, there are several models we might replace it with:

Ultimately, we left the conversation more excited about exploring new options for playing with language in ml5.js. But we want to take some time to explore before committing to a change. From a pedagogical perspective, word2vec has been useful in helping students understand vector space and math. We’d like to investigate how teachers, students, and others have been using word2vec in their projects so that we can be sure another model can fulfill our community's needs.

That's the update so far. Let us know if this sparks any thoughts.

bengrosser commented 2 years ago

Have there been any further thoughts or updates on this? I arrived looking to easily try some things with word2vec, so I'm interested in whether it's 100% abandoned in ml5js or if something else is coming or... It seems another possibility with word2vec might be to leave it in but require users to train it on their own text instead of shipping it pre-trained (meaning the output is ultimately limited by the input text chosen)?

fredowashere commented 1 year ago

Come on re-enable it guys!

paraclete-pizza commented 1 year ago

Is it possible to download an old version of ML5 to use locally, which includes Word2Vec, if we want to experiment with it using our own datasets? (Either responsibly expunged data, or data used in an academic context where surfacing problematic associations would reveal critical information about the source corpus for ultimately liberatory ends?)