tattle-made / Uli

Software and Resources for Mitigating Online Gender Based Violence in India
https://uli.tattle.co.in
GNU General Public License v3.0

Provide offline support for approximate slur replacement #54

Closed dennyabrain closed 2 years ago

dennyabrain commented 2 years ago

@mlkorra has written Python code that uses fuzzy search to detect approximate matches for slurs: https://github.com/tattle-made/OGBV/blob/main/slur-replacement/Slur%20Replacement%20-%20SCRIPT.ipynb
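For anyone who has not opened the notebook, here is a minimal sketch of the general idea of approximate matching and replacement. It is not @mlkorra's code: it uses a plain-Python edit distance instead of fuzzywuzzy, and the names `PLACEHOLDER_TERMS`, `similarity`, `replace_approximate`, the `0.8` threshold and the `[redacted]` mask are all assumptions made up for illustration.

```python
import re

def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def similarity(a: str, b: str) -> float:
    """Distance normalised to 0..1 (an assumed score, not fuzzywuzzy's ratio)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

# Hypothetical placeholder list; the real slur list lives in the project.
PLACEHOLDER_TERMS = ["badword", "meanword"]

def replace_approximate(text, terms=PLACEHOLDER_TERMS, threshold=0.8,
                        mask="[redacted]"):
    """Mask every word whose similarity to any listed term meets the threshold."""
    def substitute(match):
        word = match.group(0)
        lowered = word.lower()
        if any(similarity(lowered, term) >= threshold for term in terms):
            return mask
        return word
    return re.sub(r"\w+", substitute, text)

print(replace_approximate("that b4dword again"))
```

The threshold trades missed variants against false positives, so it would need tuning against the real list.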

This code is in Python, so the way to integrate it into our Chrome plugin would be to expose it via a REST API. That would, of course, add latency and network calls from the extension.

I see potential to incorporate this approximate slur replacement function into the extension code itself. That would let the feature work without an internet connection.

The key to achieving this would be WebAssembly. WebAssembly support is now built into all modern browsers. There are compilers that compile languages like Python, Go, C and C++ to WebAssembly, allowing web apps to rely not just on JavaScript but on these other languages too. This is a similar project for Python: https://github.com/pyodide/pyodide

@mlkorra's code relies on Python and two libraries: fuzzywuzzy and Levenshtein. Levenshtein's code is written in C, so I think it should theoretically be possible to compile his code to WebAssembly and bundle that with the Chrome extension. The extension would then communicate with the WebAssembly app to get the approximate slur replacement feature, as opposed to a REST API hosted in the cloud (associated issue).

dennyabrain commented 2 years ago

@mlkorra I have a question for you. Is there a reason you are using https://github.com/maxbachmann/Levenshtein and not https://github.com/ztane/python-Levenshtein? The latter seems more popular and also seems to have wheels (which are needed by Pyodide to produce WebAssembly code).

If we can get identical results by using python-Levenshtein, let's use that, because it might make compiling to WebAssembly easier.

You can also try reading this doc: https://pyodide.org/en/stable/index.html. The Using Pyodide section should help.

mlkorra commented 2 years ago

@dennyabrain Levenshtein uses the same code base as python-Levenshtein, but the latter is currently not being maintained.

dennyabrain commented 2 years ago

I see. @mlkorra can you confirm for me whether replacing Levenshtein with python-Levenshtein changes the result? Since this is the type of core library that does not require constant improvements, it's possible that the unmaintained status is not a problem as long as the functionality is the same.

If we can use python-Levenshtein (which uses C and has wheels available), we might be able to compile it to WebAssembly and use it in the browser easily.
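One possible way to run that check is sketched below. It assumes we score a fixed list of word pairs with each library and compare the two sets of scores afterwards; because the two packages may install under the same import name, it might be simplest to score in two separate environments and save the results. The helpers `score_pairs` and `diff_scores`, the sample pairs, and the two stand-in scorers (built on the standard library's `difflib`, not on either Levenshtein package) are hypothetical and only there to make the sketch runnable.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]
Scorer = Callable[[str, str], float]

def score_pairs(scorer: Scorer, pairs: Iterable[Pair]) -> Dict[Pair, float]:
    """Score every pair with one scorer (run once per library)."""
    return {pair: scorer(*pair) for pair in pairs}

def diff_scores(first: Dict[Pair, float], second: Dict[Pair, float],
                tolerance: float = 1e-9) -> List[Tuple[Pair, float, float]]:
    """List the pairs whose two scores differ by more than the tolerance.

    Both dicts are expected to cover the same pairs.
    """
    return [(pair, first[pair], second[pair])
            for pair in first
            if abs(first[pair] - second[pair]) > tolerance]

# Stand-in scorers for the demo; in practice each library's own
# scoring function would be passed in here instead.
def stand_in_ratio(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def stand_in_ratio_ignoring_case(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical sample; the real check would use the project's word list.
SAMPLE_PAIRS = [("badword", "b4dword"), ("Badword", "badword"), ("hello", "world")]

mismatches = diff_scores(score_pairs(stand_in_ratio, SAMPLE_PAIRS),
                         score_pairs(stand_in_ratio_ignoring_case, SAMPLE_PAIRS))
for pair, score_a, score_b in mismatches:
    print(pair, score_a, score_b)
```

An empty list of mismatches on a representative word list would suggest the two libraries are interchangeable for our purposes.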

mlkorra commented 2 years ago

@dennyabrain Alright, will work on that and let you know.