uwdata / errudite

An Interactive Tool for Scalable and Reproducible Error Analysis.
https://errudite.readthedocs.io/en/latest/index.html
GNU General Public License v2.0

Errudite For Hate Speech Detection #10

Open RobertHalwass opened 4 years ago

RobertHalwass commented 4 years ago

A couple of other master's students and I are working on a project that uses Errudite to improve a hate speech classifier. After reviewing the code, we assume that our first step is to register our own detector, since there are only QA and VQA detectors. After that, we want to customise the dataset reader and predictor. We also want to modify the UI so that values like "EM" and "Sent", which aren't applicable to hate speech detection, are no longer displayed. For now, we assume this requires modifying how the Errudite server handles requests and responses. Are there steps you would recommend for getting this done?

tongshuangwu commented 4 years ago

Hi there! You are right: if you want to modify the UI, you will also need to update the server. However, I wouldn't recommend modifying the UI, mainly because we didn't expect the frontend code to be customized -- it was meant to just work for QA and VQA. As such, that part of the code is not as well documented as the backend code. Modifying it would require quite a few additional setup steps for npm, TypeScript, etc., and more code to read and understand, which could be overwhelming.

Instead, I would suggest sticking to the notebook version of Errudite -- it has most of the functions wrapped, except for query command auto-completion. You are right that you will need to build your own hate speech detection classifiers and dataset readers. The best way to learn how is to go through the notebook tutorial.

The three tutorials explain the end-to-end procedure of extending Errudite for Natural Language Inference (NLI). I believe you will go through very similar steps for hate speech detection, as both are text classification tasks (the only difference is that NLI has one more input sentence target).
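
To give a rough idea of the pieces involved (a registered dataset reader and a registered predictor for a single-sentence classification task), here is a minimal, self-contained sketch. It deliberately does not use Errudite's actual classes: `Registry`, `Instance`, `HateSpeechReader`, `KeywordPredictor`, the `label<TAB>text` input format, and the label names are all assumptions made for illustration. The tutorials remain the reference for the real base classes and method names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


class Registry:
    """Hypothetical name-to-class registry (NOT Errudite's API)."""

    def __init__(self) -> None:
        self._classes: Dict[str, type] = {}

    def register(self, name: str) -> Callable[[type], type]:
        """Return a class decorator that stores the class under `name`."""
        def decorator(cls: type) -> type:
            if name in self._classes:
                raise ValueError(f"{name!r} is already registered")
            self._classes[name] = cls
            return cls
        return decorator

    def by_name(self, name: str) -> type:
        """Look up a previously registered class."""
        return self._classes[name]


READERS = Registry()
PREDICTORS = Registry()


@dataclass
class Instance:
    # A single-sentence classification example; NLI would carry a second sentence.
    text: str
    label: str


@READERS.register("hate_speech")
class HateSpeechReader:
    """Parses lines of the assumed form 'label<TAB>text' into instances."""

    def read(self, lines: Iterable[str]) -> List[Instance]:
        instances = []
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            label, text = line.split("\t", 1)
            instances.append(Instance(text=text, label=label))
        return instances


@PREDICTORS.register("keyword_baseline")
class KeywordPredictor:
    """Placeholder model: flags a text if it contains any listed word."""

    def __init__(self, flagged_words: Iterable[str]) -> None:
        self.flagged = {word.lower() for word in flagged_words}

    def predict(self, text: str) -> Dict[str, str]:
        tokens = text.lower().split()
        hit = any(token in self.flagged for token in tokens)
        return {"label": "hate" if hit else "not_hate"}
```

In this sketch, `READERS.by_name("hate_speech")()` and `PREDICTORS.by_name("keyword_baseline")([...])` give you a reader and a predictor, and comparing `predictor.predict(instance.text)["label"]` against `instance.label` yields the misclassified instances you would then inspect. A real setup would swap the keyword baseline for your trained classifier.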

If you run into any issues, please let me know! I'm also happy to work closely with you to make it work.