unicef / kindly

GNU Affero General Public License v3.0

Track down License for twitter-roberta-base-offensive dataset #107

Open lacabra opened 2 years ago

lacabra commented 2 years ago

As documented, Kindly currently uses the twitter-roberta-base-offensive dataset, which is licensed without any restrictions but refers to the original license for additional information.

This dataset is used in this study, where it is referred to as the Offensive Language Identification Dataset (OLID); its associated homepage belongs to Shervin Malmasi. The homepage includes a download link, but the link is dead.

Another reference to the same study yields a code repository containing the actual dataset, but that repository is missing a license.

I am contacting both leads to request the explicit inclusion of an open license.

Cc: @amreenp7, @nathanfletcher

lacabra commented 2 years ago

Requested the inclusion of a LICENSE file in joeykay9/offenseval#1.