oscar-project / corpus

corpus issues.
Apache License 2.0

Wu Chinese dataset is of bad quality. #5

Open Uinelj opened 2 years ago

Uinelj commented 2 years ago

The Wu Chinese dataset is not in Wu Chinese. Its quality needs to be evaluated: is it in another language, or is it complete gibberish?
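A possible first pass at that evaluation, as a minimal sketch rather than an established OSCAR procedure: Wu Chinese is normally written in Han characters, so lines with few Han characters are unlikely to be Wu. The helper names `han_ratio` and `flag_suspicious` and the 0.5 threshold are hypothetical, and the check only covers the basic CJK Unified Ideographs block. It cannot tell Wu apart from Mandarin or other Han-script text, which would still need a language identifier or a native reader.

```python
def han_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters that are Han ideographs.

    Only the basic CJK Unified Ideographs block (U+4E00..U+9FFF) is counted;
    this is a simplifying assumption for the sketch.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    han = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return han / len(letters)


def flag_suspicious(lines, threshold=0.5):
    """Return the lines whose Han ratio falls below the (hypothetical) threshold."""
    return [line for line in lines if han_ratio(line) < threshold]


if __name__ == "__main__":
    sample = ["hello world", "阿拉是上海人"]
    for line in flag_suspicious(sample):
        print("suspicious:", line)
```

Running this over a sample of the dataset would show whether most lines are in a non-Han script (pointing to another language or noise) or in Han characters (pointing to possible confusion with another Chinese variety).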