Open A7med7x7 opened 2 months ago
They are usually derived from a news API. I don't have any influence over the Arabic data itself.
It'd be easier and faster for us to use more varied sources of data for Stanza, but we can always redo either the Stanza or the CoreNLP models if you have more data available
@AngledLuffa Yes, you definitely don't have control, but at least providing general data that is not biased towards one kind of content would help a lot, as all the samples I have seem to be of this type. Data is everywhere; what's more important is to check its validity and what it intuitively means, and I can help with that.
Why does the Arabic processing data seem to be about violence and records of criminal activity, as if it were derived from a news API?