joke2k / faker

Faker is a Python package that generates fake data for you.
https://faker.readthedocs.io
MIT License
17.66k stars, 1.93k forks

Arabic Issues #1470

Closed prescod closed 2 years ago

prescod commented 3 years ago


Steps to reproduce

Issue 1. Egyptian Arabic has very few providers:

>>> Faker("ar_EG").first_name()
'Johnny'

Issue 2. What is ar_AA? There is no country code "AA".

Questions:

Is ar_AA some kind of generic arabic locale?

Should ar_EG inherit from ar_AA?

Should direct instantiation of ar_AA be discouraged?

fcurella commented 3 years ago

What is ar_AA ? There is no country code for "AA". Is ar_AA some kind of generic arabic locale?

Correct. It was introduced in https://github.com/joke2k/faker/commit/420ae236d0c1eb63d278778e6e67de52ba77ab4f as a base class for Arabic locales.

While there is no country code AA, ar_AA is a valid locale code:

import locale

locale.locale_alias["ar_aa"]
# 'ar_AA.ISO8859-6'

Should ar_EG inherit from ar_AA?

If it's appropriate for the provider (i.e., depending on how much code duplication it saves). I don't know enough about Arabic languages and locales to be able to make that call.

Should direct instantiation of ar_AA be discouraged?

I don't think so. It is a valid value for a locale after all.

prescod commented 3 years ago

I'm not an Arabic speaker either, but it's hard for me to imagine that this:

>>> from faker import Faker
>>> Faker("ar_EG").first_name()
'Erica'

is better than whatever we'd get if we inherited from ar_AA.

Or to put it another way: if ar_AA is intended as a base class for Arabic locales, then every Arabic locale should inherit from it and either override the inappropriate parts or have that particular dataset removed from ar_AA altogether.

What do you think, @ahmedaljazzar?

fcurella commented 3 years ago

@prescod I would think so too. Ideally we'd have someone from Egypt to make a more informed decision than I ever could.

iamjazzar commented 3 years ago

@prescod It's been a long time since I've worked with Arabic locales, but here's what I think:

Issue 1. Egyptian Arabic has very few providers: Faker("ar_EG").first_name() 'Johnny'

That's correct. I only wanted to introduce the minimum functionality that I needed, and I left the rest for the community to contribute as needed.

Issue 2: What is ar_AA ? There is no country code for "AA". Is ar_AA some kind of generic arabic locale?

💯 I had the same question when I first saw this, but after some digging I reached the following conclusions:

So in this case ar_AA should represent MSA (Modern Standard Arabic), and \b(?!ar_AA\b)ar_[A-Z]{2} should represent dialects.
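To make the pattern concrete, here is a small sketch of how it separates the generic locale from the dialect locales. It assumes the intended prefix is ar_ (the locale codes used elsewhere in this thread); the list of codes and the name DIALECT_LOCALE are made up for illustration and are not taken from Faker.

```python
import re

# Matches any ar_XX locale code except the generic ar_AA.
# Assumption: the prefix is "ar_", as in the locale codes discussed in this thread.
DIALECT_LOCALE = re.compile(r"\b(?!ar_AA\b)ar_[A-Z]{2}\b")

# Illustrative sample of locale codes, not an authoritative list.
codes = ["ar_AA", "ar_EG", "ar_JO", "ar_PS", "ar_SA", "en_US"]

# fullmatch() requires the whole string to be a dialect locale code.
dialects = [code for code in codes if DIALECT_LOCALE.fullmatch(code)]
print(dialects)
```

The negative lookahead is what excludes ar_AA; without it the pattern would treat the generic locale as one more dialect.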

Should ar_EG inherit from ar_AA?

I think this is a great idea. All Arabic dialects inherit from MSA, and I don't see why it should be any different here.

Should direct instantiation of ar_AA be discouraged?

As @fcurella said, and for the reasons I mentioned above, I don't think discouraging it is a good idea.

I speak fluent Arabic, and I'm happy to help get the Arabic locales in order here.

prescod commented 3 years ago

Thanks so much @ahmedaljazzar. I guess the central question is whether there is any "harm" in changing every ar_XX locale to inherit from ar_AA, so that at the very least the names etc. will tend to be Arabic and not English.

Should we just do that?

iamjazzar commented 3 years ago

@prescod I second that. I believe all ar_ locales should inherit from ar_AA. Some outliers like license plate numbers and phone numbers might not apply here, but in general this is a great idea.
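A rough sketch of the idea, for anyone picking this up: the dialect inherits the shared Arabic data from the generic locale and overrides only the country-specific parts. The classes below are hypothetical stand-ins written for illustration, not Faker's actual provider classes, and the names and phone layouts are sample data only.

```python
import random


class DefaultProvider:
    """Stand-in for the English defaults a locale gets when it defines nothing."""

    first_names = ("Johnny", "Erica")

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def first_name(self):
        return self.rng.choice(self.first_names)

    def phone_number(self):
        # Illustrative layout only.
        return "+1-555-" + "".join(self.rng.choice("0123456789") for _ in range(4))


class ArAAProvider(DefaultProvider):
    """Stand-in for the generic Arabic base: replaces the name data."""

    first_names = ("أحمد", "محمد", "فاطمة", "مريم")


class ArEGProvider(ArAAProvider):
    """Stand-in for Egyptian Arabic: inherits names, overrides phone numbers."""

    def phone_number(self):
        # +20 is Egypt's country calling code; the digit layout is illustrative.
        digits = "".join(self.rng.choice("0123456789") for _ in range(9))
        return "+20 1" + digits


eg = ArEGProvider(seed=1)
name = eg.first_name()      # drawn from the Arabic names inherited from ArAAProvider
phone = eg.phone_number()   # uses the Egypt-specific override
print(name, phone)
```

With this shape, a dialect that defines nothing of its own still produces Arabic names, and outliers such as phone numbers or license plates are handled by overriding one method at a time.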

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 2 years ago

This issue was closed because it has been inactive for 14 days since being marked as stale.