joke2k / faker

Faker is a Python package that generates fake data for you.
https://faker.readthedocs.io
MIT License

New feature: language_name #1179

Closed. ikhomutov closed this issue 4 years ago.

ikhomutov commented 4 years ago

Hi. I want to add a new method that would return a random language name (e.g. English, Spanish, Dutch). I'm not sure where I should place this method. Initially I wanted to add it to BaseProvider next to the language_code method, but that provider has no locale support. Any thoughts?
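As the question notes, BaseProvider has no locale support. Faker's localized providers follow a subclassing pattern instead: a provider package such as `faker/providers/person/` has a default `Provider` class, and per-locale subpackages (for example `nl_NL`) subclass it, override the data attributes, and inherit the methods. The stdlib-only sketch below imitates that idea so it runs without Faker. Every class, the `PROVIDERS` registry, and the fallback to English are hypothetical and are not Faker code.

```python
import random
from typing import Dict, Optional, Tuple, Type


class LanguageNameProvider:
    """Hypothetical default provider holding English-language data."""

    language_names: Tuple[str, ...] = ("English", "Spanish", "Dutch")

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self.rng = rng or random.Random()

    def language_name(self) -> str:
        # The method lives on the base class; subclasses only swap the data.
        return self.rng.choice(self.language_names)


class LanguageNameProviderNlNL(LanguageNameProvider):
    """Hypothetical nl_NL variant: same method, names written in Dutch."""

    language_names = ("Engels", "Spaans", "Nederlands")


class LanguageNameProviderEsES(LanguageNameProvider):
    """Hypothetical es_ES variant: same method, names written in Spanish."""

    language_names = ("inglés", "español", "neerlandés")


# Hypothetical registry from locale code to provider class.
PROVIDERS: Dict[str, Type[LanguageNameProvider]] = {
    "en_US": LanguageNameProvider,
    "nl_NL": LanguageNameProviderNlNL,
    "es_ES": LanguageNameProviderEsES,
}


def provider_for(locale: str, rng: Optional[random.Random] = None) -> LanguageNameProvider:
    # Unknown locales fall back to the English defaults (an assumption of this sketch).
    return PROVIDERS.get(locale, LanguageNameProvider)(rng)
```

The point of the design is that `language_name()` is written once, while each locale only supplies its own tuple of names. A non-localized class like BaseProvider has nowhere to put those per-locale tuples.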

ex-nerd commented 4 years ago

This information is conveniently available, including percentages of native speakers, which could allow for semi-realistic generation of data: https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers
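Here is a minimal sketch of the idea, assuming the raw counts are used directly as weights. The counts are converted into shares of the total, and `random.choices` draws names in proportion to them. The four figures are the first entries of the provider posted in the next comment. `SPEAKERS`, `shares`, and `sample_languages` are illustrative names, not part of Faker.

```python
import random
from typing import Dict, List, Optional

# Native speakers in millions (first four entries of the list in the next comment).
SPEAKERS: Dict[str, int] = {
    "Mandarin Chinese": 918,
    "Spanish": 480,
    "English": 379,
    "Hindi": 341,
}


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw speaker counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def sample_languages(
    counts: Dict[str, int], k: int, rng: Optional[random.Random] = None
) -> List[str]:
    """Draw k names with replacement, weighted by speaker counts."""
    rng = rng or random.Random()
    return rng.choices(list(counts), weights=list(counts.values()), k=k)
```

With these four languages, Mandarin Chinese gets 918 / 2118 of the weight, so it accounts for roughly 43% of the draws.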

ex-nerd commented 4 years ago

Just for posterity, here's a quick language provider (language names in English), weighted by real-world numbers of native speakers.

from collections import OrderedDict
from typing import List

from faker.providers import BaseProvider

# List of languages and the number (in millions) of native speakers of each, according to
# https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers
# Faker requires this be an OrderedDict for random_elements to work.
_languages = OrderedDict(
    [
        ("Mandarin Chinese", 918,),
        ("Spanish", 480,),
        ("English", 379,),
        ("Hindi", 341,),
        ("Bengali", 228,),
        ("Portuguese", 221,),
        ("Russian", 154,),
        ("Japanese", 128,),
        ("Western Punjabi", 92,),
        ("Marathi", 83,),
        ("Telugu", 82,),
        ("Wu Chinese", 81,),
        ("Turkish", 79,),
        ("Korean", 77,),
        ("French", 77,),
        ("German", 76,),
        ("Vietnamese", 76,),
        ("Tamil", 75,),
        ("Yue Chinese", 73,),
        ("Urdu", 68,),
        ("Javanese", 68,),
        ("Italian", 64,),
        ("Egyptian Arabic", 64,),
        ("Gujarati", 56,),
        ("Iranian Persian", 52,),
        ("Bhojpuri", 52,),
        ("Min Nan Chinese", 50,),
        ("Hakka Chinese", 48,),
        ("Jin Chinese", 46,),
        ("Hausa", 43,),
        ("Kannada", 43,),
        ("Indonesian", 43,),
        ("Polish", 39,),
        ("Yoruba", 37,),
        ("Xiang Chinese", 37,),
        ("Malayalam", 37,),
        ("Odia", 34,),
        ("Maithili", 33,),
        ("Burmese", 32,),
        ("Eastern Punjabi", 32,),
        ("Sunda", 32,),
        ("Sudanese Arabic", 31,),
        ("Algerian Arabic", 29,),
        ("Moroccan Arabic", 27,),
        ("Ukrainian", 27,),
        ("Igbo", 27,),
        ("Northern Uzbek", 25,),
        ("Sindhi", 24,),
        ("North Levantine Arabic", 24,),
        ("Romanian", 24,),
        ("Tagalog", 23,),
        ("Dutch", 23,),
        ("Saʽidi Arabic", 22,),
        ("Gan Chinese", 22,),
        ("Amharic", 21,),
        ("Northern Pashto", 20,),
        ("Magahi", 20,),
        ("Thai", 20,),
        ("Saraiki", 20,),
        ("Khmer", 16,),
        ("Chhattisgarhi", 16,),
        ("Somali", 16,),
        ("Malay (Malaysian Malay)", 16,),
        ("Cebuano", 15,),
        ("Nepali", 15,),
        ("Mesopotamian Arabic", 15,),
        ("Assamese", 15,),
        ("Sinhalese", 15,),
        ("Northern Kurdish", 14,),
        ("Hejazi Arabic", 14,),
        ("Nigerian Fulfulde", 14,),
        ("Bavarian", 14,),
        ("South Azerbaijani", 13,),
        ("Greek", 13,),
        ("Chittagonian", 13,),
        ("Kazakh", 12,),
        ("Deccan", 12,),
        ("Hungarian", 12,),
        ("Kinyarwanda", 12,),
        ("Zulu", 12,),
        ("South Levantine Arabic", 11,),
        ("Tunisian Arabic", 11,),
        ("Sanaani Spoken Arabic", 11,),
        ("Min Bei Chinese", 11,),
        ("Southern Pashto", 10,),
        ("Rundi", 10,),
        ("Czech", 10,),
        ("Taʽizzi-Adeni Arabic", 10,),
        ("Uyghur", 10,),
        ("Min Dong Chinese", 10,),
        ("Sylheti", 10,),
    ]
)

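class LanguageProvider(BaseProvider):
    def languages(self, length: int = None, unique: bool = False) -> List[str]:
        """Return `length` language names, weighted by number of native speakers.

        If `length` is None, random_elements picks a random length.
        If `unique` is True, no language appears twice in the result.
        """
        return self.random_elements(_languages, length, unique)

    def language(self) -> str:
        """Return a single language name, weighted by number of native speakers."""
        return self.languages(1)[0]

fcurella commented 4 years ago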
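`languages()` hands the weights to Faker's `random_elements`, and with `unique=True` the result contains no repeats. The stdlib sketch below shows one straightforward way to do weighted sampling without replacement: draw one name, remove it, and re-weight the rest. It illustrates the behavior and is not Faker's internal implementation. `weighted_sample_unique` is a hypothetical name.

```python
import random
from typing import Dict, List, Optional


def weighted_sample_unique(
    weights: Dict[str, int], k: int, rng: Optional[random.Random] = None
) -> List[str]:
    """Draw k distinct keys; each draw is weighted by the keys still available."""
    if k > len(weights):
        # There are not enough distinct keys to return k unique ones.
        raise ValueError("k cannot exceed the number of distinct elements")
    rng = rng or random.Random()
    remaining = dict(weights)  # copy so the caller's dict is left untouched
    picked: List[str] = []
    for _ in range(k):
        names = list(remaining)
        # random.choices draws one name with probability proportional to its weight.
        choice = rng.choices(names, weights=[remaining[n] for n in names], k=1)[0]
        picked.append(choice)
        del remaining[choice]  # drop it so it cannot be drawn again
    return picked
```

With Faker installed, the provider itself would be registered with `fake = Faker()` followed by `fake.add_provider(LanguageProvider)`. After that, `fake.language()` and `fake.languages(3, unique=True)` are available.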

If anybody wants to submit a PR, I'll be happy to review it!

ikhomutov commented 4 years ago

I do want to submit a PR, but I'm still not sure where to put such a generator. It should support localization, so BaseProvider is not an option. I'll probably put it into the person provider if nobody comes up with a better idea.

fcurella commented 4 years ago

I'm not sure either, and I don't have a better idea. Go with person 👍