Redesign the Languages page

lbourdois commented 2 years ago

Hi,

This issue can be seen as a continuation of issue #193. The topics are related, but they concern two different parts of the site and may not be addressed at the same time, so I preferred to split them and open a second issue.

Assuming that datasets and models are better referenced by language thanks to issue #193, it seems relevant to me to redesign the Languages page (https://huggingface.co/languages) of the website.

Indeed, this page could potentially list at least 615 language tags (the number of tags listed here: https://huggingface.co/datasets/bible-nlp/biblenlp-corpus). And considering that there are over 7,000 languages in the world, the page could eventually become very dense.

Here are some proposals. The Hugging Face team will redesign this as they see fit from a UX/technical point of view, but here is an image of what it might look like to illustrate the idea:

1) Add a search bar to find the language we are interested in (a keyboard shortcut works just as well, but this would be more elegant).
2) Add a column with the name of the language in the language itself (to help people who don't necessarily speak English well find their language; I don't know if it's very relevant).
3) Add a button to sort on the column of your choice (alphabetically for qualitative data, ascending/descending for quantitative data).
4) Add a button to unfold/fold the derivatives of a given language.
5) An example of an unfolded language, here French. This would include oral languages but also sign languages (in France, French Sign Language is recognized as a language by law). Sign languages would probably get better visibility this way than at the end of the page, where they are at present. The sum of the "x" numbers would give the total number of models/datasets available in a given language.
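To make proposals 1 to 5 more concrete, here is a minimal sketch of the data model they imply: one row per language, with optional derivatives whose counts roll up into the parent's total. Everything in it is an assumption for illustration only (the `LanguageRow` class, the `search` and `sort_rows` helpers, and the placeholder counts); it is not how the Hub implements the page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LanguageRow:
    """Hypothetical row of the Languages table."""
    code: str          # language tag
    english_name: str  # name in English
    native_name: str   # name in the language itself (proposal 2)
    models: int = 0    # models tagged with exactly this code
    datasets: int = 0  # datasets tagged with exactly this code
    derivatives: list[LanguageRow] = field(default_factory=list)  # proposal 4

    def total_models(self) -> int:
        # Own count plus the counts of all unfolded derivatives (proposal 5).
        return self.models + sum(d.total_models() for d in self.derivatives)

    def total_datasets(self) -> int:
        return self.datasets + sum(d.total_datasets() for d in self.derivatives)


def search(rows: list[LanguageRow], query: str) -> list[LanguageRow]:
    """Proposal 1: case-insensitive match on code, English name, or native name.

    Only top-level rows are searched in this sketch.
    """
    q = query.casefold()
    return [
        r for r in rows
        if q in r.code.casefold()
        or q in r.english_name.casefold()
        or q in r.native_name.casefold()
    ]


def sort_rows(rows: list[LanguageRow], column: str, descending: bool = False) -> list[LanguageRow]:
    """Proposal 3: alphabetical sort for text columns, numeric sort for counts."""
    keys = {
        "code": lambda r: r.code.casefold(),
        "english_name": lambda r: r.english_name.casefold(),
        "native_name": lambda r: r.native_name.casefold(),
        "models": lambda r: r.total_models(),
        "datasets": lambda r: r.total_datasets(),
    }
    return sorted(rows, key=keys[column], reverse=descending)


# Placeholder counts, made up for the example.
fr = LanguageRow(
    "fr", "French", "Français", models=3, datasets=2,
    derivatives=[
        LanguageRow("fsl", "French Sign Language", "Langue des signes française",
                    models=1, datasets=1),
    ],
)
de = LanguageRow("de", "German", "Deutsch", models=5, datasets=1)
rows = [fr, de]
```

With this sample data, `fr.total_models()` adds the sign language entry to the French total, `search(rows, "fran")` finds French through its native name, and `sort_rows(rows, "models", descending=True)` orders rows by the rolled-up totals.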

I don't know if you have an opinion on the subject.

Have a nice day,

julien-c commented 2 years ago

cool, we'll take a look!

lbourdois commented 2 years ago

As mentioned in #193, documentation could be added on this page as well.

At a minimum, it seems to me that the documentation should indicate which choices were made for the ISO codes, link to a place where one can propose a code for a language not covered by the ISO codes, and perhaps explain how the displayed numbers of models and datasets are counted (the model count includes public and private models, whereas the dataset count includes only public datasets). More thought will probably be needed to decide what should or should not be included in this documentation.
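As an illustration of the counting rule described above, here is a small sketch. The `Repo` type and `displayed_counts` function are hypothetical names invented for this example, not Hub code; the only thing taken from the observation above is the rule itself (models: public + private, datasets: public only).

```python
from typing import NamedTuple


class Repo(NamedTuple):
    """Hypothetical, simplified view of a repository."""
    kind: str      # "model" or "dataset"
    language: str  # language tag
    private: bool


def displayed_counts(repos: list[Repo], language: str) -> tuple[int, int]:
    """Return (models, datasets) as they would be displayed for one language."""
    # Models: public and private repositories are both counted.
    models = sum(1 for r in repos if r.kind == "model" and r.language == language)
    # Datasets: only public repositories are counted.
    datasets = sum(
        1 for r in repos
        if r.kind == "dataset" and r.language == language and not r.private
    )
    return models, datasets


# Made-up sample data.
sample = [
    Repo("model", "fr", private=False),
    Repo("model", "fr", private=True),
    Repo("dataset", "fr", private=False),
    Repo("dataset", "fr", private=True),
    Repo("model", "de", private=False),
]
```

Here `displayed_counts(sample, "fr")` counts both French models but only the public French dataset, which is the asymmetry the documentation would need to explain.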

huggingface / hub-docs

Redesign the Languages page #194