huggingface / course

The Hugging Face course on Transformers
https://huggingface.co/course
Apache License 2.0

Translate to Italian #45

Open lewtun opened 2 years ago

lewtun commented 2 years ago

Hi there 👋

Let's translate the course to Italian so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

CaterinaBi commented 2 years ago

Hello Lewis, I'd love to contribute! I'm a postdoctoral researcher in theoretical linguistics at the University of Cambridge, UK. Italian is my native language. I'd love to translate modules 1 and 2 for a start. The only thing that scares me a bit is that I'm new to GitHub, so I might end up needing some help...

lewtun commented 2 years ago

Hey @CaterinaBi thank you - we'd love to have your help with the translation! Feel free to create a post and tag me with @lewtun on our forums (https://discuss.huggingface.co/c/course/20) if you need some help on the GitHub side 🤗

CaterinaBi commented 2 years ago

Hey @lewtun, amazing! I'll start straight away. I'll translate the 'Transformer models' and 'Using HF Transformers' then. Do you want me to take care of the Setup instructions, too?

lewtun commented 2 years ago

> Hey @lewtun, amazing! I'll start straight away. I'll translate the 'Transformer models' and 'Using HF Transformers' then. Do you want me to take care of the Setup instructions, too?

Awesome! Sorry I tagged you for the Setup section by accident 😬. On the other hand, that might be an easy way to get familiar with GitHub and pull requests, so maybe you'd like to start there?

CaterinaBi commented 2 years ago

Yes, I'll start from there and tag you in the forum if I get lost (get ready to hear from me soon!).

sharkovsky commented 2 years ago

Hi @lewtun, I'd also like to contribute. I have a PhD in computational neuroscience from École Polytechnique Fédérale de Lausanne. Italian is my native language as well. How about I start with module 3, and see how it goes? Is that acceptable?

@lewtun, @CaterinaBi, should we think of a way to "standardize" our translations (a shared glossary/vocabulary or something similar)? To make sure we all translate common things such as "train a neural network" in the same way.

Thank you!

lewtun commented 2 years ago

Hey @sharkovsky, thanks for helping out and good idea about a shared glossary! Feel free to create a comment here which lists the core terms. I'll also add your name to module 3 :)

ClonedOne commented 2 years ago

Hi! I would also be happy to help. I'm a PhD student at Northeastern University and Italian is my native language. I can take chapter 4 if no one is working on that.

CaterinaBi commented 2 years ago

Hi @sharkovsky, having a shared glossary is a terrific idea. How about we take a few days to go through the materials, then have a quick chat and publish the standardised translations here?

lewtun commented 2 years ago

Hi @ClonedOne thank you for offering to help! I've added your name to Chapter 4 🚀!

Nolanogenn commented 2 years ago

Hi! I would like to help. I am a PhD student at University of Napoli "L'Orientale", and Italian is my native language. I could work on Chapter 5 if nobody's working on it!

lewtun commented 2 years ago

Thank you @Nolanogenn for offering to help! I've added your name to the list 🙏

sharkovsky commented 2 years ago

@lewtun, @CaterinaBi, @ClonedOne, @Nolanogenn maybe we can come up with a strategy for how to translate common words, for example "machine learning". Here are some options:

  1. always leave it in English
  2. always translate it into Italian (in this case, the official translation is apprendimento automatico)
  3. always translate it into Italian, but include the English term alongside it the first time it appears.

The third option looks something like "apprendimento automatico (machine learning in inglese)"

Wikipedia seems to favour option 3, and I would also vote in favour of that. I know that the Italian term always sounds a bit "weird", but I feel that since we're making the effort to do a translation anyway, it's nice to try to use as many Italian words as possible.

But I'm open to discussion, what is your opinion?
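
As a rough illustration of option 3, here is a minimal Python sketch. It assumes a small hypothetical vocabulary dictionary; the names `VOCABULARY` and `apply_option_3` are made up for this example and are not part of the course tooling. It swaps each English term for its Italian translation and adds the English term in parentheses only the first time the term appears.

```python
import re

# Hypothetical vocabulary (English term -> Italian translation), for illustration only.
VOCABULARY = {
    "machine learning": "apprendimento automatico",
}


def apply_option_3(text: str, vocabulary: dict[str, str]) -> str:
    """Translate each term, keeping the English original only on first use."""
    for english, italian in vocabulary.items():
        seen = False  # reset for every term in the vocabulary

        def replace(match: re.Match) -> str:
            nonlocal seen
            if seen:
                # Later occurrences: Italian term only.
                return italian
            seen = True
            # First occurrence: Italian term followed by the English one.
            return f"{italian} ({english} in inglese)"

        # Case-insensitive match. Caveat: with several terms, a later term
        # could also match inside text inserted by an earlier replacement.
        text = re.sub(re.escape(english), replace, text, flags=re.IGNORECASE)
    return text


print(apply_option_3("Machine learning is fun. machine learning is hard.", VOCABULARY))
```

A real translation would of course be done by hand; the sketch is only meant to make the "first instance only" rule concrete.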

lewtun commented 2 years ago

Thanks for the insight and suggestions on how we can handle the machine learning jargon @sharkovsky! I really like the analogy with Wikipedia, so I would also favour option (3). I'm putting together a general TRANSLATING.md guide, so I will add this suggestion if the other Italian speakers agree it makes sense :)

sharkovsky commented 2 years ago

Ah, another issue that comes up in Italian and may appear in other languages is how you want to address the reader. In English you say: "But what if you want to ...?" In Italian you have to choose between:

  1. (informal singular you) "Ma cosa fare se vuoi ....?"
  2. (informal plural you) "Ma cosa fare se volete ....?"
  3. (formal singular) "Ma cosa fare se vuole ...?"
  4. (impersonal) "Ma cosa fare se si vuole ...?"

Option 4 is equivalent to the English "But what if one wants to ...?"

I would vote for option 4, except in those rare cases where it sounds really clunky and weird, where I would fall back on option 2. But as before, I am open to discussing other ideas!

You'll probably have the same issue in other languages (French and Spanish at least, I assume), so if you want to enforce a "centralized" approach through your TRANSLATING.md, I'll be happy to follow that as well.

sharkovsky commented 2 years ago

@CaterinaBi, @ClonedOne, @Nolanogenn you can find my first attempt at a translation of one file in my fork. I'm happy to receive some feedback if you think some things can be improved/better expressed... I'd rather discuss as much as possible now that we're still in a preliminary phase 😄

In a provisional manner, I also created a first glossary of terms that I think could be useful. But again, I'm happy to discuss both the translations and the format of the glossary! For example, now that I think of it, putting a file in my fork is probably not the best way to share a glossary.... @lewtun do you have any suggestions for something that we could all see and edit?

davidemastricci commented 2 years ago

Hi @lewtun, I'm a data scientist and chatbot developer and I'd like to help with chapter 6. I'm taking the 🤗 HF course and was about to start that chapter, so it would be great to translate it while learning.

Italian is my main language!

ClonedOne commented 2 years ago

@sharkovsky totally agree with both your points. I really like the glossary idea! I ended up with mostly the same translations :) except for a couple of things I'd like to suggest. Maybe we should move the discussion about the glossary to a forum post, so that it's easier to access it and suggest edits?

davidemastricci commented 2 years ago

> @lewtun, @CaterinaBi, @ClonedOne, @Nolanogenn maybe we can come up with a strategy for how to translate common words, for example "machine learning". Here are some options:
>
>   1. always leave it in English
>   2. always translate it into Italian (in this case, the official translation is apprendimento automatico)
>   3. always translate it into Italian, but include the English term alongside it the first time it appears.
>
> The third option looks something like "apprendimento automatico (machine learning in inglese)"
>
> Wikipedia seems to favour option 3, and I would also vote in favour of that. I know that the Italian term always sounds a bit "weird", but I feel that since we're making the effort to do a translation anyway, it's nice to try to use as many Italian words as possible.
>
> But I'm open to discussion, what is your opinion?

@sharkovsky Since there is already a small barrier to approaching the Hugging Face library, namely that you need to be familiar with terms like machine learning and deep learning, adding translations that sound weird in Italian (e.g. "apprendimento automatico" or "apprendimento profondo") could make the text less fluent to read.

lewtun commented 2 years ago

Thanks for this great discussion @sharkovsky - it definitely exposes some subtleties with translation projects :)

For the glossary, I suppose the simplest thing right now would be to share a Google / Notion doc that others can make suggestions to. Notion is probably easier since it supports Markdown and will make it simple to copy back to this repo.

As for how we distribute the glossary, I see two possibilities:

  1. Include it as a standalone file to help guide translators
  2. Add it as a new chapter (e.g. at the very end of the course) in an MDX file and render that on the website.

If you think a glossary would be helpful for course readers, then I would favour option 2.

CaterinaBi commented 2 years ago

> Ah, another issue that comes up in Italian and may appear in other languages is how you want to address the reader. In English you say: "But what if you want to ...?" In Italian you have to choose between:
>
>   1. (informal singular you) "Ma cosa fare se vuoi ....?"
>   2. (informal plural you) "Ma cosa fare se volete ....?"
>   3. (formal singular) "Ma cosa fare se vuole ...?"
>   4. (impersonal) "Ma cosa fare se si vuole ...?"
>
> Option 4 is equivalent to the English "But what if one wants to ...?"
>
> I would vote for option 4, except in those rare cases where it sounds really clunky and weird, where I would fall back on option 2. But as before, I am open to discussing other ideas!
>
> You'll probably have the same issue in other languages (French and Spanish at least, I assume), so if you want to enforce a "centralized" approach through your TRANSLATING.md, I'll be happy to follow that as well.

Hi guys,

sorry for the late reply but I took a day off yesterday.

I agree with the need to standardise our translations, although I am quite torn when it comes to the question of whether or not we want to translate the technical terms. I believe that if we want a clean Italian version we should use the proposed form 'apprendimento automatico (machine learning)', but at the same time it's almost a pity to do so when literally everyone in Italy says 'machine learning' (I had to google the translation myself, I wasn't even aware that 'apprendimento automatico' was a thing). So what do we do? @davidemastricci, you had a good point there.

As for the way we address the reader that @sharkovsky mentioned ('But what if you want to ...?') I believe the best translation in Italian would be with an infinitive: 'Ma cosa/come fare per...'. None of the ones that were suggested sound natural to me.

What about the glossary, are we going to add a .mdx file here?

CaterinaBi commented 2 years ago

> @CaterinaBi, @ClonedOne, @Nolanogenn you can find my first attempt at a translation of one file in my fork. I'm happy to receive some feedback if you think some things can be improved/better expressed... I'd rather discuss as much as possible now that we're still in a preliminary phase 😄
>
> In a provisional manner, I also created a first glossary of terms that I think could be useful. But again, I'm happy to discuss both the translations and the format of the glossary! For example, now that I think of it, putting a file in my fork is probably not the best way to share a glossary.... @lewtun do you have any suggestions for something that we could all see and edit?

Hi @sharkovsky , I've checked out your fork and your first translation seems fine to me ;)

sharkovsky commented 2 years ago

Hi everyone, following @ClonedOne's sensible suggestion, I converted this discussion into a forum post.

@CaterinaBi, @lewtun, @davidemastricci I tried to interpret your votes, but please feel free to correct any mistakes I made.

Everyone, please let's use the forum post to discuss from now on since it will be much clearer. I will try to monitor it closely and add any words that you suggest to the vocabulary as quickly as possible.

davidemastricci commented 2 years ago

@sharkovsky the forum link doesn't work anymore.

sharkovsky commented 2 years ago

> @sharkovsky the forum link doesn't work anymore.

yeah it's been marked as spam by the automatic filter, it should be back soon I hope.

michimichiamo commented 2 years ago

Hello everyone, I am Michele and recently graduated in Artificial Intelligence at the University of Bologna. Italian is my native language and I would be glad to join the translation! Since Chapter 7 is still to be assigned, I propose to help with that.

lewtun commented 2 years ago

Hey @sharkovsky thanks for creating the forum post ~I've asked one of the admins to unblock it and hopefully that happens soon~ Edit: it's fixed!

@michimichiamo I've added your name to the list - welcome!

sharkovsky commented 2 years ago

> Hey @sharkovsky thanks for creating the forum post - I've asked one of the admins to unblock it and hopefully that happens soon 🤞

Thank you, it just got unblocked :D

lewtun commented 2 years ago

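By the way, I realised from the other translations that we need the first section from Chapter 1 to be translated in order for the course to render on the website. My suggestions would be:

  • @CaterinaBi would you like to open a pull request with the first section translated?
  • Iterate on the glossary in the forum, and then add it as a new "chapter" to the course.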

For the second point, we can then have a section in the _toctree.yml file with something like:

- title: Glossario
  sections:
  - local: glossary/1
    title: Glossario 

I think this way Italian readers can benefit from the great work you're doing to handle the various bits of ML jargon!

sharkovsky commented 2 years ago

Hi @lewtun,

just to clarify: we've been calling it a glossary, but it's actually just a vocabulary, i.e. a 1-1 mapping of English and Italian terms. Does this meet your expectations? Or did you want us to provide short explanations for each term as well? In any case, I'm happy to oblige!

@lewtun is there a way to make my forum post editable by anyone, so everyone can contribute to the glossary directly? Otherwise I'm happy to monitor the post and add terms as people suggest them, but of course letting everyone do it for themselves would simplify things.

To everyone: the decision on whether we should translate technical terms such as "machine learning", "training set", "labels", etc... still hasn't been finalized. Ideally, everyone should express an opinion! Please let us know what you think in the forum!

Finally, I've added a new "question" in the post: should we translate the comments in the code?

lewtun commented 2 years ago

> just to clarify: we've been calling it a glossary, but it's actually just a vocabulary, i.e. a 1-1 mapping of English and Italian terms. Does this meet your expectations? Or did you want us to provide short explanations for each term as well? In any case, I'm happy to oblige!

Thanks for clarifying! Since other translators are taking the vocabulary route, I suggest we start with that - we can always expand it to include definitions later if we want to :)
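
To make the "vocabulary" idea concrete, here is a minimal Python sketch of a 1-1 English-to-Italian mapping. The entries, the variable name `VOCABULARY`, and both helper functions are hypothetical and only for illustration; they are not part of the course repository. One helper flags Italian terms that are reused for more than one English term (which would break the 1-1 property), and the other renders the mapping as a Markdown table that could be pasted into a shared document.

```python
# Hypothetical entries, for illustration only.
VOCABULARY = {
    "machine learning": "apprendimento automatico",
    "deep learning": "apprendimento profondo",
}


def find_duplicate_translations(vocabulary: dict[str, str]) -> dict[str, list[str]]:
    """Return Italian terms that are used for more than one English term."""
    by_italian: dict[str, list[str]] = {}
    for english, italian in vocabulary.items():
        by_italian.setdefault(italian, []).append(english)
    return {it: en for it, en in by_italian.items() if len(en) > 1}


def to_markdown_table(vocabulary: dict[str, str]) -> str:
    """Render the vocabulary as a Markdown table, sorted by English term."""
    lines = ["| English | Italiano |", "| --- | --- |"]
    for english, italian in sorted(vocabulary.items()):
        lines.append(f"| {english} | {italian} |")
    return "\n".join(lines)


print(find_duplicate_translations(VOCABULARY))
print(to_markdown_table(VOCABULARY))
```

If definitions are added later, each value could become a small record (translation plus description) without changing the overall shape.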

> Finally, I've added a new "question" in the post: should we translate the comments in the code?

Thanks! I replied there

sharkovsky commented 2 years ago

Sounds great, let's start with a vocabulary and see where we go from there...

If there is an English source for an "official" glossary containing term definitions and descriptions, I'm sure we can translate that into Italian as well!

lewtun commented 2 years ago

> If there is an English source for an "official" glossary containing term definitions and descriptions, I'm sure we can translate that into Italian as well!

Sounds good. We don't have one (yet), but I'll start working on it next week and use your current list as a foundation :)

CaterinaBi commented 2 years ago

> By the way, I realised from the other translations that we need the first section from Chapter 1 to be translated in order for the course to render on the website. My suggestions would be:
>
>   • @CaterinaBi would you like to open a pull request with the first section translated?
>   • Iterate on the glossary in the forum, and then add it as a new "chapter" to the course.
>
> For the second point, we can then have a section in the _toctree.yml file with something like:
>
> - title: Glossario
>   sections:
>   - local: glossary/1
>     title: Glossario
>
> I think this way Italian readers can benefit from the great work you're doing to handle the various bits of ML jargon!

Hi @lewtun, I'm sorry I didn't see your message before. I was actually wondering what I was going to do with that first section. You'll have to bear with me because I'm not sure how to do any of these. Let me figure things out, I'll keep you posted.

CaterinaBi commented 2 years ago

Hi @lewtun, me again! Just to double check: what you need is the Introduction file, i.e. chapter1/1.mdx, right?

I just have to proof-read it and then I can open a pull request.

However, there are two images in that file that need translating:

  1. https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/summary.svg
  2. https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/summary-dark.svg

How are we going to take care of these? I can translate them easily, just let me know how to send you the translations.

Thanks!

lewtun commented 2 years ago

> Hi @lewtun, me again! Just to double check: what you need is the Introduction file, i.e. chapter1/1.mdx, right?

Yep, that's right! This will allow us to build the website and then we can add the remaining content iteratively :)

About the figures, we don't yet have a good mechanism that allows external contributors to add them to the documentation-images repository, and doing it manually myself across the 20+ translations we currently have in progress would be too time consuming 🤯

I suggest we leave the images in English for now, and later add them once we have a means to enable people to contribute directly to documentation-images

EdAbati commented 2 years ago

Hi everyone, I just found out about this very nice initiative! I am a Data Scientist and Italian is my native language. I would like to contribute too if you still need some extra help 😀 What else needs to be done? Maybe I can help with Chapter 8 (I can see that is still unassigned), or is there anything else left?

lewtun commented 2 years ago

Hi @EdAbati thank you for offering to contribute! I've added your name to chapter 8 🤗

Currently we have the first 3 sections of chapter 1 translated, so maybe after chapter 8 it makes sense to discuss with the others to see if they're still actively working on the previous chapters :)

CaterinaBi commented 2 years ago

Hi @EdAbati, welcome! I'm working on chapters 1 and 2. @lewtun I've been snowed under with work but I'm planning to get back to the translation tonight or tomorrow. I should be able to do both chapters I promised to translate. Have a great day, guys!

sharkovsky commented 2 years ago

Welcome @EdAbati! Before starting, please take a quick look at our discussion on the huggingface forums as well, to try to ensure that we have a consistent translation across chapters.

@lewtun same here, I apologize but I've been busy with work lately... However I'm still active, just moving very slowly! When people are done with their chapters I'll be happy to offload some of chapter 3, if I haven't finished yet.

EdAbati commented 2 years ago

Hi everyone, I hope you are all good! Not sure if you have seen it but I opened a PR https://github.com/huggingface/course/pull/272 with Chapter 8. It is still WIP, but I hope to finish soon. In the meantime, if you want to proofread it and leave suggestions, feel free :)

CaterinaBi commented 2 years ago

Hi guys, I'm not done with my chapters yet but I'll be more than happy to proofread yours once I'm finished. Caterina

On Wed, 6 Jul 2022, 07:14 Edoardo Abati wrote:

> Hi everyone, I hope you are all good! Not sure if you have seen it but I opened a PR https://github.com/huggingface/course/pull/272 with Chapter 8. It is still WIP, but I hope to finish soon. In the meantime, if you want to proofread it and leave suggestions, feel free :)


sharkovsky commented 2 years ago

Hi all, I've also opened PR #283 for chapter 3. Everything has been translated, but I would like to wait maybe one week to give everyone time to review it.

gdacciaro commented 2 years ago

Hi, I am an Italian student of AI. Can I help you with some chapters?

lewtun commented 2 years ago

> Hi, I am an Italian student of AI. Can I help you with some chapters?

Welcome to the group @gdacciaro! Yes, you are more than welcome to translate some chapters 🚀

One option would be to see if @CaterinaBi is still working on Chapter 2 - if not, perhaps that would be a good place to start?

CaterinaBi commented 2 years ago

Hello! He can definitely have chapter 2, I'm afraid I haven't started it yet!

On Mon, 22 Aug 2022, 09:23 lewtun wrote:

> > Hi, I am an Italian student of AI. Can I help you with some chapters?
>
> Welcome to the group @gdacciaro! Yes, you are more than welcome to translate some chapters 🚀
>
> One option would be to see if @CaterinaBi is still working on Chapter 2 - if not, perhaps that would be a good place to start?


gdacciaro commented 2 years ago

Fine, I will translate chapter 2 :)

EdAbati commented 2 years ago

Hi @lewtun, I think I can still help a bit. Are there any other sections that need to be translated?

lewtun commented 2 years ago

> Hi @lewtun, I think I can still help a bit. Are there any other sections that need to be translated?

Nice, thanks for offering to help @EdAbati ! If @gdacciaro or @sharkovsky aren't working on Chapters 2 or 3, I think those would be great to have translated :)

gdacciaro commented 2 years ago

I'm actually working on it