huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

🌐 [i18n-KO] Translating docs to Korean #20179

Open wonhyeongseo opened 1 year ago

wonhyeongseo commented 1 year ago

Hi!

Let's bring the documentation to the whole Korean-speaking community 🌐 (currently 9 out of 77 complete)

Would you like to translate? Please follow the 🤗 TRANSLATING guide. Here is a list of the files ready for translation. Let us know in this issue if you'd like to translate any, and we'll add your name to the list.

Some notes:

Hello!

Let's make it possible for everyone who uses Korean to read the technical documentation 🌐 (currently 9 of 77 documents complete)

Would you like to take part in the translation? Please read the 🤗 translation guide first. The files that need translating are listed at the end. If you are working on a file, please let us know briefly here. We will mark it as in progress to avoid duplicate work.

Notes:

GET STARTED

TUTORIAL

TASK GUIDES

NATURAL LANGUAGE PROCESSING

AUDIO

COMPUTER VISION

MULTIMODAL

GENERATION

DEVELOPER GUIDES

PERFORMANCE AND SCALABILITY

EFFICIENT TRAINING TECHNIQUES

OPTIMIZING INFERENCE

CONTRIBUTE

CONCEPTUAL GUIDES

## Other relevant PRs along the way

- Enable easy Table of Contents editing https://github.com/huggingface/transformers/pull/22581
- Added forgotten internal English anchors for `sagemaker.mdx` https://github.com/huggingface/transformers/pull/22549
- Fixed anchor links for `auto_class`, `training` https://github.com/huggingface/transformers/pull/22796
- Update ToC from upstream https://github.com/huggingface/transformers/pull/23112

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

wonhyeongseo commented 1 year ago

Hello @sgugger, could you please add the WIP tag to this issue? Thank you so much.

wonhyeongseo commented 1 year ago

For contributors and PseudoLab team members, please see a PR template gist (raw) that could ease your first PR experience. @0525hhgus, @KIHOON71, @gabrielwithappy, @jungnerd, @sim-so, @HanNayeoniee, @wonhyeongseo

gabrielwithappy commented 1 year ago

Dear @sgugger, could you add the document label to this issue? I think the other translation issues have a document label. Thank you in advance.

@wonhyeongseo I updated my PR to use the new PR template. Could you change the entry "Load pretrained instances with an AutoClass" to "[WIP]🌐[i18n-KO] Translate autoclass_tutorial to Korean and Fix the typo of quicktour #22533"?

gabrielwithappy commented 1 year ago

@sgugger wow! Thank you a million! :-)

wonhyeongseo commented 1 year ago

@sgugger Dear HuggingFace Team,

I hope you are doing well. My name is Wonhyeong Seo from the Pseudo Lab team. As you may know, we are actively working on localizing the huggingface/transformers repository documentation into Korean. Our goal is to make this valuable resource more accessible to Korean-speaking users, thereby promoting the development of NLP and machine learning in Korea and beyond.

We are currently in the process of applying for government sponsorship to support our localization efforts. To strengthen our application, we kindly request your permission to use the documentation's Google Analytics data to include in our reports. This data will help us demonstrate the impact of our work and the potential benefits of localizing the documentation.

Additionally, we would be grateful for any feedback or suggestions from the HuggingFace team regarding our localization project. Your insights will be invaluable in ensuring our efforts align with your vision and standards, and in fostering a successful collaboration.

Thank you for considering our request. We look forward to your response and the opportunity to work together to expand the reach of the huggingface/transformers repository.

Best regards,
Hyunseo Yun, Kihoon Son, Gabriel Yang, Sohyun Sim, Nayeon Han, Woojun Jung, Wonhyeong Seo
The Localization Initiative members of Pseudo Lab

LysandreJik commented 1 year ago

Hey @wonhyeongseo, thanks for all your work on translating the documentation to Korean!

Do you mind contacting me at lysandre at hf.co so we may see how best to help you?

wonhyeongseo commented 1 year ago

Welcome to a simple guide on how to use ChatGPT to speed up the translation process. By following these guidelines, you can create a first draft in less than an hour. Please note that it is essential to proofread your work thoroughly before sharing it with your colleagues.

(Optional) If you want to extract only the content without code blocks, tables, and redundant new lines, you can use the command sed '/```/,/```/d' file.md | sed '/^|.*|$/d' | sed '/^$/N;/^\n$/D'. If you are on a mobile device, you can run sed online at https://sed.js.org/.
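For anyone without sed at hand, the same filtering can be sketched in Python. This is only a rough equivalent, not part of the original guide: the function name `extract_prose` is hypothetical, and it assumes, like the sed command, that code fences are marked by three backticks and that table rows start and end with `|`.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed


def extract_prose(markdown: str) -> str:
    """Drop fenced code blocks and table rows, then collapse repeated blank lines."""
    kept = []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE in line:
            # A fence marker opens or closes a code block; drop the marker line too.
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if re.fullmatch(r"\|.*\|", line):
            # Table rows: lines that start and end with a pipe.
            continue
        kept.append(line)

    collapsed = []
    for line in kept:
        # Keep at most one empty line in a row.
        if line == "" and collapsed and collapsed[-1] == "":
            continue
        collapsed.append(line)
    return "\n".join(collapsed)
```

Calling `extract_prose(open("file.md", encoding="utf-8").read())` should give roughly the text you would paste into the prompts described next.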

To initiate the translation process, you need to provide your sentences as input to ChatGPT. Your first prompt should look like this:

What do these sentences about Hugging Face Transformers (a machine learning library) mean in Korean? Please do not translate the word after a 🤗 emoji as it is a product name.
```md
<your sentences>
```

After submitting the first prompt, you can use the following prefix for the next ten prompts:

```next-part
<your sentences>
```

Note that after ten prompts, you must remind ChatGPT of the task if you are not using LangChain.
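The prompt sequence described here can be sketched as a small helper that only builds the prompt strings and does not call any API. This is a sketch under assumptions: the name `build_prompts`, the `remind_every` parameter, and the closing fence after each chunk are illustrative choices, not something the guide prescribes.

```python
FENCE = "`" * 3  # three backticks

FIRST_PROMPT = (
    "What do these sentences about Hugging Face Transformers "
    "(a machine learning library) mean in Korean? Please do not translate "
    "the word after a 🤗 emoji as it is a product name."
)


def build_prompts(chunks, remind_every=10):
    """Return one prompt per chunk, restating the task after every `remind_every` follow-ups."""
    prompts = []
    for index, chunk in enumerate(chunks):
        if index % (remind_every + 1) == 0:
            # Full task description: the very first prompt, then again as a reminder.
            prompts.append(f"{FIRST_PROMPT}\n{FENCE}md\n{chunk}\n{FENCE}")
        else:
            # Short follow-up prompt using the next-part prefix.
            prompts.append(f"{FENCE}next-part\n{chunk}\n{FENCE}")
    return prompts
```

With the default setting, the first chunk gets the full task description, the next ten chunks use the `next-part` prefix, and the twelfth chunk restates the task.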

By following these guidelines, you can create a first draft of your translation in a shorter time frame. However, it is crucial to emphasize that the quality of the final output depends on the accuracy of the input and the proofreading process.

PS: Please note that we do not have a Korean LLM that can automate the proofreading process at the moment. However, Naver plans to launch its HyperCLOVA Korean LLM in July, which might automate the entire process. We are optimistic that our government proposal will be accepted, allowing us to increase our talent pool and work towards a more automated translation process with them.

wonhyeongseo commented 1 year ago

Dear @LysandreJik ,

I hope you are doing well. I wanted to let you know that I have sent an email with the subject line "[i18n-KO] Request for Collaboration: Hugging Face Mentorship Program." Whenever you have a moment, please take a look and reply. Thank you so much for your interest in this collaboration. If you have any questions, please don't hesitate to contact me.

Best regards, Wonhyeong Seo

wonhyeongseo commented 1 year ago

@gabrielwithappy @sim-so @jungnerd @HanNayeoniee @0525hhgus @KIHOON71 From the merge of model_sharing.mdx #22991, I learned that we don't have to `git rebase -i` as other open source libraries mandate. Therefore, I propose we commit in 4 steps like this:

  1. docs: ko: <file-name> - As we always do for the first commit. Copy the initial English file under ko and edit TOC: both external and (soon-to-be-automated) internal.

From this point forward, you may need to squash commits in each step.

  2. feat: [nmt|manual] draft - Machine-translate the entire file with: dedicated translators, prompts, or any kind of automation. You may choose to translate manually, and that is ok as long as you specify it in the commit message.
  3. fix: manual edits - Proofread the draft thoroughly.
  4. fix: resolve suggestions - Get reviews and resolve suggestions.

With this, it will be easier for collaborators to see the original English and your changes side by side. Not to mention, we can use the diffs as pre-training data for the in-house RLHF translation model.
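To make the proposed convention concrete, here is a small sketch that checks whether a PR's commit messages follow the four steps in order. It is purely illustrative: the helper name `follows_convention` is hypothetical, and it assumes `[nmt|manual]` means the message contains either `nmt` or `manual`.

```python
import re

# One pattern per step of the proposed convention, in order.
STEP_PATTERNS = [
    re.compile(r"docs: ko: \S+"),            # 1. copy the English file under ko, edit the TOC
    re.compile(r"feat: (nmt|manual) draft"),  # 2. machine or manual draft (assumed reading)
    re.compile(r"fix: manual edits"),         # 3. proofreading
    re.compile(r"fix: resolve suggestions"),  # 4. review feedback
]


def follows_convention(messages):
    """Return True if there is exactly one commit per step, in the proposed order."""
    if len(messages) != len(STEP_PATTERNS):
        return False
    return all(
        pattern.fullmatch(message) is not None
        for pattern, message in zip(STEP_PATTERNS, messages)
    )
```

For example, `["docs: ko: model_sharing.mdx", "feat: nmt draft", "fix: manual edits", "fix: resolve suggestions"]` passes, while the same messages in a different order do not.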

@ArthurZucker @sgugger, when merging a PR, how is the main commit message decided if there are multiple commits? Do you have to write it manually, or is the first commit message of the PR selected? Thank you for your insights and continued support. Much love from Korea 🇰🇷💖💕🙏

sgugger commented 1 year ago

The main commit message is the title of the PR.

osanseviero commented 1 year ago

Hey all! As some people were interested in a place to discuss translations, we opened a category in the HF Discord server for internationalization and translation efforts, including a Korean channel!

stevhliu commented 11 months ago

Hi Pseudo Lab friends! I just wanted to provide a quick update on where the translation progress currently stands:

Great work, and big thanks again for all your contributions to fully translate the 🤗 Transformers documentation.

zayunsna commented 9 months ago

Hi all! I would like to take part in the translation work (especially the text generation part). Once a first draft is done, I will open a PR and let you know.

heuristicwave commented 8 months ago

I mistakenly referred to the huggingface_hub docs as transformers. I have fixed it now, so please ignore the mention immediately above. Sorry.