🌐 [i18n-KO] Translating docs to Korean

wonhyeongseo commented 2 years ago

Hi!

Let's bring the documentation to the whole Korean-speaking community 🌏 (currently 9 out of 77 complete)

Would you like to translate? Please follow the 🤗 TRANSLATING guide. Here is a list of the files ready for translation. Let us know in this issue if you'd like to translate any, and we'll add your name to the list.

Some notes:

Please translate using an informal tone (imagine you are talking with a friend about transformers 🤗).
Please translate in a gender-neutral way.
Add your translations to the folder called ko inside the source folder.
Register your translation in ko/_toctree.yml; please follow the order of the English version.
Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue. Please ping @ArthurZucker, @sgugger and @eunseojo for review.
🙋 If you'd like others to help you with the translation, you can also post in the 🤗 forums.
With the HuggingFace Documentation l10n initiative of Pseudo Lab, full translation will be done even faster. 🎉 Please give us your support! Cheers to our team 👍@0525hhgus, @KIHOON71, @gabrielwithappy, @jungnerd, @sim-so, @HanNayeoniee, @wonhyeongseo

안녕하세요!

한국어를 사용하는 모두가 기술 문서를 읽을 수 있게 해보아요 🌏 (현재 77개 문서 중 9개 완료)

번역에 참여하고 싶으신가요? 🤗 번역 가이드를 먼저 읽어보시기 바랍니다. 끝 부분에 번역해야할 파일들이 나열되어 있습니다. 작업하고 계신 파일이 있다면 여기에 간단히 알려주세요. 중복되지 않도록 작업중으로 표시해둘게요.

참고 사항:

기술 문서이지만 (친구에게 설명 듣듯이) 쉽게 읽히면 좋겠습니다. 존댓말 로 써주시면 감사하겠습니다.
성별은 일부 언어(스페인어, 프랑스어 등)에만 적용되는 사항으로, 한국어의 경우 번역기를 사용하신 후 문장 기호와 조사 등이 알맞는지 확인해주시기 바랍니다.
소스 폴더 아래 ko 폴더에 번역본을 넣어주세요.
목차(ko/_toctree.yml)도 함께 업데이트해주세요. 영어 목차와 순서가 동일해야 합니다.
모두 마치셨다면, 기록이 원활하도록 PR을 여실 때 현재 이슈(#20179)를 내용에 넣어주시기 바랍니다. 리뷰 요청은 @ArthurZucker님, @sgugger님, @eunseojo님께 요청해주세요.
🙋 커뮤니티에 마음껏 홍보해주시기 바랍니다! 🤗 포럼에 올리셔도 좋아요.
가짜연구소의 이니셔티브로 번역이 더욱 빠르게 진행될 예정입니다. 🎉 많은 응원 부탁드려요! 우리팀 화이팅 👍 @0525hhgus, @KIHOON71, @gabrielwithappy, @jungnerd, @sim-so, @HanNayeoniee, @wonhyeongseo

GET STARTED

[x] 🤗 Transformers https://github.com/huggingface/transformers/pull/20180
[x] Quick tour https://github.com/huggingface/transformers/pull/20946
[x] Installation https://github.com/huggingface/transformers/pull/20948

TUTORIAL

[x] Pipelines for inference https://github.com/huggingface/transformers/pull/22508
[x] Load pretrained instances with an AutoClass https://github.com/huggingface/transformers/pull/22533
[x] Preprocess https://github.com/huggingface/transformers/pull/22578
[x] Fine-tune a pretrained model https://github.com/huggingface/transformers/pull/22670
[x] Train with a script https://github.com/huggingface/transformers/pull/22793
[x] Distributed training with 🤗 Accelerate https://github.com/huggingface/transformers/pull/22830
[x] Load and train adapters with 🤗 PEFT https://github.com/huggingface/transformers/pull/25706
[x] Share a model
[x] Agents https://github.com/huggingface/transformers/pull/24881
[x] Generation with LLMs https://github.com/huggingface/transformers/pull/25791

TASK GUIDES

NATURAL LANGUAGE PROCESSING

[x] Text classification https://github.com/huggingface/transformers/pull/22655
[x] Token classification https://github.com/huggingface/transformers/pull/22945
[x] Question answering
[x] Causal language modeling
[x] Masked language modeling https://github.com/huggingface/transformers/pull/22838
[x] Translation https://github.com/huggingface/transformers/pull/22805
[x] Summarization https://github.com/huggingface/transformers/pull/22783
[x] Multiple choice

AUDIO

[x] Audio classification https://github.com/huggingface/transformers/pull/26200
[x] Automatic speech recognition

COMPUTER VISION

[x] Image classification
[x] Semantic segmentation https://github.com/huggingface/transformers/pull/26515
[x] Video classification
[x] Object detection
[x] Zero-shot object detection
[x] Zero-shot image classification
[x] Depth estimation

MULTIMODAL

[x] Image captioning
[x] Document Question Answering
[x] Visual Question Answering https://github.com/huggingface/transformers/pull/25679
[ ] Text to speech

GENERATION

[ ] Customize the generation strategy

DEVELOPER GUIDES

[x] Use tokenizers from 🤗 Tokenizers https://github.com/huggingface/transformers/pull/22956
[x] Inference for multilingual models
[x] Create a custom architecture https://github.com/huggingface/transformers/pull/22754
[x] Sharing custom models https://github.com/huggingface/transformers/pull/22534
[x] Run training on Amazon SageMaker https://github.com/huggingface/transformers/pull/22509
[x] Export to ONNX https://github.com/huggingface/transformers/pull/22806
[x] Export to TFLite
[x] Export to TorchScript
[ ] Benchmarks
[ ] Notebooks with examples
[x] Community resources https://github.com/huggingface/transformers/pull/25674
[x] Custom Tools and Prompts
[x] Troubleshoot

PERFORMANCE AND SCALABILITY

[x] Overview

EFFICIENT TRAINING TECHNIQUES

[ ] Training on one GPU https://github.com/huggingface/transformers/pull/25250
[x] Training on many GPUs https://github.com/huggingface/transformers/pull/26244
[x] Training on CPU https://github.com/huggingface/transformers/pull/24911
[x] Training on many CPUs https://github.com/huggingface/transformers/pull/24923
[ ] Training on TPUs
[x] Training on TPU with TensorFlow
[ ] Training on Specialized Hardware
[x] Custom hardware for training https://github.com/huggingface/transformers/pull/24966
[x] Hyperparameter Search using Trainer API

## Other relevant PRs along the way

- Enable easy Table of Contents editing https://github.com/huggingface/transformers/pull/22581
- Added forgotten internal English anchors for `sagemaker.mdx` https://github.com/huggingface/transformers/pull/22549
- Fixed anchor links for `auto_class`, `training` https://github.com/huggingface/transformers/pull/22796
- Update ToC from upstream https://github.com/huggingface/transformers/pull/23112

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

wonhyeongseo commented 1 year ago

Hello @sgugger, could you please add the WIP tag to this issue? Thank you so much.

wonhyeongseo commented 1 year ago

For contributors and PseudoLab team members, please see a PR template gist (raw) that could ease your first PR experience. @0525hhgus, @KIHOON71, @gabrielwithappy, @jungnerd, @sim-so, @HanNayeoniee, @wonhyeongseo

gabrielwithappy commented 1 year ago

Dear @sgugger, could you add a document label to this issue? I think the other translation issues have a document label. Thank you in advance.

@wonhyeongseo I updated my PR to use the new PR template. Could you change the "Load pretrained instances with an AutoClass" entry to "[WIP]🌐[i18n-KO] Translate autoclass_tutorial to Korean and Fix the typo of quicktour #22533"?

gabrielwithappy commented 1 year ago

@sgugger wow! Thank you a million! :-)

wonhyeongseo commented 1 year ago

@sgugger Dear HuggingFace Team,

I hope you are doing well. My name is Wonhyeong Seo from the Pseudo Lab team. As you may know, we are actively working on localizing the huggingface/transformers repository documentation into Korean. Our goal is to make this valuable resource more accessible to Korean-speaking users, thereby promoting the development of NLP and machine learning in Korea and beyond.

We are currently in the process of applying for government sponsorship to support our localization efforts. To strengthen our application, we kindly request your permission to use the documentation's Google Analytics data to include in our reports. This data will help us demonstrate the impact of our work and the potential benefits of localizing the documentation.

Additionally, we would be grateful for any feedback or suggestions from the HuggingFace team regarding our localization project. Your insights will be invaluable in ensuring our efforts align with your vision and standards, and in fostering a successful collaboration.

Thank you for considering our request. We look forward to your response and the opportunity to work together to expand the reach of the huggingface/transformers repository.

Best regards,
Hyunseo Yun, Kihoon Son, Gabriel Yang, Sohyun Sim, Nayeon Han, Woojun Jung, Wonhyeong Seo
The Localization Initiative members of Pseudo Lab

LysandreJik commented 1 year ago

Hey @wonhyeongseo, thanks for all your work on translating the documentation to Korean!

Do you mind contacting me at lysandre at hf.co so we may see how best to help you?

wonhyeongseo commented 1 year ago

Welcome to a simple guide on how to use ChatGPT to speed up the translation process. By following these guidelines, you can create a first draft in less than an hour. Please note that it is essential to proofread your work thoroughly before sharing it with your colleagues.

(Optional) If you want to extract only the content without code blocks, tables, and redundant new lines, you can use the command sed '/```/,/```/d' file.md | sed '/^|.*|$/d' | sed '/^$/N;/^\n$/D'. In case you are using a mobile device, you can check the link https://sed.js.org/ for using sed online.
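For anyone without sed at hand, the same filtering can be approximated in Python. This is a minimal sketch, not part of the original guide: the function name `extract_prose` is hypothetical, and it assumes every line containing three backticks opens or closes a code block, which mirrors the sed address range above.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def extract_prose(markdown: str) -> str:
    """Drop fenced code blocks, table rows, and repeated blank lines."""
    kept = []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE in line:
            # A fence line toggles code-block state and is itself dropped.
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if re.fullmatch(r"\|.*\|", line):
            # Table rows start and end with a pipe, as in the sed pattern.
            continue
        if line == "" and kept and kept[-1] == "":
            # Collapse a run of blank lines into a single blank line.
            continue
        kept.append(line)
    return "\n".join(kept)
```

Calling `extract_prose(open("file.md", encoding="utf-8").read())` would return only the prose to paste into the prompt.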

To initiate the translation process, you need to provide your sentences as input to ChatGPT. Your first prompt should look like this:

What do these sentences about Hugging Face Transformers (a machine learning library) mean in Korean? Please do not translate the word after a 🤗 emoji as it is a product name.
```md
<your sentences>
```

After submitting the first prompt, you can use the following prefix for the next ten prompts:

```next-part
<your sentences>
```

Note that after ten prompts, you must remind ChatGPT of the task if you are not using LangChain.
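The prompt sequence described above can be sketched in Python. This is only an illustration under stated assumptions: `build_prompts` is a hypothetical helper, the text is assumed to be pre-split into chunks, each snippet is assumed to end with a closing fence, and "reminding ChatGPT of the task" is read as resending the full first prompt. It does not call any ChatGPT or LangChain API.

```python
FENCE = "`" * 3  # three backticks
FIRST_PROMPT = (
    "What do these sentences about Hugging Face Transformers "
    "(a machine learning library) mean in Korean? Please do not translate "
    "the word after a 🤗 emoji as it is a product name.\n"
)
FOLLOW_UPS_PER_REMINDER = 10  # from the guide: ten next-part prompts, then remind


def build_prompts(chunks):
    """Wrap each chunk of source text in the prompt format from the guide."""
    prompts = []
    for index, chunk in enumerate(chunks):
        if index % (FOLLOW_UPS_PER_REMINDER + 1) == 0:
            # Full task description: the very first prompt and every reminder.
            prompts.append(f"{FIRST_PROMPT}{FENCE}md\n{chunk}\n{FENCE}")
        else:
            # Short prefix for the ten prompts that follow a full prompt.
            prompts.append(f"{FENCE}next-part\n{chunk}\n{FENCE}")
    return prompts
```

With twelve chunks, the first and the twelfth prompt carry the full task description and the ten in between use the `next-part` prefix.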

By following these guidelines, you can create a first draft of your translation in a shorter time frame. However, it is crucial to emphasize that the quality of the final output depends on the accuracy of the input and the proofreading process.

PS: Please note that we do not have a Korean LLM that can automate the proofreading process at the moment. However, in July, Naver plans to launch its HyperCLOVA Korean LLM, which might automate the entire process. We are optimistic that our government proposal will be accepted, allowing us to increase our talent pool and work towards achieving a more automated translation process with them.

wonhyeongseo commented 1 year ago

Dear @LysandreJik ,

I hope you are doing well. I wanted to inform you that I have sent an email with the subject line "[i18n-KO] Request for Collaboration: Hugging Face Mentorship Program." Whenever you have a moment, please take a look and provide a response. Thank you so much for your interest in this collaboration. If you have any questions, please don't hesitate to contact me.

Best regards, Wonhyeong Seo

wonhyeongseo commented 1 year ago

@gabrielwithappy @sim-so @jungnerd @HanNayeoniee @0525hhgus @KIHOON71 From the merge of model_sharing.mdx in #22991, I learned that we don't have to git rebase -i as other open source libraries mandate. Therefore, I propose we commit in 4 steps like this:

1. `docs: ko: <file-name>` - As we always do for the first commit. Copy the initial English file under ko and edit TOC: both external and (soon-to-be-automated) internal.

From this point forward, you may need to squash commits in each step.

2. `feat: [nmt|manual] draft` - Machine-translate the entire file with dedicated translators, prompts, or any kind of automation. You may choose to translate manually, and that is OK as long as you specify it in the commit message.
3. `fix: manual edits` - Proofread the draft thoroughly.
4. `fix: resolve suggestions` - Get reviews and resolve suggestions.

With this, it will be easier for collaborators to see the original English and your changes side by side. Not to mention, we can use the diffs as pre-training data for the in-house RLHF translation model.
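A small check can confirm that a branch follows the four steps before opening a PR. This is a hedged sketch: `follows_convention` is a hypothetical helper, it assumes each step has been squashed into exactly one commit, and it reads `[nmt|manual]` as a choice between the literal words `nmt` and `manual`.

```python
import re

# One pattern per step, in the order proposed above.
STEP_PATTERNS = [
    re.compile(r"docs: ko: \S+"),
    re.compile(r"feat: (nmt|manual) draft"),
    re.compile(r"fix: manual edits"),
    re.compile(r"fix: resolve suggestions"),
]


def follows_convention(messages):
    """Return True if the commit subjects match the four steps in order."""
    if len(messages) != len(STEP_PATTERNS):
        return False
    return all(
        pattern.fullmatch(message)
        for pattern, message in zip(STEP_PATTERNS, messages)
    )
```

The list of subjects could come from something like `git log --reverse --format=%s main..HEAD`, split into lines.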

@ArthurZucker @sgugger, when merging a PR, how is the main commit message decided if there are multiple commits? Do you have to write it manually, or is the first commit message of the PR selected? Thank you for your insights and continued support. Much love from Korea 🇰🇷💖💕🙏

sgugger commented 1 year ago

The main commit message is the title of the PR.

osanseviero commented 1 year ago

Hey all! As some people were interested in a place to discuss translations, we opened a category in the HF Discord server for internationalization and translation efforts, including a Korean channel!

stevhliu commented 1 year ago

Hi Pseudo Lab friends! I just wanted to provide a quick update on where the translation progress currently stands:

73% done ✅
6 PRs pending review; once merged, you'll be up to 81% 📈
15 files left to translate before ✨ 100% ✨

Great work, and big thanks again for all your contributions to fully translate the 🤗 Transformers documentation.

zayunsna commented 1 year ago

안녕하세요 개인적으로 text generation part의 번역에 참여하고자 합니다. draft가 완성되면 PR보내드리겠습니다!

Hi all! I would like to participate in the translation work (especially the text generation part). Once a first draft is done, I will open a PR and let you know.

heuristicwave commented 12 months ago

huggingface_hub의 docs를 transformer로 잘못 멘션했습니다. 현재 수정해 두었으며, 바로 위 멘션은 무시해주세요. 죄송합니다.

I mistakenly mentioned the huggingface_hub docs as transformers. I've fixed it now, so please ignore the comment immediately above. Sorry.

jungnerd commented 1 month ago

Hi @stevhliu 👋🏻

Our team is currently conducting a translation sprint, resulting in a lot of PRs.

By the way, we recently discovered that the Quantization Methods (which is 경량화 메소드 in Korean) section in ko/_toctree.yml was accidentally duplicated in a previous PR, which unfortunately went unnoticed before being merged. As a result, this duplication now exists across all subsequent PRs, including those that are still open.

Given the large number of merged and open PRs, we are considering two possible approaches to address this:

Wait until all current PRs are merged, then submit a dedicated PR to fix the duplication in ko/_toctree.yml.
Explore alternative solutions that can resolve the issue more efficiently.

We apologize for not catching this during the review process 🥲 We're seeking advice on the best approach to resolve this issue while minimizing disruption to ongoing work.

Any suggestions or recommendations would be greatly appreciated, and we sincerely appreciate your ongoing support and careful reviews of our PRs ❤️

stevhliu commented 1 month ago

Great work on the translation sprint everyone! 🚀

I think it'll be easiest to submit a dedicated PR to remove the duplicated section once all the current PRs are merged :)
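To catch this kind of duplication before review, a line-based scan for repeated titles may help. This is a rough sketch, not an existing tool: `duplicated_titles` is a hypothetical helper, it assumes unquoted `title:` entries on their own lines, and it does not parse YAML, so a repeated title is flagged even when only one of the two entries is an unwanted copy.

```python
import re
from collections import Counter

# Matches "title: ..." with optional indentation and list dash (assumed layout).
TITLE_LINE = re.compile(r"^\s*(?:-\s+)?title:\s*(.+?)\s*$")


def duplicated_titles(toctree_text):
    """Return titles that appear more than once, in first-seen order."""
    titles = []
    for line in toctree_text.splitlines():
        match = TITLE_LINE.match(line)
        if match:
            titles.append(match.group(1))
    counts = Counter(titles)
    return [title for title in dict.fromkeys(titles) if counts[title] > 1]
```

Running it on the text of ko/_toctree.yml would list 경량화 메소드 if that section title is still present twice.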

maximizemaxwell commented 2 weeks ago

Hi there, I want to contribute to the translation of the "Training on Specialized Hardware" section and submit a PR once it’s completed. Is it OK?

stevhliu commented 2 weeks ago

Hi and thanks for your interest in translating @maximizemaxwell, feel free to translate that section! 🤗
