wonhyeongseo opened this issue 2 years ago
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hello @sgugger, could you please add the `WIP` tag to this issue? Thank you so much.
Dear @sgugger, could you add the `document` label to this issue?
I think the other translation issues have a `document` label.
Thank you in advance.
@wonhyeongseo
I updated my PR to use the new PR template. Would you change "Load pretrained instances with an AutoClass" to "[WIP]🌐[i18n-KO] Translate autoclass_tutorial to Korean and Fix the typo of quicktour #22533"?
@sgugger wow! Thank you a million! :-)
@sgugger Dear HuggingFace Team,
I hope you are doing well. My name is Wonhyeong Seo from the Pseudo Lab team. As you may know, we are actively working on localizing the `huggingface/transformers` repository documentation into Korean. Our goal is to make this valuable resource more accessible to Korean-speaking users, thereby promoting the development of NLP and machine learning in Korea and beyond.
We are currently applying for government sponsorship to support our localization efforts. To strengthen our application, we kindly request your permission to include the documentation's Google Analytics data in our reports. This data will help us demonstrate the impact of our work and the potential benefits of localizing the documentation.
Additionally, we would be grateful for any feedback or suggestions from the HuggingFace team regarding our localization project. Your insights will be invaluable in ensuring our efforts align with your vision and standards, and in fostering a successful collaboration.
Thank you for considering our request. We look forward to your response and the opportunity to work together to expand the reach of the `huggingface/transformers` repository.
Best regards,
Hyunseo Yun, Kihoon Son, Gabriel Yang, Sohyun Sim, Nayeon Han, Woojun Jung, Wonhyeong Seo
The Localization Initiative members of Pseudo Lab
Hey @wonhyeongseo, thanks for all your work on translating the documentation to Korean!
Would you mind contacting me at lysandre at hf.co so we can see how best to help you?
Welcome to a simple guide on how to use ChatGPT to speed up the translation process. By following these guidelines, you can create a first draft in less than an hour. Please note that it is essential to proofread your work thoroughly before sharing it with your colleagues.
(Optional) To extract only the prose, without code blocks, tables, and redundant blank lines, you can use the command ``sed '/```/,/```/d' file.md | sed '/^|.*|$/d' | sed '/^$/N;/^\n$/D'``. If you are on a mobile device, you can run `sed` online at https://sed.js.org/.
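The one-liner above chains three `sed` filters. Here is a minimal sketch that wraps them in a shell function and runs them on a small sample file, so you can see what survives before pasting text into ChatGPT. The function name `strip_for_mt` and the sample contents are made up for illustration.

```shell
#!/usr/bin/env bash
# strip_for_mt (hypothetical name) chains the three sed filters from the guide:
#   1. delete from a line containing ``` through the next line containing ``` (code blocks)
#   2. delete lines that start and end with | (Markdown table rows)
#   3. squeeze runs of blank lines down to a single blank line
strip_for_mt() {
  sed '/```/,/```/d' "$1" \
    | sed '/^|.*|$/d' \
    | sed '/^$/N;/^\n$/D'
}

# Build a tiny sample document (contents are illustrative only).
sample="$(mktemp)"
printf '%s\n' '# Title' '' '' 'Some text.' '```py' 'print("hi")' '```' \
  '| a | b |' '|---|---|' '' 'More text.' > "$sample"

cleaned="$(strip_for_mt "$sample")"
printf '%s\n' "$cleaned"
rm -f "$sample"
```

For a real page you would run something like `strip_for_mt file.md > file.txt`. Note that the first filter assumes code fences are balanced: if a fence is never closed, `sed` deletes everything up to the end of the file.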
To initiate the translation process, you need to provide your sentences as input to ChatGPT. Your first prompt should look like this:
What do these sentences about Hugging Face Transformers (a machine learning library) mean in Korean? Please do not translate the word after a 🤗 emoji as it is a product name.

```md
<your sentences>
```

After submitting the first prompt, you can use the following prefix for the next ten prompts:

```next-part
<your sentences>
```

Note that after ten prompts, you must remind ChatGPT of the task if you are not using LangChain.
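The prompt sequence above can be scripted so that you only copy and paste. This is a minimal sketch, assuming your text is already in a plain file (for example, the output of the `sed` one-liner) and that splitting it into fixed-size line chunks is acceptable. The function name `make_prompts`, the default of 40 lines per chunk, closing each chunk with a fence, and resending the full first prompt as the reminder are all assumptions, not part of the guide.

```shell
#!/usr/bin/env bash
# First prompt from the guide; reused as the reminder after every ten follow-ups.
INSTRUCTION='What do these sentences about Hugging Face Transformers (a machine learning library) mean in Korean? Please do not translate the word after a 🤗 emoji as it is a product name.'

# make_prompts FILE [LINES_PER_CHUNK]  (hypothetical helper)
# Prints one prompt per chunk, separated by blank lines, ready to paste into ChatGPT.
make_prompts() {
  src="$1"
  lines_per_chunk="${2:-40}"            # chunk size is an assumption; tune it to your text
  workdir="$(mktemp -d)"
  split -l "$lines_per_chunk" "$src" "$workdir/chunk_"
  n=0
  for chunk in "$workdir"/chunk_*; do
    [ -e "$chunk" ] || continue         # empty input: split creates no chunks
    if [ $((n % 11)) -eq 0 ]; then
      # Prompts 1, 12, 23, ...: state (or restate) the task.
      printf '%s\n```md\n' "$INSTRUCTION"
    else
      # Prompts 2-11, 13-22, ...: only the next-part prefix.
      printf '```next-part\n'
    fi
    cat "$chunk"
    printf '```\n\n'
    n=$((n + 1))
  done
  rm -rf "$workdir"
}
```

Usage: `make_prompts page.txt 40 > prompts.txt`, then paste the prompts one at a time. The period of 11 encodes "one full prompt, then ten `next-part` prompts"; if you use LangChain or another tool that keeps the task in context, the reminder is unnecessary.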
By following these guidelines, you can create a first draft of your translation in a shorter time frame. However, it is crucial to emphasize that the quality of the final output depends on the accuracy of the input and the proofreading process.
PS: Please note that we do not currently have a Korean LLM that can automate the proofreading process. However, Naver plans to launch its HyperCLOVA Korean LLM in July, which might automate the entire process. We are optimistic that our government proposal will be accepted, which would allow us to grow our talent pool and work with them toward a more automated translation process.
Dear @LysandreJik,
I hope you are doing well. I have sent you an email with the subject line "[i18n-KO] Request for Collaboration: Hugging Face Mentorship Program." Whenever you have a moment, please take a look and reply. Thank you so much for your interest in this collaboration. If you have any questions, please don't hesitate to contact me.
Best regards, Wonhyeong Seo
@gabrielwithappy @sim-so @jungnerd @HanNayeoniee @0525hhgus @KIHOON71
From the merge of `model_sharing.mdx` (#22991), I learned that we don't have to `git rebase -i` as other open-source libraries mandate. Therefore, I propose we commit in 4 steps like this:

`docs: ko: <file-name>`
- As we always do for the first commit, copy the initial English file under `ko` and edit the TOC: both external and (soon-to-be-automated) internal.
- From this point forward, you may need to squash commits within each step.

`feat: [nmt|manual] draft`
- Machine-translate the entire file with dedicated translators, prompts, or any kind of automation. You may also translate manually, which is fine as long as you say so in the commit message.

`fix: manual edits`
- Proofread the draft thoroughly.

`fix: resolve suggestions`
- Get reviews and resolve suggestions.

With this, it will be easier for collaborators to see the original English and your changes side by side. We can also use the diffs as pre-training data for the in-house RLHF translation model.
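A minimal sketch of the four-commit flow in a throwaway repository, assuming `git` is installed. The page name `model_sharing.md`, the placeholder contents, and the `docs/source/{en,ko}` layout are illustrative assumptions. It also shows one way to keep a single commit per step (`git commit --amend`) and to compare the English copy with the draft.

```shell
#!/usr/bin/env bash
set -e
# Throwaway repo so the example is self-contained.
repo="$(mktemp -d)"
cd "$repo"
git init -q
# Commit with an inline identity so no global git config is needed.
g() { git -c user.name="Translator" -c user.email="translator@example.com" "$@"; }

# Stand-in for upstream main: an English page and an empty Korean TOC.
mkdir -p docs/source/en docs/source/ko
printf '# Share a model\n\nUpload your model to the Hub.\n' > docs/source/en/model_sharing.md
: > docs/source/ko/_toctree.yml
g add . && g commit -q -m "upstream main (stand-in)"

# Step 1: copy the English file under ko and edit the TOC.
cp docs/source/en/model_sharing.md docs/source/ko/model_sharing.md
printf -- '- local: model_sharing\n  title: Share a model\n' >> docs/source/ko/_toctree.yml
g add docs/source/ko && g commit -q -m "docs: ko: model_sharing.md"

# Step 2: machine-translated (or manual) draft; placeholder text only.
printf '# 모델 공유하기\n\n[NMT draft] ...\n' > docs/source/ko/model_sharing.md
g commit -q -am "feat: nmt draft"

# Step 3: proofreading; later rounds are squashed into the same commit with --amend.
printf '# 모델 공유하기\n\n[proofread, round 1]\n' > docs/source/ko/model_sharing.md
g commit -q -am "fix: manual edits"
printf '# 모델 공유하기\n\n[proofread, round 2]\n' > docs/source/ko/model_sharing.md
g commit -q -a --amend --no-edit

# Step 4: apply reviewers' suggestions.
printf '# 모델 공유하기\n\n[after review]\n' > docs/source/ko/model_sharing.md
g commit -q -am "fix: resolve suggestions"

# English original vs. the draft, as a diff between step 1 and step 2.
git diff HEAD~3 HEAD~2 -- docs/source/ko/model_sharing.md
git log --oneline
```

In practice these commits would live on a branch of your fork; the sketch skips remotes entirely.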
@ArthurZucker @sgugger, when merging a PR, how is the main commit message decided if there are multiple commits? Do you write it manually, or is the first commit message of the PR selected? Thank you for your insights and continued support. Much love from Korea 🇰🇷
The main commit message is the title of the PR.
Hey all! As some people were interested in a place to discuss translations, we opened a category in the HF Discord server for internationalization and translation efforts, including a Korean channel!
Hi Pseudo Lab friends! I just wanted to provide a quick update on where the translation progress currently stands:
Great work, and big thanks again for all your contributions to fully translate the 🤗 Transformers documentation.
Hi all! I'd like to take part in the translation work, specifically the text generation part.
Once a first draft is done, I'll open a PR and let you know!
I mistakenly referred to the huggingface_hub docs as transformers. I've fixed it now, so please ignore the comment immediately above. Sorry!
Hi @stevhliu
Our team is currently conducting a translation sprint, resulting in a lot of PRs.
By the way, we recently discovered that the "Quantization Methods" section (`경량화 메서드` in Korean) in `ko/_toctree.yml` was accidentally duplicated in a previous PR, which unfortunately went unnoticed before being merged. As a result, this duplication now exists across all subsequent PRs, including those that are still open.
Given the large number of merged and open PRs, we are considering two possible approaches to address this:
`ko/_toctree.yml`.

We apologize for not catching this during the review process 🥲 We're seeking advice on the best approach to resolve this issue while minimizing disruption to ongoing work.
Any suggestions or recommendations would be greatly appreciated, and we sincerely appreciate your ongoing support and careful reviews of our PRs ❤️
Great work on the translation sprint, everyone!
I think it'll be easiest to submit a dedicated PR to remove the duplicated section once all the current PRs are merged :)
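Before opening that clean-up PR, a rough check along these lines could flag both duplicated entries and entries whose order has drifted from the English ToC. It is a minimal sketch under assumptions: it only reads `- local: <page>` lines with `sed` instead of parsing YAML, it assumes each page appears once in the English ToC, and the function names and paths are hypothetical.

```shell
#!/usr/bin/env bash
# toc_locals FILE: print the page ids from `- local: <page>` lines, in file order.
toc_locals() {
  sed -n 's/^[[:space:]]*-[[:space:]]*local:[[:space:]]*//p' "$1"
}

# same_order_as_en EN_TOC KO_TOC: succeed only if the Korean ToC lists exactly the
# matching English entries, in the same order. A duplicated, reordered, or unknown
# entry in the Korean ToC makes it fail.
same_order_as_en() {
  ko_pages="$(mktemp)"
  toc_locals "$2" > "$ko_pages"
  # English entries that also appear in the Korean ToC, kept in English order.
  en_filtered="$(toc_locals "$1" | grep -Fx -f "$ko_pages")"
  ko_list="$(cat "$ko_pages")"
  rm -f "$ko_pages"
  [ "$en_filtered" = "$ko_list" ]
}

# show_duplicates KO_TOC: print every page id listed more than once.
show_duplicates() {
  toc_locals "$1" | sort | uniq -d
}
```

For example: `same_order_as_en docs/source/en/_toctree.yml docs/source/ko/_toctree.yml || show_duplicates docs/source/ko/_toctree.yml`. Comparing against the English ToC, rather than only looking for duplicates, also enforces the "follow the order of the English version" rule from the issue description.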
Hi there, I'd like to contribute to the translation of the "Training on Specialized Hardware" section and submit a PR once it's completed. Is that OK?
Hi @maximizemaxwell, and thanks for your interest in translating! Feel free to translate that section! 🤗
Hi!

Let's bring the documentation to the entire Korean-speaking community 🌐 (currently 9 out of 77 complete)

Would you like to translate? Please follow the 🤗 TRANSLATING guide. Here is a list of the files ready for translation. Let us know in this issue if you'd like to translate any, and we'll add your name to the list and mark the file as in progress so that no work is duplicated.

Some notes:
- Add your translations to the `ko` folder inside the source folder.
- Update the table of contents in `ko/_toctree.yml` as well; please follow the order of the English version.
- When you open a PR, include this issue (#20179) in the description and request reviews from @ArthurZucker, @sgugger, and @eunseojo.

## GET STARTED

## TUTORIAL

## TASK GUIDES

### NATURAL LANGUAGE PROCESSING

### AUDIO

### COMPUTER VISION

### MULTIMODAL

### GENERATION

## DEVELOPER GUIDES

## PERFORMANCE AND SCALABILITY

### EFFICIENT TRAINING TECHNIQUES

### OPTIMIZING INFERENCE

- torch.compile

## CONTRIBUTE

## CONCEPTUAL GUIDES
## Other relevant PRs along the way
- Enable easy Table of Contents editing https://github.com/huggingface/transformers/pull/22581
- Added forgotten internal English anchors for `sagemaker.mdx` https://github.com/huggingface/transformers/pull/22549
- Fixed anchor links for `auto_class`, `training` https://github.com/huggingface/transformers/pull/22796
- Update ToC from upstream https://github.com/huggingface/transformers/pull/23112