krahets / hello-algo

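"Hello Algo": a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. The Simplified and Traditional Chinese editions are updated in sync; the English version is ongoing.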
https://hello-algo.com

Chinese-to-English (Help Wanted) #914

Open krahets opened 9 months ago

krahets commented 9 months ago

Contributing guidelines for Chinese-to-English

We are working on translating "Hello Algo" from Chinese to English with the following approach:

  1. AI translation: Carry out an initial translation pass using a machine-learning translator.
  2. Human optimization: Manually refine the machine-generated output to ensure authenticity and accuracy.
  3. Pull request review: Reviewers double-check the optimized translation through the GitHub pull request workflow.
  4. Repeat steps 2 and 3 for further improvements.

Join us

We're seeking contributors who meet the following criteria.

That is, our contributors are computer scientists, engineers, and students from different linguistic backgrounds, and their objectives have different focal points:

Don't hesitate to join us via WeChat krahets-jyd or on Discord!

Contributing guideline

Please visit en/CONTRIBUTING.md for more details.

krahets commented 9 months ago

Check out the following PR for more clarity on the workflow:

dxtym commented 6 months ago

Hi! I don't know Chinese, but can I contribute to this? I have formal background in CS. Thank you!

krahets commented 6 months ago

Welcome! If you’re proficient in English, I think you can take part in PR review, focusing on fluency and authenticity. Is English your first language?

dxtym commented 6 months ago

Not really, but I'm quite proficient.

krahets commented 6 months ago

@thisisdilmurod Great! Please add my WeChat: krahets-jyd (if you use it) and join us on Discord

dxtym commented 6 months ago

Thank you! But I'm afraid the link to Discord above looks expired. Can you send it again, please?

krahets commented 6 months ago

Sorry for the inconvenience. Please try this link: https://discord.gg/nvspS56295

sheng0321 commented 6 months ago
  • A background in computer science, whether as a student, engineer, or researcher.

Hello, I hope I can help.

SHUANGBRO888 commented 5 months ago

Hello, I am willing to help.

frankliuao commented 3 months ago

Hey I'm happy to help too. I've been working in the U.S. since 2010 and I'm currently a TPM (Technical Project/Program Manager) at the University of Chicago. I'm working on SaaS products and I work closely with an awesome team of developers, platform engineers, and SecOps. I used to be a developer myself and I am still interested in doing it regularly nowadays. I can do manual translations, proofreads, ChatGPT + Gemini + Meta AI translation comparison, etc. I admire and strongly respect your work, @krahets . This is a great project and I am happy to help in any way.

krahets commented 3 months ago

@frankliuao Thanks for your interest and kind words about this book! Could you contact me via WeChat or Discord to discuss the details further?

MENG2010 commented 1 month ago

Hi Krahets (@krahets),

Thank you for this awesome project. I'm happy to help too.

I've been studying in the US since 2015 and I'm currently a Ph.D. student in Computer Science at George Mason University. My research interests include trustworthy machine learning and software security. I can assist with manual translations, proofreading, and AI translation comparison (using ChatGPT and ClaudeAI).

Thank you for your time.

krahets commented 1 month ago

Hi, @MENG2010, thanks for your interest! Welcome to join us! Could you please add my WeChat: krahets-jyd?

umer77jahangir commented 1 month ago

Hello! Thank you for this opportunity to collaborate with you. I am a third-year computer science bachelor's student. Although English is not my first language, I am proficient in it. In addition, I work as a front-end developer and content writer, and I have written several articles for other people's blogs. I believe my skills will be useful for this project. On GitHub, I also provide leetcode-feedback. The previous links no longer work, so if you are interested, please send an invite link.

tinatsina commented 1 month ago

Hello 👋 My name is Tinaye (天籁). I am an Embedded Engineer here in China, and I work on stuff like this often. Mostly translating and improving Datasheets and Register Map documents. I am fluent in English, and my Chinese is "okay" 😓 .

Hope to join you on this project as well. I can assist with proofreading and fine-tuning text to be more in line with standard documentation.

Huilin-Li commented 1 month ago

Hi, discord link expired again.

krahets commented 1 month ago

@Huilin-Li Updated

krahets commented 1 month ago

Welcome, @tinatsina and @umer77jahangir! Do you use WeChat? If so, please add me: krahets-jyd.

umer77jahangir commented 1 month ago

No, I don't use it. Please send a link that has not expired.

vampirepapi commented 1 month ago

Hey! I don't know Chinese, but can I still contribute to this? I know English and have a good grasp of CS.

ofou commented 1 month ago

Is there any way to keep track of the translation status? 👀

I used this script to quickly compare the number of lines in each file under the 'docs' directory for both the Chinese and English versions of the content. While line count is not an ideal metric for translation progress, it provides a rough estimate of differences between the two versions.

#!/bin/bash

# Function to get file counts
get_file_counts() {
    cd "$1" || exit
    find . -name "*.md" -print0 | xargs -0 wc -l | sort -n
    cd - > /dev/null || exit
}

# Get counts for both directories
en_counts=$(get_file_counts "./en/docs")
zh_counts=$(get_file_counts "./docs")

# Combine and format as a markdown table, showing only differences
echo "| File | ZH Lines | EN Lines | Difference |"
echo "| ---- | -------- | -------- | ---------- |"

awk '
BEGIN {FS="\n"; RS=""}
{
    for (i=1; i<=NF; i++) {
        split($i, a, " ")
        file = a[2]
        sub(/^\.\//, "", file)
        if (NR == 1) {
            zh_files[file] = a[1]
        } else {
            en_files[file] = a[1]
        }
    }
}
END {
    for (file in zh_files) {
        if (file in en_files) {
            diff = zh_files[file] - en_files[file]
            if (diff != 0) {
                printf "| %s | %s | %s | %d |\n", file, zh_files[file], en_files[file], diff
            }
        } else {
            printf "| %s | %s | - | %s |\n", file, zh_files[file], zh_files[file]
        }
    }
    for (file in en_files) {
        if (!(file in zh_files)) {
            printf "| %s | - | %s | -%s |\n", file, en_files[file], en_files[file]
        }
    }
}
' <(echo "$zh_counts") <(echo "$en_counts") | sort -t '|' -k5 -n

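# Print totals (the last line of each sorted `wc -l` listing is the "total" row)
zh_total=$(echo "$zh_counts" | tail -n 1 | awk '{print $1}')
en_total=$(echo "$en_counts" | tail -n 1 | awk '{print $1}')
total_diff=$((zh_total - en_total))
echo "Total difference: $total_diff lines (ZH $zh_total, EN $en_total)"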

The result of that script is a markdown table like this:

| File | ZH Lines | EN Lines | Difference |
| ---- | -------- | -------- | ---------- |
| chapter_array_and_linkedlist/summary.md | 76 | 81 | -5 |
| chapter_computational_complexity/performance_evaluation.md | 49 | 48 | 1 |
| chapter_data_structure/summary.md | 66 | 65 | 1 |
| chapter_tree/array_representation_of_tree.md | 166 | 164 | 2 |
| chapter_graph/graph_traversal.md | 140 | 136 | 4 |
| chapter_data_structure/basic_data_types.md | 181 | 170 | 11 |
| chapter_tree/avl_tree.md | 364 | 353 | 11 |
| chapter_introduction/summary.md | 22 | 9 | 13 |
| chapter_preface/suggestions.md | 252 | 239 | 13 |
| chapter_array_and_linkedlist/array.md | 235 | 221 | 14 |
| chapter_backtracking/backtracking_algorithm.md | 509 | 489 | 20 |
| chapter_tree/binary_tree.md | 688 | 662 | 26 |
| chapter_stack_and_queue/stack.md | 436 | 389 | 47 |
| chapter_stack_and_queue/queue.md | 429 | 381 | 48 |
| chapter_hashing/hash_algorithm.md | 416 | 366 | 50 |
| chapter_stack_and_queue/deque.md | 458 | 405 | 53 |
| chapter_hashing/hash_map.md | 603 | 537 | 66 |
| chapter_paperbook/index.md | 68 | -[^1] | 68 |
| chapter_array_and_linkedlist/linked_list.md | 761 | 686 | 75 |
| chapter_computational_complexity/space_complexity.md | 898 | 803 | 95 |
| chapter_computational_complexity/time_complexity.md | 1224 | 1112 | 112 |
| chapter_array_and_linkedlist/list.md | 1034 | 906 | 128 |
| total | 14570 | 13717 | 853 |

[^1]: These checks will output rows for any files that exist in one language version but not in the other. The output will show "-" in the column for the language where the file is missing.
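
If only the files missing from one version are of interest, a smaller check may be enough. The following is a minimal sketch, not part of the script above: `list_md` and `missing_files` are hypothetical helper names, and the default directories `docs` and `en/docs` are assumed from the paths used in this comment.

```shell
#!/bin/bash

# List the Markdown files under a directory, relative to it, in sorted order.
list_md() {
    (cd "$1" && find . -name '*.md' | sed 's|^\./||' | LC_ALL=C sort)
}

# Hypothetical helper: report files that exist in only one of the two trees.
# The default directory names are assumptions based on the script above.
missing_files() {
    local zh_dir="${1:-docs}" en_dir="${2:-en/docs}"
    local zh_list en_list
    zh_list=$(mktemp)
    en_list=$(mktemp)
    list_md "$zh_dir" > "$zh_list"
    list_md "$en_dir" > "$en_list"
    # comm -23 keeps lines found only in the first list; comm -13 only in the second.
    LC_ALL=C comm -23 "$zh_list" "$en_list" | sed 's/^/missing in EN: /'
    LC_ALL=C comm -13 "$zh_list" "$en_list" | sed 's/^/missing in ZH: /'
    rm -f "$zh_list" "$en_list"
}
```

Calling `missing_files` from the repository root, or `missing_files path/to/zh path/to/en` for other layouts, prints one line per file that has no counterpart in the other version.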

Then we can inspect by file using something like this:

diff \
  --width="$COLUMNS" \
  --side-by-side \
  --color=always \
  --expand-tabs \
  en/docs/chapter_array_and_linkedlist/summary.md \
  docs/chapter_array_and_linkedlist/summary.md 

Now we can see where the two versions differ:

# Summary                                               |  # 小结

### Key review                                          |  ### 重点回顾

- Arrays and linked lists are two basic data structures |  - 数组和链表是两种基本的数据结构,分�
- Arrays support random access and use less memory; how |  - 数组支持随机访问、占用内存较少;但�
- Linked lists implement efficient node insertion and d |  - 链表通过更改引用(指针)实现高效的�
- Common types of linked lists include singly linked li |  - 列表是一种支持增删查改的元素有序集�
- Lists are ordered collections of elements that suppor |  - 列表的出现大幅提高了数组的实用性,�
- The advent of lists significantly enhanced the practi |  - 程序运行时,数据主要存储在内存中。�
- During program execution, data is mainly stored in me |  - 缓存通过缓存行、预取机制以及空间局�
- Caches provide fast data access to CPUs through mecha |  - 由于数组具有更高的缓存命中率,因此�
- Due to higher cache hit rates, arrays are generally m <  

### Q & A                                                  ### Q & A

**Q**: Does storing arrays on the stack versus the heap |  **Q**:数组存储在栈上和存储在堆上,对�

Arrays stored on both the stack and heap are stored in  |  存储在栈上和堆上的数组都被存储在连续�

1. Allocation and release efficiency: The stack is a sm |  1. 分配和释放效率:栈是一块较小的内存�
2. Size limitation: Stack memory is relatively small, w |  2. 大小限制:栈内存相对较小,堆的大小�
3. Flexibility: The size of arrays on the stack needs t |  3. 灵活性:栈上的数组的大小需要在编译�

**Q**: Why do arrays require elements of the same type, |  **Q**:为什么数组要求相同类型的元素,�

Linked lists consist of nodes connected by references ( |  链表由节点组成,节点之间通过引用(指�

In contrast, array elements must be of the same type, a |  相对地,数组元素则必须是相同类型的,�

```shell                                                   ```shell
# Element memory address = array memory address + eleme |  # 元素内存地址 = 数组内存地址(首元素�
```                                                        ```

**Q**: After deleting a node, is it necessary to set `P |  **Q**:删除节点 `P` 后,是否需要把 `P.next`

Not modifying `P.next` is also acceptable. From the per |  不修改 `P.next` 也可以。从该链表的角度看

From a garbage collection perspective, for languages wi |  从数据结构与算法(做题)的角度看,不�

**Q**: In linked lists, the time complexity for inserti |  **Q**:在链表中插入和删除操作的时间复�

If an element is searched first and then deleted, the t |  如果是先查找元素、再删除元素,时间复�

**Q**: In the figure "Linked List Definition and Storag |  **Q**:图“链表定义与存储方式”中,浅�

The figure is just a qualitative representation; quanti |  该示意图只是定性表示,定量表示需要根�

- Different types of node values occupy different amoun |  - 不同类型的节点值占用的空间是不同的�
- The memory space occupied by pointer variables depend |  - 指针变量占用的内存空间大小根据所使�

**Q**: Is adding elements to the end of a list always ` |  **Q**:在列表末尾添加元素是否时时刻刻�

If adding an element exceeds the list length, the list  |  如果添加元素时超出列表长度,则需要先�

**Q**: The statement "The emergence of lists greatly im |  **Q**:“列表的出现极大地提高了数组的�

The space wastage here mainly refers to two aspects: on |  这里的空间浪费主要有两方面含义:一方�

**Q**: In Python, after initializing `n = [1, 2, 3]`, t |  **Q**:在 Python 中初始化 `n = [1, 2, 3]` 后,�

If we replace list elements with linked list nodes `n = |  假如把列表元素换成链表节点 `n = [n1, n2, n

Unlike many languages, in Python, numbers are also wrap |  与许多语言不同,Python 中的数字也被包装

**Q**: The `std::list` in C++ STL has already implement |  **Q**:C++ STL 里面的 `std::list` 已经实现了�

On the one hand, we often prefer to use arrays to imple |  一方面,我们往往更青睐使用数组实现算�

- Space overhead: Since each element requires two addit |  - 空间开销:由于每个元素需要两个额外�
- Cache unfriendly: As the data is not stored continuou |  - 缓存不友好:由于数据不是连续存放的�

On the other hand, linked lists are primarily necessary |  另一方面,必要使用链表的情况主要是二�

**Q**: Does initializing a list `res = [0] * self.size( |  **Q**:初始化列表 `res = [0] * self.size()` 操�

No. However, this issue arises with two-dimensional arr |  不会。但二维数组会有这个问题,例如初�
                                                        <  
**Q**: In deleting a node, is it necessary to break the <  
                                                        <  
From the perspective of data structures and algorithms  <  
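
The per-file inspection above could be wrapped in a small helper so that only the relative path has to be typed. This is a rough sketch under assumptions: `side_by_side`, `EN_ROOT`, and `ZH_ROOT` are hypothetical names, the `en/docs` and `docs` defaults are taken from the command above, and GNU `diff` is assumed. It drops `--color=always` so the output stays clean when piped, and it falls back to a fixed width because `COLUMNS` is often unset in non-interactive shells.

```shell
#!/bin/bash

# Hypothetical helper: show the English and Chinese versions of one file side by side.
# Usage: side_by_side chapter_array_and_linkedlist/summary.md
side_by_side() {
    local rel_path="$1"
    local en_root="${EN_ROOT:-en/docs}"   # assumed layout, as in the command above
    local zh_root="${ZH_ROOT:-docs}"

    if [ ! -f "$en_root/$rel_path" ] || [ ! -f "$zh_root/$rel_path" ]; then
        echo "missing in one version: $rel_path" >&2
        return 2
    fi

    # Fall back to 160 columns when COLUMNS is not set.
    diff \
        --width="${COLUMNS:-160}" \
        --side-by-side \
        --expand-tabs \
        "$en_root/$rel_path" \
        "$zh_root/$rel_path"
}
```

Like `diff` itself, the helper returns 0 when the two files are identical and 1 when they differ; it returns 2 when the file is missing from either tree.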

In my humble opinion, it makes sense to have a side-by-side translation (with an equal number of lines) because of the significant differences between Mandarin (普通话) and English.
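
Since line count is only a rough metric, one possible complementary check for side-by-side alignment is to compare the number of Markdown headings in each pair of files. The sketch below is an assumption about how that might look, not an existing tool in the repository; `count_headings` and `compare_headings` are hypothetical names.

```shell
#!/bin/bash

# Count Markdown headings in a file, ignoring lines inside fenced code blocks
# (where a leading "#" is usually a comment, not a heading).
count_headings() {
    awk '
        /^```/ { in_fence = !in_fence; next }
        !in_fence && /^#+ / { n++ }
        END { print n + 0 }
    ' "$1"
}

# Hypothetical helper: print a table row only when the heading counts differ.
compare_headings() {
    local zh_file="$1" en_file="$2"
    local zh_n en_n
    zh_n=$(count_headings "$zh_file")
    en_n=$(count_headings "$en_file")
    if [ "$zh_n" -ne "$en_n" ]; then
        printf '| %s | %s | %s |\n' "$zh_file" "$zh_n" "$en_n"
    fi
}
```

For example, `compare_headings docs/chapter_tree/avl_tree.md en/docs/chapter_tree/avl_tree.md` prints nothing when both versions have the same number of headings, which suggests (but does not prove) that their section structure matches.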

By the way, excellent book! I can't wait to read the English PDF/EPUB3 ASAP!