krahets commented 9 months ago

Contributing guidelines for Chinese-to-English

We are working on translating "Hello Algo" from Chinese to English with the following approach:

1. AI translation: Carry out an initial translation pass using a machine translation tool.
2. Human optimization: Manually refine the machine-generated output to ensure authenticity and accuracy.
3. Pull request review: Reviewers double-check the refined translation through the GitHub pull request workflow.
4. Repeat steps 2 and 3 for further improvements.

Join us

We're seeking contributors who meet the following criteria.

Technical background: Strong foundation in computer science, particularly in data structures and algorithms.
Language skills: Native proficiency in Chinese with professional-level English, or native English.
Available time: Dedicated to contributing to open-source projects with a willingness to engage in long-term translation efforts.

That is, our contributors are computer scientists, engineers, and students from different linguistic backgrounds, and their objectives have different focal points:

Native Chinese with professional working English: Ensuring translation accuracy and consistency between the CN and EN versions.
Native English: Enhancing the authenticity and fluency of the English content so that it reads naturally and engagingly.

Don't hesitate to join us via WeChat krahets-jyd or on Discord!

Contributing guideline

Please visit en/CONTRIBUTING.md for more details.

krahets commented 9 months ago

Check out the following PR for more clarity on the workflow:

dxtym commented 6 months ago

Hi! I don't know Chinese, but can I contribute to this? I have a formal background in CS. Thank you!

krahets commented 6 months ago

Welcome! If you're proficient in English, I think you can take part in PR reviews, focusing on fluency and authenticity. Is English your first language?

dxtym commented 6 months ago

Not really, but I'm quite proficient.

krahets commented 6 months ago

@thisisdilmurod Great! Please add my WeChat: krahets-jyd (if you use it) and join us on Discord

dxtym commented 6 months ago

Thank you! But I'm afraid the link to Discord above looks expired. Can you send it again, please?

krahets commented 6 months ago

Sorry for the inconvenience. Please try this link: https://discord.gg/nvspS56295

sheng0321 commented 6 months ago

A background in computer science, whether as a student, engineer, or researcher.

Hello, I hope I can help.

SHUANGBRO888 commented 5 months ago

Hello, I am willing to help.

frankliuao commented 3 months ago

Hey I'm happy to help too. I've been working in the U.S. since 2010 and I'm currently a TPM (Technical Project/Program Manager) at the University of Chicago. I'm working on SaaS products and I work closely with an awesome team of developers, platform engineers, and SecOps. I used to be a developer myself and I am still interested in doing it regularly nowadays. I can do manual translations, proofreads, ChatGPT + Gemini + Meta AI translation comparison, etc. I admire and strongly respect your work, @krahets . This is a great project and I am happy to help in any way.

krahets commented 3 months ago

@frankliuao Thanks for your interest and kind words about this book! Could you contact me via WeChat or Discord to discuss the details further?

MENG2010 commented 1 month ago

Hi Krahets (@krahets),

Thank you for this awesome project. I'm happy to help too.

I've been studying in the US since 2015 and I'm currently a Ph.D. student in Computer Science at George Mason University. My research interests include trustworthy machine learning and software security. I can assist with manual translations, proofreading, and AI translation comparison (using ChatGPT and ClaudeAI).

Thank you for your time.

krahets commented 1 month ago

Hi @MENG2010, thanks for your interest, and welcome aboard! Could you please add me on WeChat: krahets-jyd?

umer77jahangir commented 1 month ago

Hello! Thanks for this opportunity to collaborate with you all. I am a third-year computer science bachelor's student. Although English is not my first language, I am proficient in it. In addition, I work as a front-end developer and content writer, and I have written several articles for other people's blogs. I believe my skills will be useful for this project. On GitHub, I also provide a leetcode-feedback. The previous links no longer work, so if you are interested, please send an invite link.

tinatsina commented 1 month ago

Hello 👋 My name is Tinaye (天籁). I am an Embedded Engineer here in China, and I work on stuff like this often. Mostly translating and improving Datasheets and Register Map documents. I am fluent in English, and my Chinese is "okay" 😓 .

Hope to join you on this project as well. I can assist with proofreading and fine-tuning text to be more in line with standard documentation.

Huilin-Li commented 1 month ago

Hi, discord link expired again.

krahets commented 1 month ago

@Huilin-Li Updated

krahets commented 1 month ago

Welcome @tinatsina @umer77jahangir! Do you use WeChat? If so, please add me krahets-jyd.

umer77jahangir commented 1 month ago

No, I don't use it. Please send a link that has not expired.

vampirepapi commented 1 month ago

Hey! I don't know Chinese, but can I still contribute to this? I know English and have a good grasp of CS.

ofou commented 1 month ago

Is there any way to keep track of the translation status? :eyes:

I used this script to quickly compare the number of lines in each file under the 'docs' directory for both the Chinese and English versions of the content. While line count is not an ideal metric for translation progress, it provides a rough estimate of differences between the two versions.

#!/bin/bash

# Function to get file counts
get_file_counts() {
    cd "$1" || exit
    find . -name "*.md" -print0 | xargs -0 wc -l | sort -n
    cd - > /dev/null || exit
}

# Get counts for both directories
en_counts=$(get_file_counts "./en/docs")
zh_counts=$(get_file_counts "./docs")

# Combine and format as a markdown table, showing only differences
echo "| File | ZH Lines | EN Lines | Difference |"
echo "| ---- | -------- | -------- | ---------- |"

awk '
BEGIN {FS="\n"; RS=""}
{
    for (i=1; i<=NF; i++) {
        split($i, a, " ")
        file = a[2]
        sub(/^\.\//, "", file)
        if (NR == 1) {
            zh_files[file] = a[1]
        } else {
            en_files[file] = a[1]
        }
    }
}
END {
    for (file in zh_files) {
        if (file in en_files) {
            diff = zh_files[file] - en_files[file]
            if (diff != 0) {
                printf "| %s | %s | %s | %d |\n", file, zh_files[file], en_files[file], diff
            }
        } else {
            printf "| %s | %s | - | %s |\n", file, zh_files[file], zh_files[file]
        }
    }
    for (file in en_files) {
        if (!(file in zh_files)) {
            printf "| %s | - | %s | -%s |\n", file, en_files[file], en_files[file]
        }
    }
}
' <(echo "$zh_counts") <(echo "$en_counts") | sort -t '|' -k5 -n

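# Print totals (wc's own "total" line sorts last, so tail -n 1 picks it up)
zh_total=$(echo "$zh_counts" | tail -n 1 | awk '{print $1}')
en_total=$(echo "$en_counts" | tail -n 1 | awk '{print $1}')
total_diff=$((zh_total - en_total))
echo ""
echo "Total: ZH $zh_total lines, EN $en_total lines, difference $total_diff"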

The result of that script is a markdown table like this:

| File | ZH Lines | EN Lines | Difference |
| ---- | -------- | -------- | ---------- |
| chapter_array_and_linkedlist/summary.md | 76 | 81 | -5 |
| chapter_computational_complexity/performance_evaluation.md | 49 | 48 | 1 |
| chapter_data_structure/summary.md | 66 | 65 | 1 |
| chapter_tree/array_representation_of_tree.md | 166 | 164 | 2 |
| chapter_graph/graph_traversal.md | 140 | 136 | 4 |
| chapter_data_structure/basic_data_types.md | 181 | 170 | 11 |
| chapter_tree/avl_tree.md | 364 | 353 | 11 |
| chapter_introduction/summary.md | 22 | 9 | 13 |
| chapter_preface/suggestions.md | 252 | 239 | 13 |
| chapter_array_and_linkedlist/array.md | 235 | 221 | 14 |
| chapter_backtracking/backtracking_algorithm.md | 509 | 489 | 20 |
| chapter_tree/binary_tree.md | 688 | 662 | 26 |
| chapter_stack_and_queue/stack.md | 436 | 389 | 47 |
| chapter_stack_and_queue/queue.md | 429 | 381 | 48 |
| chapter_hashing/hash_algorithm.md | 416 | 366 | 50 |
| chapter_stack_and_queue/deque.md | 458 | 405 | 53 |
| chapter_hashing/hash_map.md | 603 | 537 | 66 |
| chapter_paperbook/index.md | 68 | -[^1] | 68 |
| chapter_array_and_linkedlist/linked_list.md | 761 | 686 | 75 |
| chapter_computational_complexity/space_complexity.md | 898 | 803 | 95 |
| chapter_computational_complexity/time_complexity.md | 1224 | 1112 | 112 |
| chapter_array_and_linkedlist/list.md | 1034 | 906 | 128 |
| total | 14570 | 13717 | 853 |

[^1]: These checks will output rows for any files that exist in one language version but not in the other. The output will show "-" in the column for the language where the file is missing.
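
The missing-file case described in the footnote can also be checked on its own, without counting lines. The following is only a minimal sketch, assuming the same `./docs` and `./en/docs` layout that the script above uses; `list_md` and `missing_files` are hypothetical helper names, not part of the repository.

```shell
# Sketch: list Markdown files that exist in only one of two doc trees.
# list_md and missing_files are hypothetical helper names.

# Print the sorted relative paths of all *.md files under directory $1.
list_md() {
    (cd "$1" && find . -name "*.md" | sed 's|^\./||' | LC_ALL=C sort)
}

# Report files present in only one tree.
# $1 = Chinese docs directory, $2 = English docs directory.
missing_files() {
    zh_list=$(mktemp) || return 1
    en_list=$(mktemp) || return 1
    list_md "$1" > "$zh_list"
    list_md "$2" > "$en_list"
    # comm -23 keeps lines unique to the first file, comm -13 to the second.
    LC_ALL=C comm -23 "$zh_list" "$en_list" | sed 's/^/only in ZH: /'
    LC_ALL=C comm -13 "$zh_list" "$en_list" | sed 's/^/only in EN: /'
    rm -f "$zh_list" "$en_list"
}
```

After sourcing these functions, it could be invoked from the repository root as `missing_files ./docs ./en/docs`.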

Then we can inspect by file using something like this:

diff \
  --width="$COLUMNS" \
  --side-by-side \
  --color=always \
  --expand-tabs \
  en/docs/chapter_array_and_linkedlist/summary.md \
  docs/chapter_array_and_linkedlist/summary.md

Now we can see where the two versions differ:

# Summary                                               |  # 小结

### Key review                                          |  ### 重点回顾

- Arrays and linked lists are two basic data structures |  - 数组和链表是两种基本的数据结构，分�
- Arrays support random access and use less memory; how |  - 数组支持随机访问、占用内存较少；但�
- Linked lists implement efficient node insertion and d |  - 链表通过更改引用（指针）实现高效的�
- Common types of linked lists include singly linked li |  - 列表是一种支持增删查改的元素有序集�
- Lists are ordered collections of elements that suppor |  - 列表的出现大幅提高了数组的实用性，�
- The advent of lists significantly enhanced the practi |  - 程序运行时，数据主要存储在内存中。�
- During program execution, data is mainly stored in me |  - 缓存通过缓存行、预取机制以及空间局�
- Caches provide fast data access to CPUs through mecha |  - 由于数组具有更高的缓存命中率，因此�
- Due to higher cache hit rates, arrays are generally m <  

### Q & A                                                  ### Q & A

**Q**: Does storing arrays on the stack versus the heap |  **Q**：数组存储在栈上和存储在堆上，对�

Arrays stored on both the stack and heap are stored in  |  存储在栈上和堆上的数组都被存储在连续�

1. Allocation and release efficiency: The stack is a sm |  1. 分配和释放效率：栈是一块较小的内存�
2. Size limitation: Stack memory is relatively small, w |  2. 大小限制：栈内存相对较小，堆的大小�
3. Flexibility: The size of arrays on the stack needs t |  3. 灵活性：栈上的数组的大小需要在编译�

**Q**: Why do arrays require elements of the same type, |  **Q**：为什么数组要求相同类型的元素，�

Linked lists consist of nodes connected by references ( |  链表由节点组成，节点之间通过引用（指�

In contrast, array elements must be of the same type, a |  相对地，数组元素则必须是相同类型的，�

```shell                                                   ```shell
# Element memory address = array memory address + eleme |  # 元素内存地址 = 数组内存地址（首元素�
```                                                        ```

**Q**: After deleting a node, is it necessary to set `P |  **Q**：删除节点 `P` 后，是否需要把 `P.next`

Not modifying `P.next` is also acceptable. From the per |  不修改 `P.next` 也可以。从该链表的角度看

From a garbage collection perspective, for languages wi |  从数据结构与算法（做题）的角度看，不�

**Q**: In linked lists, the time complexity for inserti |  **Q**：在链表中插入和删除操作的时间复�

If an element is searched first and then deleted, the t |  如果是先查找元素、再删除元素，时间复�

**Q**: In the figure "Linked List Definition and Storag |  **Q**：图“链表定义与存储方式”中，浅�

The figure is just a qualitative representation; quanti |  该示意图只是定性表示，定量表示需要根�

- Different types of node values occupy different amoun |  - 不同类型的节点值占用的空间是不同的�
- The memory space occupied by pointer variables depend |  - 指针变量占用的内存空间大小根据所使�

**Q**: Is adding elements to the end of a list always ` |  **Q**：在列表末尾添加元素是否时时刻刻�

If adding an element exceeds the list length, the list  |  如果添加元素时超出列表长度，则需要先�

**Q**: The statement "The emergence of lists greatly im |  **Q**：“列表的出现极大地提高了数组的�

The space wastage here mainly refers to two aspects: on |  这里的空间浪费主要有两方面含义：一方�

**Q**: In Python, after initializing `n = [1, 2, 3]`, t |  **Q**：在 Python 中初始化 `n = [1, 2, 3]` 后，�

If we replace list elements with linked list nodes `n = |  假如把列表元素换成链表节点 `n = [n1, n2, n

Unlike many languages, in Python, numbers are also wrap |  与许多语言不同，Python 中的数字也被包装

**Q**: The `std::list` in C++ STL has already implement |  **Q**：C++ STL 里面的 `std::list` 已经实现了�

On the one hand, we often prefer to use arrays to imple |  一方面，我们往往更青睐使用数组实现算�

- Space overhead: Since each element requires two addit |  - 空间开销：由于每个元素需要两个额外�
- Cache unfriendly: As the data is not stored continuou |  - 缓存不友好：由于数据不是连续存放的�

On the other hand, linked lists are primarily necessary |  另一方面，必要使用链表的情况主要是二�

**Q**: Does initializing a list `res = [0] * self.size( |  **Q**：初始化列表 `res = [0] * self.size()` 操�

No. However, this issue arises with two-dimensional arr |  不会。但二维数组会有这个问题，例如初�
                                                        <  
**Q**: In deleting a node, is it necessary to break the <  
                                                        <  
From the perspective of data structures and algorithms  <

In my humble opinion, it makes sense to keep the translation side by side (with an equal number of lines in both versions), given the significant differences between Mandarin (普通话) and English.
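
Since raw line counts are only a rough signal, another language-independent check is to compare the heading outline of the two versions of a file. This is a sketch of a different technique from the line-count script above, not something the project is known to use; `heading_outline` and `same_outline` are hypothetical names, and it assumes ATX-style (`#`) headings that start at column 0.

```shell
# Sketch: compare the heading structure of two Markdown files.
# heading_outline and same_outline are hypothetical helper names.

# Print the heading markers of file $1, one per line (e.g. "#", "###"),
# ignoring lines inside fenced code blocks so shell comments are not counted.
heading_outline() {
    awk '
        /^[ \t]*[`][`][`]/ { in_fence = !in_fence; next }  # fence open/close
        !in_fence && /^#+ / { print $1 }                    # heading marker
    ' "$1"
}

# Succeed (exit status 0) when both files have the same heading outline.
same_outline() {
    [ "$(heading_outline "$1")" = "$(heading_outline "$2")" ]
}
```

For example, `same_outline en/docs/chapter_array_and_linkedlist/summary.md docs/chapter_array_and_linkedlist/summary.md || echo "outline differs"` would flag a file whose sections have drifted apart.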

By the way, excellent book! I can't wait to read the English PDF/EPUB3 ASAP!

krahets / hello-algo

Chinese-to-English (Help Wanted) #914