socialfoundations / tttlm

Test-time-training on nearest neighbors for large language models
MIT License

Processes Killed by OS Due to High Memory Usage When Starting Servers #1

Open sunxin000 opened 4 days ago

sunxin000 commented 4 days ago

Description:

I am running a script to start multiple server instances as shown below:

for i in {00..29}
do
    file="${i}.jsonl"
    python3 code/pile_server.py \
            --address_path servers/addresses.txt \
            --data_file "$file" \
            --num_servers 2 & 

    sleep 900

done

However, I have encountered an issue where some processes are being killed by the OS. I suspect this is due to out-of-memory (OOM) conditions, since each server instance loads its portion of the data and the corresponding index into memory. Because the data and index are quite large, this results in high memory usage.

Question:

How can I address this problem to prevent the processes from being killed due to high memory usage? Are there any strategies or optimizations I can apply to reduce the memory footprint of each server instance?

mrtzh commented 3 days ago

It looks like you're trying to run all servers on one machine. That would require about 1TB of memory for the index and another 1TB for the data. We ran 30 servers on 30 different machines, each one serving one chunk of the data. The code supports running everything in a distributed manner.
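For reference, a minimal sketch of the one-chunk-per-machine setup described above might look like the following. It reuses only the pile_server.py flags from the script in the issue; the CHUNK_ID variable and how it gets set per machine (by hand or via a job scheduler), as well as servers/addresses.txt sitting on storage visible to all machines, are assumptions, not part of the repo.

# Run on each of the 30 machines, with CHUNK_ID set to that machine's chunk (00..29).
# Hypothetical: CHUNK_ID is not defined by the repo; set it however your cluster assigns work.
CHUNK_ID=${CHUNK_ID:-00}
file="${CHUNK_ID}.jsonl"

# Same flags as in the original script, but only one chunk is loaded on this host,
# so memory usage stays at roughly 1/30 of the full data and index.
python3 code/pile_server.py \
        --address_path servers/addresses.txt \
        --data_file "$file" \
        --num_servers 2

Since each host launches a single chunk, there is no need for the backgrounding & or the sleep 900 from the original loop.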