iskyo0ps / compiler101_gpt_generated

Apache License 2.0

SWDEV-00004 - system hung when tvm tuning #8

Open iskyo0ps opened 2 months ago

iskyo0ps commented 2 months ago

Aug 23 19:48:42 loris systemd[1]: Finished Record System Boot/Shutdown in UTMP.
Aug 23 19:48:42 loris systemd[1]: Started Network Time Synchronization.
Aug 23 19:48:42 loris systemd[1]: Started Userspace Out-Of-Memory (OOM) Killer.
Aug 23 19:48:42 loris systemd[1]: Reached target System Time Set.

iskyo0ps commented 2 months ago

When an Ubuntu system hangs during AutoTVM tuning, the exact cause can be hard to pin down. The steps below can help you diagnose and potentially resolve the issue:

1. Monitor System Resources

High resource usage (CPU, memory, disk I/O) can cause the system to hang. Use monitoring tools to check the system's resource usage.

Tools to Use: htop, iostat, vmstat, and free.

Install htop and sysstat (which provides iostat) if you don't have them; vmstat and free are part of procps:

sudo apt-get install htop sysstat

Example Usage:
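Because a hung machine cannot be inspected after the fact, one option is to record resource usage to disk while tuning runs, so the last samples before the hang survive a reset. The following is a minimal sketch, assuming a Linux system with /proc/meminfo; the function names and the log path are hypothetical, not part of TVM or any monitoring tool.

```python
import os
import time


def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def log_resources(path, samples, interval_s=1.0):
    """Append load average and memory figures to `path`, syncing every line."""
    with open(path, "a") as out:
        for _ in range(samples):
            with open("/proc/meminfo") as f:
                mem = parse_meminfo(f.read())
            load1, _, _ = os.getloadavg()
            out.write(
                f"{time.strftime('%H:%M:%S')} load1={load1:.2f} "
                f"MemAvailable_kB={mem.get('MemAvailable', -1)} "
                f"SwapFree_kB={mem.get('SwapFree', -1)}\n"
            )
            out.flush()
            os.fsync(out.fileno())  # push the line to disk before the next sample
            time.sleep(interval_s)


# Hypothetical usage, started in a second terminal before tuning begins:
# log_resources("resource_usage.log", samples=3600, interval_s=1.0)
```

If MemAvailable drops toward zero in the last lines written before the hang, memory exhaustion is a likely cause; a steadily climbing load average points more toward CPU or I/O saturation.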

2. Check System Logs

System logs can provide valuable information about what might be causing the hang.

Logs to Check: /var/log/syslog, /var/log/kern.log, and the kernel ring buffer (dmesg).

Example Commands:

tail -f /var/log/syslog
tail -f /var/log/kern.log
dmesg | tail
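Reading these logs by eye can be slow, so a small filter for messages that commonly accompany hangs may help. This is a sketch only: the pattern list is an assumption based on typical Linux kernel messages (OOM kills, hung-task warnings, lockups) and is not exhaustive, and the function name is hypothetical.

```python
import re

# Message fragments that often appear around memory exhaustion or stalls.
# This list is an illustrative assumption, not a complete catalogue.
HANG_PATTERNS = [
    re.compile(r"out of memory", re.IGNORECASE),
    re.compile(r"oom[-_ ]kill", re.IGNORECASE),
    re.compile(r"blocked for more than \d+ seconds"),
    re.compile(r"soft lockup", re.IGNORECASE),
    re.compile(r"hard lockup", re.IGNORECASE),
]


def find_hang_clues(lines):
    """Return the log lines that match any of the patterns above."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in HANG_PATTERNS)
    ]


# Hypothetical usage (reading the log may require sudo):
# with open("/var/log/kern.log", errors="replace") as f:
#     for hit in find_hang_clues(f):
#         print(hit)
```

After a forced reboot, the relevant entries are the ones written just before the boot messages of the new session, so it is worth scanning the whole file rather than only its tail.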

3. Use TVM's Debugging Tools

AutoTVM reports its progress through Python's standard logging module, so raising the log level can show what the tuner was doing when the hang occurred.

Enable Debug Logging:

Set the logging level to DEBUG to get more detailed output from TVM:

import logging
logging.basicConfig(level=logging.DEBUG)
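Console output is lost when the machine hangs, so it may be more useful to send the debug log to a file. The sketch below assumes AutoTVM logs under the logger name "autotvm", which is the name used in the TVM tuning tutorials; verify it against your TVM version. The helper name and default path are hypothetical.

```python
import logging


def enable_autotvm_file_logging(path="autotvm_debug.log"):
    """Send DEBUG-level records from the "autotvm" logger to a file."""
    logger = logging.getLogger("autotvm")  # assumed logger name, see lead-in
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return handler


# Hypothetical usage, before starting the tuning loop:
# enable_autotvm_file_logging("autotvm_debug.log")
```

The last lines of the file after a reboot then indicate which task or trial was running when the system stopped responding.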

4. Limit Resource Usage

If the issue is due to high resource usage, you can limit the resources used by AutoTVM.

Limit the Number of Trials:

Reduce the number of trials in your tuning options:

from tvm import autotvm

tuning_option = {
    'log_filename': 'tuning.log',  # tuning records are written to this file
    'tuner': 'xgb',
    'n_trial': 100,  # Reduce the number of trials
    'early_stopping': 50,  # stop if no better config is found in 50 trials
    'measure_option': autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(number=10, repeat=1, min_repeat_ms=1000),
    ),
}

Limit CPU Usage:

Use the taskset command to limit the CPU cores used by the process:

taskset -c 0-3 python your_script.py

This restricts the script, and any child processes it starts, to CPU cores 0 through 3.
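The same restriction can be applied from inside the script, which avoids depending on how the script is launched. This is a minimal, Linux-only sketch using os.sched_setaffinity from the standard library; the helper name pin_to_cores is hypothetical.

```python
import os


def pin_to_cores(cores):
    """Restrict the calling thread to `cores`; later threads and children inherit it.

    Call this at the very top of the script, before importing libraries
    that start their own worker threads, because threads that already
    exist keep their previous affinity.
    """
    allowed = os.sched_getaffinity(0)
    target = set(cores) & allowed  # ignore cores this process may not use
    if not target:
        raise ValueError(f"none of {sorted(cores)} is in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, target)
    return target


# Hypothetical usage, equivalent in intent to `taskset -c 0-3`:
# pin_to_cores(range(4))
```

Note that CPU affinity limits only where the work runs. It does not cap memory use, so a hang caused by memory exhaustion can still occur with either approach.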

5. Use a Swap File

If your system runs out of memory, it can hang. Adding a swap file can help mitigate this issue.

Create a Swap File:

sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Make the Swap File Permanent:

Add the following line to /etc/fstab:

/swapfile none swap sw 0 0
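To confirm that the swap file is actually active before starting a long tuning run, the script can read /proc/swaps. The sketch below assumes the usual Linux layout of that file (a header line followed by one line per swap area); the function name is hypothetical.

```python
def parse_swaps(text):
    """Parse /proc/swaps content into (filename, size_kB, used_kB) tuples."""
    entries = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        parts = line.split()
        if len(parts) >= 5:
            entries.append((parts[0], int(parts[2]), int(parts[3])))
    return entries


# Hypothetical usage:
# with open("/proc/swaps") as f:
#     active = parse_swaps(f.read())
# if not any(name == "/swapfile" for name, _, _ in active):
#     print("warning: /swapfile is not active")
```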

6. Profile the Code

Use profiling tools to identify bottlenecks in your code.

Use cProfile:

import cProfile
cProfile.run('your_function()')
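For longer runs, the raw cProfile output can be hard to read. One way to narrow it down, sketched here with only the standard library, is to sort by cumulative time and print the top few entries. The helper profile_call and the stand-in workload busy_sum are hypothetical names for illustration.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)  # most expensive call paths first
    return result, buf.getvalue()


def busy_sum(n):
    """Stand-in workload; replace with the function you want to profile."""
    return sum(i * i for i in range(n))


# Hypothetical usage:
# result, report = profile_call(busy_sum, 1_000_000)
# print(report)
```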

Use line_profiler:

Install line_profiler:

pip install line_profiler

Add the @profile decorator to the functions you want to profile and run the script with kernprof:

kernprof -l -v your_script.py
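When a script is run through kernprof -l, the profile decorator is made available as a builtin; run with plain python, the name does not exist and the script fails with a NameError. A common workaround, sketched below, is a no-op fallback. The function normalize is a hypothetical example, not code from the issue.

```python
import builtins

# Use the decorator injected by `kernprof -l` if present; otherwise fall
# back to a no-op so the script also runs under plain `python`.
profile = getattr(builtins, "profile", lambda func: func)


@profile
def normalize(values):
    """Scale values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


# Hypothetical usage:
# print(normalize([1, 1, 2]))
```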

7. Check for Hardware Issues

Sometimes hardware issues can cause the system to hang. Check for hardware errors using tools like smartctl for disk health and memtest86+ for memory tests.

Check Disk Health:

sudo apt-get install smartmontools
sudo smartctl -a /dev/sda

Run Memory Test:

Reboot your system and select the memory test option from the GRUB menu.

Summary

  1. Monitor system resources using tools like htop, iostat, vmstat, and free.
  2. Check system logs for any errors or warnings.
  3. Enable debug logging in TVM to get more detailed output.
  4. Limit resource usage by reducing the number of trials and limiting CPU usage.
  5. Use a swap file to prevent out-of-memory issues.
  6. Profile the code to identify bottlenecks.
  7. Check for hardware issues using tools like smartctl and memtest86+.

By following these steps, you should be able to diagnose and potentially resolve the issue causing your system to hang while running TVM's AutoTVM.

iskyo0ps commented 2 months ago

taskset -c 0-3 python your_script.py does not work

iskyo0ps commented 2 months ago