Open iskyo0ps opened 2 months ago
When your Ubuntu system hangs while running TVM's AutoTVM, it can be hard to pinpoint the exact cause. The steps below can help you diagnose and potentially resolve the issue.
High resource usage (CPU, memory, disk I/O) can cause the system to hang. Use monitoring tools to check the system's resource usage.
Start with top. Install these tools if you don't have them:
sudo apt-get install htop sysstat
Run htop in a terminal to see real-time CPU, memory, and process information.
Run iostat -x 1 to see detailed disk I/O statistics.
Run vmstat 1 to see system performance metrics.
Run free -h to see memory usage.
System logs can provide valuable information about what might be causing the hang.
tail -f /var/log/syslog
tail -f /var/log/kern.log
dmesg | tail
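Reading these logs by eye is slow, so a small filter can help. The following is a minimal sketch that keeps only lines matching a few messages the Linux kernel prints around out-of-memory kills and lockups. The pattern list and the function names find_suspicious_lines and scan_file are illustrative assumptions, not an exhaustive or official catalogue.

```python
import re
from typing import Iterable, List

# Substrings the Linux kernel prints around OOM kills and lockups.
# This list is an assumption for illustration; extend it as needed.
SUSPICIOUS_PATTERNS = [
    r"out of memory",
    r"oom-kill",
    r"invoked oom-killer",
    r"soft lockup",
    r"hung_task",
]

_PATTERN = re.compile("|".join(SUSPICIOUS_PATTERNS), re.IGNORECASE)


def find_suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match any of the patterns above."""
    return [line.rstrip("\n") for line in lines if _PATTERN.search(line)]


def scan_file(path: str) -> List[str]:
    """Scan one log file; reading system logs may require elevated rights."""
    with open(path, errors="replace") as log_file:
        return find_suspicious_lines(log_file)
```

For example, scan_file("/var/log/kern.log") returns the matching lines from the kernel log mentioned above. An empty result does not rule out a memory problem, since a hard hang can prevent the last messages from reaching disk.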
TVM provides some debugging tools that can help you understand what's happening during the AutoTVM process.
Set the logging level to DEBUG to get more detailed output from TVM:
import logging
logging.basicConfig(level=logging.DEBUG)
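When the whole machine hangs, console output is usually lost, so it may help to send the debug output to a file as well. The sketch below attaches a file handler to a named logger. The logger name "autotvm" is assumed from TVM's tutorials, and the helper name enable_debug_log and the log path are hypothetical.

```python
import logging


def enable_debug_log(path: str, logger_name: str = "autotvm") -> logging.Logger:
    """Send DEBUG-level records from one logger to a file."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    # FileHandler flushes its stream after every record it emits.
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Calling enable_debug_log("autotvm_debug.log") before tuning starts leaves a timestamped trail that can be read after a reboot. The last few records may still be missing after a hard hang, because the handler flushes to the operating system but does not force a sync to disk.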
If the issue is due to high resource usage, you can limit the resources used by AutoTVM.
Reduce the number of trials in your tuning options:
from tvm import autotvm

tuning_option = {
    'log_filename': 'tuning.log',
    'tuner': 'xgb',
    'n_trial': 100,  # Reduce the number of trials
    'early_stopping': 50,  # Stop if no improvement is found within 50 trials
    'measure_option': autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(number=10, repeat=1, min_repeat_ms=1000),
    ),
}
Use the taskset command to limit the CPU cores used by the process:
taskset -c 0-3 python your_script.py
This command will limit the script to use only the first four CPU cores.
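The same restriction can be applied from inside the script. The following is a minimal sketch using os.sched_setaffinity from the Python standard library, which is available on Linux only. The helper name limit_to_cores is hypothetical, and whether this changes the behavior of a particular AutoTVM run is an assumption to verify, not a guarantee.

```python
import os


def limit_to_cores(max_cores: int) -> set:
    """Pin the current process to at most `max_cores` of its allowed CPUs."""
    # PID 0 means "the calling process".
    allowed = sorted(os.sched_getaffinity(0))
    # Always keep at least one core, and never pick a core we are not allowed on.
    chosen = set(allowed[:max(1, max_cores)])
    os.sched_setaffinity(0, chosen)
    return chosen
```

Calling limit_to_cores(4) at the top of the script, before any worker processes are started, pins it to four of the cores it is allowed to use. Processes started afterwards inherit the affinity mask.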
If your system runs out of memory, it can hang. Adding a swap file can help mitigate this issue.
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
To keep the swap file active after a reboot, add the following line to /etc/fstab:
/swapfile none swap sw 0 0
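Before starting a long tuning run, it can be useful to confirm how much memory and swap are actually available. The sketch below parses /proc/meminfo, the standard Linux interface for these numbers. The function names parse_meminfo and available_gib are hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_gib(path: str = "/proc/meminfo") -> float:
    """Return the kernel's MemAvailable estimate in GiB."""
    with open(path) as meminfo:
        info = parse_meminfo(meminfo.read())
    return info["MemAvailable"] / (1024 * 1024)
```

A script could call available_gib() and refuse to start tuning below some threshold. The right threshold depends on the model being tuned and is left as an assumption for the reader to choose.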
Use profiling tools to identify bottlenecks in your code.
import cProfile
cProfile.run('your_function()')
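The raw cProfile output can be long. The following is a minimal sketch that sorts the results by cumulative time and keeps only the top entries, using the standard pstats module. The helper name profile_call is hypothetical.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top: int = 10, **kwargs) -> str:
    """Run func under cProfile and return a report of the `top` entries
    ordered by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        # Stop profiling even if func raises.
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()
```

For example, print(profile_call(your_function)) shows where the Python-level time goes. Time spent in child processes is not captured by this approach.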
Install line_profiler:
pip install line_profiler
Add the @profile decorator to the functions you want to profile and run the script with kernprof:
kernprof -l -v your_script.py
Sometimes hardware issues can cause the system to hang. Check for hardware errors using tools like smartctl for disk health and memtest86+ for memory tests.
sudo apt-get install smartmontools
sudo smartctl -a /dev/sda
Reboot your system and select the memory test option from the GRUB menu.
To summarize: monitor resource usage with htop, iostat, vmstat, and free, and check the hardware with smartctl and memtest86+.
By following these steps, you should be able to diagnose and potentially resolve the issue causing your system to hang while running TVM's AutoTVM.
taskset -c 0-3 python your_script.py does not work.
Aug 23 19:48:42 loris systemd[1]: Finished Record System Boot/Shutdown in UTMP.
Aug 23 19:48:42 loris systemd[1]: Started Network Time Synchronization.
Aug 23 19:48:42 loris systemd[1]: Started Userspace Out-Of-Memory (OOM) Killer.
Aug 23 19:48:42 loris systemd[1]: Reached target System Time Set.