intelligent-agent / Refactor

Linux distro for 3D-printers
https://wiki.iagent.no/wiki/Refactor
GNU Affero General Public License v3.0

Timer too close error during print. #282

Closed: eliasbakken closed this issue 2 years ago

eliasbakken commented 2 years ago

Sometimes the print stops with the following message:

MCU 'ar100' shutdown: Timer too close
This often indicates the host computer is overloaded. Check
for other processes consuming excessive CPU time, high swap
usage, disk errors, overheating, unstable voltage, or
similar system problems on the host computer.

Here is a typical stats line from the log:

Stats 3272.1: gcodein=92012 mcu: mcu_awake=0.005 mcu_task_avg=0.000029 mcu_task_stddev=0.000016 bytes_write=14638 bytes_read=357477 bytes_retransmit=9 bytes_invalid=0 send_seq=1570 receive_seq=1570 retransmit_seq=2 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 stalled_bytes=0 freq=48133349 ar100: mcu_awake=0.064 mcu_task_avg=0.000004 mcu_task_stddev=0.000008 bytes_write=940386 bytes_read=403233 bytes_retransmit=9 bytes_invalid=0 send_seq=50047 receive_seq=50047 retransmit_seq=2 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 stalled_bytes=1573 freq=300000127 adj=299180297 heater_bed: target=50 temp=49.8 pwm=0.206 board: temp=55.5 cold_junction: temp=37.7 voltage: temp=11.5 current: temp=1.2 sysload=0.09 cputime=96.701 memavail=597280 print_time=1001.695 buffer_time=1.993 print_stall=0 extruder: target=220 temp=220.0 pwm=0.434 extruder1: target=0 temp=20.7 pwm=0.000

I need help deciphering what the issue could be. Nothing looks excessive at first glance.
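
To make a stats line like the one above easier to compare field by field, it can be split into per-section dictionaries. The following is a minimal sketch, not part of Klipper: `parse_stats` and the section name `host` are hypothetical names chosen for illustration, and the sample is a shortened excerpt of the line above. Fields that carry no `name:` prefix of their own (for example `sysload` or `print_stall`) end up under whichever section precedes them, so the `find` helper looks a key up across all sections.

```python
def parse_stats(line):
    """Split a 'Stats <time>: key=value name: key=value ...' line into sections."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "Stats":
        raise ValueError("not a Stats line")
    result = {"eventtime": float(tokens[1].rstrip(":")), "sections": {}}
    section = "host"  # hypothetical label for fields before the first 'name:' token
    for tok in tokens[2:]:
        if tok.endswith(":") and "=" not in tok:
            section = tok[:-1]  # a token such as 'ar100:' starts a new section
            continue
        key, sep, value = tok.partition("=")
        if not sep:
            continue  # ignore anything that is not key=value
        try:
            parsed = float(value)
        except ValueError:
            parsed = value  # keep non-numeric values as strings
        result["sections"].setdefault(section, {})[key] = parsed
    return result


def find(stats, key):
    """Return (section, value) pairs for every section that contains key."""
    return [(name, fields[key])
            for name, fields in stats["sections"].items() if key in fields]


# Shortened excerpt of the stats line quoted in this issue.
sample = ("Stats 3272.1: gcodein=92012 mcu: mcu_awake=0.005 bytes_retransmit=9 "
          "stalled_bytes=0 ar100: mcu_awake=0.064 bytes_retransmit=9 "
          "stalled_bytes=1573 freq=300000127 adj=299180297 sysload=0.09 "
          "buffer_time=1.993 print_stall=0")

stats = parse_stats(sample)
for name, value in find(stats, "stalled_bytes"):
    print(name, value)
```

On this excerpt the two `stalled_bytes` entries differ (0 for `mcu`, 1573 for `ar100`). The sketch only surfaces that difference; it does not say what causes it.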

eliasbakken commented 2 years ago

Running stress --cpu 4 does not seem to affect the issue, so the message blaming the host computer might be misleading.

eliasbakken commented 2 years ago

Running the print at 1000% did not really affect the issue either. It seems more likely to be a bug somewhere in the code, possibly due to the optimizations that have been committed lately.

eliasbakken commented 2 years ago

It seems that the 2 KB dynpool might be the cause of the error; the standard implementation has 20 KB. The dynpool is used for reserving memory for path segments, so a small dynpool leaves very few path segments ready to use.
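
As a rough illustration of the difference in capacity, the number of path segments that fit in each pool can be estimated by integer division. This is only a sketch under stated assumptions: the 16-byte segment size is a hypothetical value (the real size depends on the firmware build), and 2 K and 20 K are taken to mean 2048 and 20480 bytes.

```python
SEGMENT_BYTES = 16  # hypothetical size of one queued path segment, not measured


def segments_that_fit(pool_bytes, segment_bytes=SEGMENT_BYTES):
    """Number of whole segments that fit in a pool of pool_bytes."""
    return pool_bytes // segment_bytes


for pool_kb in (2, 20):
    count = segments_that_fit(pool_kb * 1024)
    print(f"{pool_kb} KB pool holds about {count} segments")
```

Whatever the real segment size is, the smaller pool holds roughly a tenth as many segments as the standard one.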

There was not really room to both expand the dynpool and fit bl31 in SRAM A2. Instead, the ar100/klipper binary is now allowed to use 48 KB, and bl31 is written back to SRAM A2 when klipper is shut down. This should allow a graceful shutdown and reboot. It is a big change that allows more freedom in future firmware changes.

eliasbakken commented 2 years ago

With the latest PR this seems to be solved. I'm running a print and will mark this as solved if the print is OK. We can reopen if the problem persists.