markniu / Bed_Distance_sensor

Auto bed level with High resolution distance sensor

Communication timeout during homing probe Z #74

Open shlo-mix opened 10 months ago

shlo-mix commented 10 months ago

Hi

So this weird bug only happens while doing probing operations with the BDsensor, not during mesh or calibration.

HW: Voron Trident 350mm
BTT M8P, Klipper Version: v0.12.0-61-gb50d6669
BTT EBB36 1.2, Klipper Version: v0.12.0-61-gb50d6669, CAN communication
M102 S-1: V1.1 pandapi3d.com

Calibration data: 977 964 ... 290 266 245. Also, the distance the sensor measures with M102 S-2 looks accurate.

Homing: Homing of X and Y is successful. When it starts homing Z, I get "Communication timeout during homing Z":

8:58 PM Z axis triggered at 0.000 mm
8:58 PM warning:triggered at 0mm, Please slow down the homing z speed and the position_endstop in BDsensor >=0.5
8:58 PM triggered at 10.150 mm !
8:58 PM Communication timeout during homing z

After this failure:

So it looks like a bug that overloads the MCU and prevents communication. klippy (14).log

If I disable "probe:z_virtual_endstop" in printer.cfg and use the mechanical Z endstop pin, homing of X, Y and Z is successful. I can also use the BDsensor to do a mesh via the Heightmap tab, and it's really nice to see the mapping results. Communication is still good and fast (bytes_retransmit = 0, bytes_invalid = 0).

But as soon as I try to do Z_TILT probing, I get communication issues: "Communication timeout during homing probe" followed by: Z axis triggered at 0.900 mm

Then the communication is very slow again, and "bytes_retransmit" is about 150.

It seems that only probing operations with the BDsensor cause the MCU load that prevents communication. Only a reset can solve it, and sometimes I need a power cycle.

markniu commented 10 months ago

What's the CAN bus speed set in make menuconfig? If it is below 1000000, you can try increasing it.

shlo-mix commented 10 months ago

The CAN speed is 1000000, no errors:

image

shlo-mix commented 10 months ago

Found an issue in the code that was part of the problem, in BDsensor.py:

    # Probe position
    try:
        if ((self.mcu_probe.bd_sensor is not None) and
                (("BED_MESH_CALIBRATE" in gcmd.get_command()) or
                 ("QUAD_GANTRY_LEVEL" in gcmd.get_command()))):

Z_TILT_ADJUST was missing, so I added it:

    # Probe position
    try:
        if ((self.mcu_probe.bd_sensor is not None) and
                (("BED_MESH_CALIBRATE" in gcmd.get_command()) or
                 ("Z_TILT_ADJUST" in gcmd.get_command()) or
                 ("QUAD_GANTRY_LEVEL" in gcmd.get_command()))):

This enabled me to do a Z_TILT_ADJUST "scan" :) Then I needed to set horizontal_move_z high enough not to trigger:

    [z_tilt]
    horizontal_move_z: 2
    ...

Now Z_TILT_ADJUST and BED_MESH_CALIBRATE are working very well. It's not a complete solution, as I still must use a mechanical Z endstop for homing, but the features most important to me are working.

Actually, for me it is better to use a mechanical Z endstop for homing because it is safer: probing is not done above the bed, and there is a second probe to validate the result of the first one.
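
The command check in the condition above can be sketched as a small standalone helper. This is only an illustration: `SCAN_COMMANDS` and `uses_scan_branch` are hypothetical names that do not exist in BDsensor.py, and the sketch only mirrors the substring test quoted in this comment.

```python
# Hypothetical helper mirroring the condition quoted above; these names are
# NOT part of BDsensor.py.
SCAN_COMMANDS = ("BED_MESH_CALIBRATE", "Z_TILT_ADJUST", "QUAD_GANTRY_LEVEL")


def uses_scan_branch(command_name, sensor_present=True):
    """Return True when the command would take the branch shown above."""
    if not sensor_present:
        return False
    # Same substring test as `"..." in gcmd.get_command()` in the snippet.
    return any(name in command_name for name in SCAN_COMMANDS)


print(uses_scan_branch("Z_TILT_ADJUST"))  # True
print(uses_scan_branch("G28"))            # False
```

Keeping the command names in one tuple makes it harder to forget a command, which is what happened with Z_TILT_ADJUST here.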

shlo-mix commented 10 months ago

BTW, for the I2C reading, I see that the code uses a while loop for waiting. Is it possible to change that? Maybe use a timer interrupt.

markniu commented 10 months ago

Not sure, I need to think of that to see if it is possible.


shlo-mix commented 10 months ago

Thank you for your great work, very cool product.

sarpel commented 10 months ago

Console output of BDSENSOR_READ_CALIBRATION (19:15:44 to 19:15:46, newest line first):
1015 1015 956 956 956 956 949 937 925 909
894 879 864 845 827 810 791 768 748 724
701 678 652 625 599 571 544 513 485 454
423 392 362 330 300 269 240 211 182 157

Console output of G28 (newest line first):
19:18:44 Z axis triggered at 0.000 mm
19:18:44 warning:triggered at 0mm, Please slow down the homing z speed and the position_endstop in BDsensor >=0.5
19:18:43 triggered at 10.150 mm !
19:18:42 Communication timeout during homing z
19:18:31 G28

[BDsensor] 
#sda_pin: PB1
#scl_pin: PB0 
scl_pin: EBBCan: PB9
sda_pin: EBBCan: PB8
delay: 20 ###Tried Variations 20-30###
z_offset:0
z_adjust: 0
x_offset: 0
y_offset: 0 ###Tried Variations, My actual was 20. even tho unrelated###
no_stop_probe: true ###Tried Variations true or false or just '#' in front###
position_endstop: 1.0 ###Tried Variations 0.5-2.0### 
speed:0.8 ###Tried Variations 0.3-1###

[safe_z_home]
home_xy_position: 150, 115
speed: 200.0
z_hop: 1 ###Tried variations from 1 to 5###
z_hop_speed: 15.0 ###Tried Variations 1-15 even not related i guess###
#move_to_previous: False ###Tried Variations or just '#' in front, unrelated###

[bed_mesh]
speed: 200
horizontal_move_z: 1 ###Tried variations from 1 to 5###
mesh_min: 40, 40
mesh_max: 260,260
#probe_count: 25,25 ###Tried variations, but meshing is working the problem is homing###
algorithm: bicubic
fade_start: 0.6 ###Tried Variations or just '#' in front, unrelated###
fade_end: 10.0 ###Tried Variations or just '#' in front, unrelated###
bicubic_tension: 0.2 ###Tried Variations or just '#' in front, unrelated###
zero_reference_position: 150,115 ###Tried Variations or just '#' in front, unrelated###

[quad_gantry_level]
gantry_corners:
    -60,-10
    360,370
points:
    70,40
    70,210
    240,210
    240,40
speed: 200
horizontal_move_z: 1 ###Tried variations from 1 to 5###
retries: 5
retry_tolerance: 0.05
max_adjust: 10

[force_move]
enable_force_move: True

[stepper_z]
step_pin: PD14
dir_pin: !PD13
enable_pin: !PD15
rotation_distance: 40
gear_ratio: 80:16
microsteps: 16
endstop_pin: probe:z_virtual_endstop
position_max: 260
position_min: -10 ###Tried variations -10 to +1###
#position_endstop: 0
#homing_speed: 10 #pcb klicky
homing_speed: 3 ###Tried variations 3 to 0.5###
#second_homing_speed: 2 #pcb klicky
second_homing_speed: 0.8 ###Tried variations 0.8 to 0.3###
homing_retract_speed: 3.5 ###Tried variations also disabled in some variations###
#homing_retract_dist: 3 #klicky
homing_retract_dist: 1 ###Tried variations also disabled in some variations###

So it all comes down to this: somehow the EBB36 can't keep up with the incoming data. My CAN speed is 1,000,000, and QGL and bed meshing work, which I assume involve more data than homing.

Stats 3245.8: gcodein=0  mcu: mcu_awake=0.001 mcu_task_avg=0.000005 mcu_task_stddev=0.000004 bytes_write=5418 bytes_read=10476 bytes_retransmit=9 bytes_invalid=0 send_seq=411 receive_seq=411 retransmit_seq=2 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=180002956 EBBCan: mcu_awake=0.002 mcu_task_avg=0.000014 mcu_task_stddev=0.000011 bytes_write=1512 bytes_read=6577 bytes_retransmit=0 bytes_invalid=0 send_seq=162 receive_seq=162 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=63999352 adj=63998161  EBBCan: temp=26.1 heater_bed: target=0 temp=19.5 pwm=0.000 Raspberry_Pi: temp=34.6 Spider: temp=26.1 chamber: temp=21.0 sysload=0.91 cputime=71.510 memavail=1321232 print_time=25.474 buffer_time=0.000 print_stall=0 extruder: target=0 temp=18.9 pwm=0.000
Communication timeout during homing z
Stats 3246.9: gcodein=0  mcu: mcu_awake=0.004 mcu_task_avg=0.000012 mcu_task_stddev=0.000012 bytes_write=7238 bytes_read=12359 bytes_retransmit=9 bytes_invalid=0 send_seq=516 receive_seq=516 retransmit_seq=2 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=16 freq=180002920 EBBCan: mcu_awake=0.002 mcu_task_avg=0.000014 mcu_task_stddev=0.000011 bytes_write=1679 bytes_read=6936 bytes_retransmit=81 bytes_invalid=0 send_seq=177 receive_seq=176 retransmit_seq=175 srtt=0.004 rttvar=0.004 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=63999320 adj=63998134  EBBCan: temp=26.3 heater_bed: target=0 temp=19.5 pwm=0.000 Raspberry_Pi: temp=34.6 Spider: temp=26.0 chamber: temp=21.0 sysload=0.91 cputime=71.618 memavail=1316412 print_time=26.065 buffer_time=0.043 print_stall=0 extruder: target=0 temp=18.9 pwm=0.000
triggered at 10.150 mm !
warning:triggered at 0mm, Please slow down the homing z speed and the position_endstop in BDsensor >=0.5 
Z axis triggered at 0.000 mm 
Stats 3248.6: gcodein=0  mcu: mcu_awake=0.004 mcu_task_avg=0.000012 mcu_task_stddev=0.000012 bytes_write=7373 bytes_read=12736 bytes_retransmit=9 bytes_invalid=0 send_seq=526 receive_seq=526 retransmit_seq=2 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=180002920 EBBCan: mcu_awake=0.002 mcu_task_avg=0.000014 mcu_task_stddev=0.000011 bytes_write=1719 bytes_read=7148 bytes_retransmit=122 bytes_invalid=0 send_seq=181 receive_seq=181 retransmit_seq=181 srtt=0.003 rttvar=0.003 rto=0.050 ready_bytes=0 upcoming_bytes=0 freq=63999320 adj=63998075  EBBCan: temp=26.3 heater_bed: target=0 temp=19.4 pwm=0.000 Raspberry_Pi: temp=34.1 Spider: temp=25.9 chamber: temp=20.9 sysload=1.00 cputime=71.639 memavail=1313976 print_time=26.065 buffer_time=0.000 print_stall=0 extruder: target=0 temp=18.9 pwm=0.000
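
To compare bytes_retransmit between Stats lines like the ones above, a small parser can help. This is a minimal sketch that only assumes the `name: key=value ...` layout visible in these log lines; `retransmits` is a hypothetical helper, not part of Klipper, and the sample line is trimmed from the log above.

```python
import re

# A section looks like "EBBCan: key=value key=value ..." in the Stats lines.
SECTION_RE = re.compile(r"(\w+): ((?:\w+=\S+\s*)+)")


def retransmits(stats_line):
    """Map each MCU section name to its bytes_retransmit counter."""
    result = {}
    for name, body in SECTION_RE.findall(stats_line):
        fields = dict(item.split("=", 1) for item in body.split())
        # Sections such as "EBBCan: temp=..." carry no counter and are skipped.
        if "bytes_retransmit" in fields:
            result[name] = int(fields["bytes_retransmit"])
    return result


# Trimmed from the "Stats 3246.9" line above.
sample = (
    "Stats 3246.9: gcodein=0  mcu: mcu_awake=0.004 bytes_retransmit=9 "
    "bytes_invalid=0 EBBCan: mcu_awake=0.002 bytes_retransmit=81 "
    "bytes_invalid=0  EBBCan: temp=26.3 heater_bed: target=0 temp=19.5"
)
print(retransmits(sample))  # {'mcu': 9, 'EBBCan': 81}
```

In the three lines above, the EBBCan counter goes 0, 81, 122 while the main mcu stays at 9, so the retransmits are on the toolhead board's link.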

Hardware: Voron 2.4 Rpi4 Fystec Spider 2.3 EBB36 U2C 2x Mellow 5160 HV @48V 4x Fysetc 2209 @ 24V Goliath + vz-hex cnc + moons 8t

klippy (76).log

sarpel commented 10 months ago

additional info,

I tried the beta version, recompiled and reflashed the EBB, and then it worked. After a firmware restart it started to fail again. In the same PuTTY session I reflashed the already existing Klipper firmware (with BDsensor), and it worked again.

I am sure it will fail again after a firmware restart.

Does this kind of error ring a bell?

markniu commented 10 months ago

How about decreasing the delay value from 20 to 19, 18 or 17? But make sure the data from the BDsensor is still right, e.g. with M102 S-1.

markniu commented 10 months ago

BTW, I have also run into this timeout problem with the EBB36 at random before, even without the BDsensor. I believe it is caused by the application software being busy, either on the Pi or on the MCU.

shlo-mix commented 10 months ago

Why does the EBB36 keep up during mesh but not while probing? Is it possible to reduce the CPU usage of BDsensor probing (or do it the same way as in "mesh")?

davidsiaw commented 10 months ago

I am also getting a communication timeout from the homing probe. Even if I raise trsync_timeout to a higher value, it still times out. It's impossible to home or QGL.

sarpel commented 10 months ago

how about decrease the delay value from 20 to 19 18 17 ? but need to make sure the data from bdsensor is right e.g M102 S-1

Decreasing the value makes no difference. M102 S-1 still reports v1.2 panda something, even at a delay of 10. But whatever I tried, I couldn't make it home, apart from the new upload over the katapult session I mentioned.

I looked for this error on the Klipper Discourse forums. There was some kind of consensus about it being caused by a non 32bit OS or the timer timeout of 0.05, and some people changed their SoC and it worked. The last guy says he owns a printer farm and his Pi 4s give a lot of timeout errors, so he replaced them with some kind of Intel SoC or laptop and they all got fixed. He also mentioned that unplugging USB devices (cam, ADXL, etc.) fixes the problem. The conclusion was that the Pi 4 can't handle the USB buses or something.

Anyway, I tried unplugging USB devices, minimizing components and the timeout change. None of them stops this sensor from timing out, because the EBB36 gives up on the very first command: a sudden 170ish retransmits, and the clock sync is disrupted way above the 0.05 threshold or similar timeout settings.

I mean, the probe works for bed meshing and QGL, but it just can't home even once without timing out.

Changed back to the Klicky PCB.

@markniu isn't there some way to slow down the homing procedure with some kind of delay between those messages? I am sure more and more people will hit this problem. I just don't know what differs, setup-wise, between us and the people for whom it already works.

sarpel commented 10 months ago

@markniu there is a Klipper update about multi-MCU homing, time sync issues and timeouts. Can you check whether it applies to us? It comes at the exact moment we need it. Did koconnor see us somehow?

markniu commented 10 months ago

Thank you for letting me know. I think that update can fix the timeout problem in most cases. I have also synced that code into BDsensor.py. Please update BDsensor.py and Klipper, then try again.


markniu commented 10 months ago

For homing with the BDsensor, TRSYNC_TIMEOUT = 0.025 is set in BDsensor.py, not in mcu.py.

I looked for this error on klipper discourse forums. There were some type of consensus about it being caused by non 32bit OS, timer timeout to 0.05 and some managed to change SOC and worked.

shlo-mix commented 10 months ago

Setting the delay to 13 helped a bit (12 didn't work), but most of the time I get the same timeouts.

For sarpel and me, "mesh" is working. Is there a possibility of a temporary workaround for probing, by changing something in the code so that the "probe reading" uses the same code as the "mesh reading"? If you point me to the line in the code, I can try. PROBE accuracy and sample rate are not important as long as we do a mesh afterwards.

BTW, even if I use my solution of a mechanical Z endstop, QGL/TILT must be done from a safe distance (>10mm). Sometimes the bed is tilted by gravity, and if I do meshing/tilt from 2mm I may crash into the bed. I learned that the hard way.

markniu commented 10 months ago

You can try updating BDsensor.py and Klipper. I believe that can fix the timeout problem, and there is no need to reflash the MCU. If you still get random timeouts, you can increase TRSYNC_TIMEOUT from 0.025 to 0.05 in BDsensor.py, not in mcu.py.

If you want to do QGL like with a normal probe sensor, you can delete the selected code shown below in /home/pi/Bed_Distance_sensor/klipper/BDsensor.py

image


sarpel commented 10 months ago

And it works. @markniu the klipper update did it all.

shlo-mix commented 10 months ago

Now the probing works :)

Z_TILT seems to probe and adjust the tilt correctly, but at the end it fails (with a delay of 15 and of 18), even if the Z speed is a painfully slow 0.5: image

Thanks

shlo-mix commented 10 months ago

Sometimes when doing TILT the MCU crashes and I get a red screen with the following message:

"MCU 'EBBCan' shutdown: Timer too close This often indicates the host computer is overloaded. Check for other processes consuming excessive CPU time, high swap..."

markniu commented 10 months ago

It seems the MCU on the EBBCan is a little slow for a CAN module. Could you try increasing sample_time from 0.03 to 0.05 in BDsensor.py?


shlo-mix commented 10 months ago

Hi, I changed the value to 0.05 and still get this error.

Update:

By increasing the txqueuelen from 128 to 1024, the problem was solved with 0.03:

    sudo nano /etc/network/interfaces.d/can0

    allow-hotplug can0
    iface can0 can static
        bitrate 1000000
        up ifconfig $IFACE txqueuelen 1024
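
After editing the interface file and bringing can0 back up, the value actually in effect can be read back from sysfs. This is a minimal sketch assuming a Linux host that exposes `/sys/class/net/<iface>/tx_queue_len`; `read_txqueuelen` is a hypothetical helper name.

```python
from pathlib import Path


def read_txqueuelen(iface="can0", sysfs_root="/sys/class/net"):
    """Return the interface's transmit queue length, or None if unreadable."""
    path = Path(sysfs_root) / iface / "tx_queue_len"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Interface missing (e.g. CAN adapter unplugged) or unexpected content.
        return None


if __name__ == "__main__":
    print("can0 txqueuelen:", read_txqueuelen("can0"))
```

If this prints 128 after a reboot, the edited interface file was probably not applied to can0.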

shlo-mix commented 10 months ago

Sorry to report a new issue.

After a failed Z_TILT attempt, a second attempt crashes: image

[Uploading klippy (15).log…]()

But this is a rare issue. The one above (Timer too close ..) is the more frustrating one.

markniu commented 10 months ago

This is very helpful.

by increasing the txqueuelen from 128 to 1024 the problem solved with 0.03:

sudo nano /etc/network/interfaces.d/can0

allow-hotplug can0 iface can0 can static bitrate 1000000 up ifconfig $IFACE txqueuelen 1024

markniu commented 10 months ago

seems this log file didn't upload successfully.

Uploading klippy (15).log…

but this is rare issue, the top one is the more frustrating (Timer too close ..)

shlo-mix commented 10 months ago

sorry here it is: klippy (15).log

shlo-mix commented 10 months ago

This is very helpful.

by increasing the txqueuelen from 128 to 1024 the problem solved with 0.03: sudo nano /etc/network/interfaces.d/can0 allow-hotplug can0 iface can0 can static bitrate 1000000 up ifconfig $IFACE txqueuelen 1024

Unfortunately the "Timer too close .." problem still exists. Maybe the change helped a bit, but it didn't solve it. This is the more critical of the two problems, as it happens more frequently.

markniu commented 10 months ago

Maybe this will be helpful. https://github.com/Dids/klipper-priority-fix


shlo-mix commented 10 months ago

Thank you. I tried it but get the same "MCU 'EBBCan' shutdown: Timer too close". It is less stable than before, as I get other errors:

It all comes down to the message "processes consuming excessive CPU time". Basically, in order to communicate with the BD probe, the EBB needs to do a "SW reading": I2C bit-banged in software consumes a lot of CPU resources, as it stops all other tasks while waiting. Even with a hardware peripheral, I2C is an unstable and slow type of communication, so in software it's a nightmare.

One way to make it work while probing: instead of I2C communication, the BD toggles a GPIO to 1 or 0 like a switch, and only after the switch toggles does the host send a "read command". This will increase performance, reduce CPU consumption and be safer.

The mesh scanning must use I2C in order to get exact values, but it is less intensive, as it only communicates every 200ms or so (probing is every 30ms).
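
The difference in load can be put into numbers with simple arithmetic on the two intervals mentioned above (about 200ms for mesh scanning, 30ms for probing). This is only a back-of-the-envelope sketch; `reads_per_second` is a hypothetical helper, and it counts read requests, not CPU time.

```python
def reads_per_second(sample_time_s):
    """Number of sensor reads issued per second for a given sample interval."""
    return 1.0 / sample_time_s


for label, interval in (("probing", 0.030), ("mesh scanning", 0.200)):
    print(f"{label}: {reads_per_second(interval):.1f} reads/s")

# Ratio of the two rates: probing issues several times more reads per second.
ratio = reads_per_second(0.030) / reads_per_second(0.200)
print(f"probing/mesh ratio: {ratio:.1f}x")
```

With these intervals, probing asks for roughly 33 reads per second against 5 for mesh scanning, so each blocking bit-banged read gets in the way of other MCU tasks far more often.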

shlo-mix commented 10 months ago

With a 100ms sample time, it doesn't crash 95% of the time.

markniu commented 10 months ago

Using the sda_pin as a switch GPIO while homing should be the perfect solution, without increasing the sample time or slowing down the homing speed. I will try to do that. But it requires flashing the firmware into the BDsensor with external hardware such as an ST-Link or a USB-to-UART module.


shlo-mix commented 10 months ago

That is great news. Maybe we can also increase the sample rate, to 5ms or less. I have the ST-Link/V2 (STM8 and STM32), will that work? What type of MCU is it? I think I also have a USB-to-UART module.

markniu commented 10 months ago

The MCU on V1.1 (hardware version) is a C51, which needs a USB-to-UART module to flash. The MCU on V1.3 is an STM32, which needs an ST-Link V2. I will update the doc once I finish the firmware, within one or two weeks.


shlo-mix commented 9 months ago

Several days ago you added flashing instructions, but I cannot see a file for the 1.1 MCU.

markniu commented 9 months ago

Right, I am trying to add one more feature that lets it work as a strain gauge.

shlo-mix commented 9 months ago

About the implementation: after you detect that the GPIO has triggered, do you stop the Z travel and perform a stationary I2C reading to get the exact position? This would give us better accuracy while doing fast probing, both for homing and Z_TILT. I have real-time embedded C skills, let me know if I can help.

markniu commented 9 months ago

In switch mode, it uses the GPIO to trigger and then switches to I2C mode to read the exact position. I have updated the hex file for the V1.1.
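
The switch-mode flow described here can be sketched as host-side pseudologic in Python. This is an illustration only, not the actual firmware or BDsensor.py code: `probe_switch_mode`, `read_trigger_pin` and `read_distance_i2c` are hypothetical names, and the two callables stand in for the real GPIO and I2C accesses.

```python
def probe_switch_mode(read_trigger_pin, read_distance_i2c, max_polls=1000):
    """Poll a cheap GPIO level and do one I2C read only after it trips.

    Returns the distance from the single I2C read, or None if the pin never
    triggered within max_polls polls.
    """
    for _ in range(max_polls):
        if read_trigger_pin():
            # Only now pay for the slow I2C transaction.
            return read_distance_i2c()
    return None


# Simulated hardware: the pin trips on the third poll.
levels = iter([False, False, True])
i2c_reads = []


def fake_i2c():
    i2c_reads.append(1)
    return 0.98  # made-up distance in mm, for the simulation only


print(probe_switch_mode(lambda: next(levels), fake_i2c))  # 0.98
print(len(i2c_reads))                                     # 1
```

The point of the design is that the I2C transaction happens once per probe instead of once per sample interval.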

markniu commented 9 months ago

About the strain gauge mode, here is a test: https://www.youtube.com/watch?v=lYBy0zhW1d0 I will update the tutorial for the strain gauge mode later.

shlo-mix commented 9 months ago

I have successfully flashed the 1.1 code: FW_Flash_log.txt

I can see the LED turning on when the probe is near the metal build plate, but M102 S-1 returns an empty line, so it looks like I don't have communication.

Do I need to set something in the config?

markniu commented 9 months ago

No need to set anything else in the config.

You can check whether the flash succeeded from the serial port. There is a serial port window in the STC-ISP tool: choose the right COM port and power on the BDsensor, and you will see some messages from the BDsensor, like the version (V1.1b) and the raw distance data.

And here is the old firmware for the V1.1: V1.1_BD20230608.hex https://github.com/markniu/Bed_Distance_sensor/tree/new/hex/V1.1_STC

image

-I have successfully flash the 1.1 code: FW_Flash_log.txt

  • updated the latest BDsensor and Klipper repo.
  • run install_BDsensor.sh and updated firmware on the EBB36.

I can see the led turning on when the probe is near the metal build plate. but M102 S-1 return empty line, so it's look like I don't have communication.

Do I need to set something under config?

shlo-mix commented 9 months ago

When I open either file I get: "the file is over the valid scope, and the exceeding part has been moved to the EEPROM area."

markniu commented 9 months ago

Both files are hex. They are converted to bin when opened with the download tool, and you can see they are below 6 Kbyte, which does not exceed the flash memory.


shlo-mix commented 9 months ago

With the old and the new FW it's the same.

shlo-mix commented 9 months ago

I tried with another new BD sensor that I have. Before flashing, under the UART Helper tab, I could see the same "pandapi3d" communication response after reset.

But as soon as I flash it with the new FW (or the old FW) it stops communicating. Now I have 2 sensors that stopped working :), so it might be something with the STC-ISP program configuration or the code.

Code Data len: 01FF9, Checksum: 0B9CD9
EEPROM DataLen: 01400, Checksum: 076CFA

image

image

markniu commented 9 months ago

That's strange. I will download that version of STC-ISP and check.

shlo-mix commented 9 months ago

I found the issue. I think the file was damaged by the direct download.

I disabled the Windows real-time scan and downloaded the whole repo as a ZIP (not just the file). Now I can see communication.

shlo-mix commented 9 months ago

I see that you send the ADC data via UART. What is the ADC sample rate? It would be great to have at least 200Hz (5ms) for the ADC sampling and GPIO update, to get a fast Z "GPIO probe" response.

shlo-mix commented 9 months ago

Back to the topic: unfortunately it failed to home: Internal Error on WebRequest: gcode/script klippy (18).log

markniu commented 9 months ago

The data is output via UART only when it changes. The rate is about 100~200Hz depending on the distance: the shorter the distance, the higher the rate. Likewise, no data is sent over I2C unless the Klipper host sends a data request.
