vedderb / bldc

The VESC motor control firmware

BUG: can-cmd not working reliable #658

Open 1zun4 opened 9 months ago

1zun4 commented 9 months ago

can-cmd, which I use for changing maximum speed, watts and current, does not always work. This is quite annoying because it causes multiple ESCs to get out of sync. I am on BETA 15, and others on BETA 15 experience similar issues.

My approach is this: https://github.com/m365fw/vesc_m365_dash/blob/9ac738f598c336d41405ca7faa6cb46920ef0445/g30_dash.lisp#L323-L338

Sometimes it causes a read_error on ~CHANNEL~ error, but I cannot tell where it comes from. And how do I catch it?

[image]

The function itself always returns true.

Any ideas? The last thing I could try is combining the can-cmd into a single request, next would be to try BETA 17.

vedderb commented 9 months ago

I have been working on this one, which is based on flat values: https://github.com/vedderb/vesc_pkg/tree/main/lib_code_server It will block and give the result back (or a timeout that can be specified), so you know if it worked. You could also make a wrapper that does some retries. It is much easier to use the code server as you don't need to make a string, you can just quote (or quasi-quote) a value.

Can you give it a try?

1zun4 commented 8 months ago

Thank you. I think I can work with this, but so far I am struggling with how to use it with variables.

(import "pkg@://vesc_packages/lib_code_server/code_server.vescpkg" 'code-server)
(read-eval-program code-server)

(defun configure-speed(speed watts current)
    {
        ;(set-param 'max-speed speed)
        (set-param 'l-watt-max watts)
        ;(set-param 'l-current-max-scale current)
    }
)

(defun set-param (param value)
    {
        (let ((code (str-merge "'(conf-set " "'" (sym2str param) " " (str-from-n value) ")")))
            {
                (print (list "Code " code))
                (print (list "Quote " '(conf-set param value)))
                (print (list "Input Voltage" (rcode-run 40 0.5 '(get-vin)))) ; works
                (print (list "Conf-set 1" (rcode-run 40 0.5 '(conf-set param value)))) ; eerror because it does not pass param and value
                (print (list "Conf-set 2" (rcode-run 40 0.5 code))) ; returns the code itself but does not run it?
                (print (list "Conf-set 3" (rcode-run 40 0.5 (read code)))) ; returns the code itself but as symbol
                (print (list "Conf-set 4" (rcode-run 40 0.5 (read-program code)))) ; eerror
            }
        )
    }
)

(configure-speed 20 400 1)

Output:

("Code " "'(conf-set 'l-watt-max 400)")
("Quote " (conf-set param value))
("Input Voltage" 82.012955f32)
("Conf-set 1" eerror)
("Conf-set 2" "'(conf-set 'l-watt-max 400)")
("Conf-set 3" (conf-set (quote l-watt-max) 400))
("Conf-set 4" eerror)

vedderb commented 8 months ago

It should work well with quasi-quotation. Example:

(defun set-param (param value)
        (rcode-run 40 0.5 `(conf-set ,param ,value))
)

Now you can use it like this

(set-param 'max-speed 12.2)
(set-param 'l-watt-max 1000.0)
(set-param 'l-current-max-scale 50.0)

Another small tip: You can now use var for local variables if you prefer that over let. Example:

(defun set-param (param value) {
        (var code (str-merge "'(conf-set " "'" (sym2str param) " " (str-from-n value) ")"))

        (print (list "Code " code))
        ; ...
})

1zun4 commented 8 months ago

No success so far. :/

[image]

Also what is the best way to check if the function returns a timeout? I would like to fallback to can-cmd when rcode-run returns a timeout.

vedderb commented 8 months ago

My mistake. Can you try this version of set-param?

(defun set-param (param value)
        (rcode-run 40 0.5 `(conf-set (quote ,param) ,value))
)

To check for timeout you can just see if the function returns timeout:

(if (eq (set-param 'l-watt-max 100000) timeout) (print "Timed out"))

That can be used, for example, to make a version of set-param that makes up to 5 retries. Note that I'm checking for the result t here, as that is what conf-set should return on success.

(defun set-param (param value)
    (looprange i 0 5 {
            (if (eq (rcode-run 26 0.1 `(conf-set (quote ,param) ,value)) t) (break t))
            false
    })
)
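
For the fallback to can-cmd asked about above, one possible sketch is to count the attempts explicitly and only send the string version once every rcode-run attempt has failed. The name set-param-fallback and the retry count are illustrative assumptions, not part of the code-server package. It assumes the import from earlier in the thread, and that conf-set returns t on success as described above.

```lisp
(import "pkg@://vesc_packages/lib_code_server/code_server.vescpkg" 'code-server)
(read-eval-program code-server)

; Hypothetical helper: confirmed write via rcode-run, unconfirmed can-cmd as a last resort
(defun set-param-fallback (can-id param value) {
        (var ok nil)
        (var tries 0)

        ; Retry until the remote conf-set reports t or 5 attempts are used up
        (loopwhile (and (not ok) (< tries 5)) {
                (setq ok (eq (rcode-run can-id 0.1 `(conf-set (quote ,param) ,value)) t))
                (setq tries (+ tries 1))
        })

        ; can-cmd gives no feedback, so this path only improves the odds
        (if (not ok)
            (can-cmd can-id (str-merge "(conf-set '" (sym2str param) " " (str-from-n value) ")"))
        )

        ok ; t if a confirmed write succeeded, nil if the fallback was used
})
```

A call would look like (set-param-fallback 40 'l-watt-max 700). A nil result means the value was only sent through can-cmd and is not confirmed.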

vedderb commented 8 months ago

Another note on this one. Turns out there was a bug in unflatten which would cause random strange behavior and crashes. Should be fixed in the latest beta.

1zun4 commented 6 months ago

So, I updated yesterday from BETA 15 to BETA 23. Overall it worked, as I was able to set up my scooter.

[image] [image]

After I set everything up, I uploaded my script g30_script. It parsed successfully and runs: the display shows numbers, speed and so on, as well as the battery.

BUT:

[image]

My VESC running the script disappeared from CAN and I cannot access it anymore. My guess is that either can-list-devs or can-cmd is causing CAN to stop working.

Please help. It also seems this issue is present for everyone else too, so it is a software bug.

[image]

vedderb commented 6 months ago

Can you give me a short example that reproduces that problem that I can try without the hardware you are using?

1zun4 commented 6 months ago

Not right now as I'm unable to access my front VESC anymore. I have to open up the fully sealed scooter :( I will try to reproduce the issue in a short form when I get access to the USB.

1zun4 commented 6 months ago

LUCKY! I was able to regain access to my front ESC by reflashing the firmware! Magically, even though it was unable to find any other VESCs via CAN, it managed to flash BOTH.

I will now try to reproduce this problem as best I can.

1zun4 commented 6 months ago

So, here you go, this is how I managed to reproduce this issue: https://www.youtube.com/watch?v=drmO4reIDkg

The code I was talking about for the shutdown is the following:

(if (or (= off 1) (= lock 1))
    (if (not (app-is-output-disabled)) ; Disable output when scooter is turned off
        {
            (app-adc-override 0 0)
            (app-adc-override 1 0)
            (app-disable-output -1)
            (set-current 0)
            (loopforeach i (can-list-devs)
                (canset-current i 0)
             )
        }
    )
    (if (app-is-output-disabled) ; Enable output when scooter is turned on
        (app-disable-output 0)
    )
)

Note on the video: After recording it, I tried restarting my VESCs over and over. It does not break 100% of the time, only sometimes.

I have also made previous reproduction attempts by uploading the script to another VESC (which has no display hardware, so is basically an empty testbench): https://www.youtube.com/watch?v=fZGosGPmlXM There I was still able to connect to the secondary VESC, BUT as soon as the script was running, the CAN scan would probably time out.

Also I have noticed that can-list-devs likes to report nil. Should it not report an empty array instead of nil?

vedderb commented 6 months ago

There are still a lot of other factors here, such as the bluetooth connection, which can be shaky when polling data at the same time as scanning the CAN-bus. There are also many different types of bluetooth modules out there that have different types of problems at higher message rates. I need an example that is as simple as possible, otherwise I have to spend a lot of time reproducing your setup and then remove one factor at a time until I have a simple enough setup to start looking for the problem. If there are hundreds of things going on at the same time interacting with each other, it is almost impossible to find the problem.

1zun4 commented 6 months ago

Sorry, that is as much as I can do. I cannot reproduce this issue in another way. Also, the bluetooth connection is already ruled out as a factor, because as you can see in the first video, the CAN communication is entirely broken.

I will try removing pieces of code to find the exact cause of this issue, but so far I cannot really narrow it down beyond the shutdown function.

vedderb commented 6 months ago

That CAN scan behavior is almost certainly related to bluetooth not keeping up with polling data at the same time. What happens in the first video is probably completely unrelated to the CAN scan behavior in the second video.

"Also I have noticed that can-list-devs likes to report nil. Should it not report an empty array instead of nil?"

An empty list is the same thing as nil.

In that case there is nothing I can do. It is difficult for me to reproduce your exact setup, and even in your setup there is no consistent way of reproducing the problem. To make progress I need something smaller, more reproducible and with no TCP/Bluetooth in the middle.
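
Because an empty list is nil, a script can treat the result of can-list-devs the same way in both cases. A minimal sketch, assuming it runs as a script or in the REPL on a VESC with LispBM (the printed strings are only illustrative):

```lisp
; can-list-devs returns nil when no other devices were seen on the bus
(def devs (can-list-devs))

(if (eq devs nil)
    (print "No CAN devices found")
    (loopforeach can-id devs (print (list "Found CAN device" can-id)))
)
```

The explicit nil check is only there for the message. loopforeach over nil simply runs zero times, so the loops in the scripts above need no special case.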

1zun4 commented 6 months ago

So, after removing all can-cmd and canset-current function calls, the script works fine without the CAN communication breaking. I give up: I cannot get it to break 100% of the time. Sometimes it breaks, sometimes it just works.

(print (can-list-devs))

(defun configure-speed(speed watts current fw)
    {
        (set-param 'max-speed speed)
        (set-param 'l-watt-max watts)
        (set-param 'l-current-max-scale current)
        (set-param 'foc-fw-current-max fw)
    }
)

(defun set-param (param value)
    {
        (conf-set param value)
        (loopforeach i (can-list-devs)
            {
                (print i)
                (var cmd (str-merge "(conf-set " "'" (sym2str param) " " (str-from-n value) ")"))
                (print cmd)
                (can-cmd i cmd)
            }
        )
    }
)

(def sport-speed (/ 21 3.6))
(def sport-current 1.0)
(def sport-watts 700)
(def sport-fw 0)

(configure-speed sport-speed sport-watts sport-current sport-fw)
(print "Done!")

Console output on my first try:

Parsing 765 characters
(40)
40
"(conf-set 'max-speed 5.83333)"

(CAN lost connection)

But after restarting my VESC and trying again, I was not able to get the same result. :(

But it is 100% not a ME issue:

[image]

1zun4 commented 6 months ago

I have now finally taken your advice to use the code-server library and it works! https://github.com/m365fw/vesc_m365_dash/commit/3508e40db0d0b72f1a3451f4e95f009be378e8e0

So far I have not noticed any desync of the configuration or any CAN disconnects, but that requires further testing. I took your code example with the 5 retries, because it is very important to have the power limits synced. 👍
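
For the further testing mentioned above, one way to spot a desync might be to read the same parameter back from every device and compare the values by eye. This is only a sketch: read-param-all is a hypothetical name, the 0.1 s timeout is copied from the retry example, and it assumes every VESC on the bus is set up to answer code-server requests.

```lisp
(import "pkg@://vesc_packages/lib_code_server/code_server.vescpkg" 'code-server)
(read-eval-program code-server)

; Hypothetical helper: collect (can-id . value) pairs for one parameter.
; Per the thread, rcode-run returns timeout when a device does not answer,
; so such a device shows up with timeout instead of a number.
(defun read-param-all (param) {
        (var res (list (cons 'local (conf-get param))))
        (loopforeach can-id (can-list-devs)
            (setq res (cons (cons can-id (rcode-run can-id 0.1 `(conf-get (quote ,param)))) res))
        )
        res
})

(print (read-param-all 'l-watt-max))
```

Printing the pairs rather than comparing them in code sidesteps number-type differences between devices.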