mitch7391 / homebridge-cmd4-AdvantageAir

Catered shell script to integrate air conditioner control units by Advantage Air into HomeKit using the plug-in homebridge-cmd4.
MIT License
38 stars 5 forks source link

[Pull Request] Simultaneous Multiple Zones Closing Bug Fix #48

Closed uswong closed 2 years ago

uswong commented 2 years ago

name: Pull Request about: Resolve an issue to homebridge-cmd4-AdvantageAir. title: "Fix a bug on simultaneous multiple zones closing" labels: pull-request assignees: mitch7391


Fix a bug on simultaneous multiple zones closing and one other minor bug

Is your pull request related to a problem or a new feature? Please describe:

Describe the solution you'd have implemented: 1) The solution is to keep the number of zoneOpen in a temporary file when the first Get/Set command is run and the subsequent Get/Set command can access the file to see the number of zones open from the temporary file if it fails to retrieve the info the system.

2) extract the time of last changed rather than the time of creation of the file

Do your changes pass local testing:

Additional context:

ztalbot2000 commented 2 years ago

Hi,

I still don't understand the problem and how it can happen. If Cmd4 executes a "Set" command, there is no possible "Get" by Cmd4, before or after; because of WoRm and "WoRm2"

The next HomeKit "Get" in WoRm will be directly from the device. Your Cached file though returns possibly old data of the last 2 minutes. The next HomeKit "Get" in WoRm2 will return whatever state Cmd4 knows of it, Which could have been previously "Set". A polled "Get" will always return what's your cached file has in the last two minutes.

It seems that following all "Set" commands, you should update the systemData so the next "Set" or "Get" has correct data to work with. Or even better, All "Set" commands need to do a getSystemData before and after. No 4 seconds required.

I will see what your latest solutions entail.

Ttyl, John

On Sun, Feb 20, 2022 at 10:43 PM uswong @.***> wrote:


name: Pull Request about: Resolve an issue to homebridge-cmd4-AdvantageAir. title: "Fix a bug on simultaneous multiple zones closing" labels: pull-request assignees: mitch7391

Fix a bug on simultaneous multiple zones closing and one other minor bug

Is your pull request related to a problem or a new feature? Please describe:

1.

Bug related to a problem: The problem was related a close succession of Get/Set commands when closing multiple zones at the same time via automations, scenes or Siri command. A Get command proceed a Set command because the number of zones open (zoneOpen) needs to be known before deciding whether it is safe to close the zone or not. Based on the API, when the aircon is set, the json file will only be available for retrieval in 4s. As such, if we have a Get/Set command and followed closely by another Get/Set command (probably < 1s), the json file retrieved will be empty, therefore the subsequent Set command will assume zoneOpen to be its default value of 0 which in turn trigger the opening of the constant zone and set it to 100%. 2.

one other minor bug: The age of file should be measured from the time of its last changed rather than the time of its creation. 3.

general streamlining of the script

Describe the solution you'd have implemented:

1.

The solution is to keep the number of zoneOpen in a temporary file when the first Get/Set command is run and the subsequent Get/Set command can access the file to see the number of zones open from the temporary file if it fails to retrieve the info the system. 2.

extract the time of last changed rather than the time of creation of the file

Do your changes pass local testing:

  • Yes Tested on my E-zone system, passed shellcheck and unit tests
  • No

Additional context:

You can view, comment on, or merge this pull request online at:

https://github.com/mitch7391/homebridge-cmd4-AdvantageAir/pull/48 Commit Summary

File Changes

(1 file https://github.com/mitch7391/homebridge-cmd4-AdvantageAir/pull/48/files)

Patch Links:

- https://github.com/mitch7391/homebridge-cmd4-AdvantageAir/pull/48.patch

https://github.com/mitch7391/homebridge-cmd4-AdvantageAir/pull/48.diff

— Reply to this email directly, view it on GitHub https://github.com/mitch7391/homebridge-cmd4-AdvantageAir/pull/48, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABSBCX6BO4HGE5LNCJSP67LU4GYHVANCNFSM5O5NBKAA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

mitch7391 commented 2 years ago

@uswong I will let you explain this one as you found the cause. It was definitely an issue though @ztalbot2000, it was found by another user who has automations set up for his zones and I tested it myself and it happened without fail when using automations or scenes to close more than one zone at a time.

uswong commented 2 years ago

Folks, I’m very busy today. Will get back to you guys tomorrow. Cheers, UngSing

On Tue, 22 Feb 2022 at 7:43 pm Mitch Williams @.***> wrote:

@uswong https://github.com/uswong I will let you explain this one as you found the cause. It was definitely an issue though @ztalbot2000 https://github.com/ztalbot2000, it was found by another user who has automations set up for his zones and I tested it myself and it happened without fail when using automations or scenes to close more than one zone at a time.

— Reply to this email directly, view it on GitHub https://github.com/mitch7391/homebridge-cmd4-AdvantageAir/pull/48#issuecomment-1047711682, or unsubscribe https://github.com/notifications/unsubscribe-auth/AXAO6PI5D7VFNM3XUCYIKEDU4NZFRANCNFSM5O5NBKAA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

uswong commented 2 years ago

Hey John,

Sorry for the super late reply. I was out all day yesterday.

If Cmd4 executes a "Set" command, there is no possible "Get" by Cmd4, before or after; because of WoRm and "WoRm2"

In this case, the "Get" command is not by Cmd4, it is a queryAirCon() call within AdvAir.sh before the actual Set command from Cmd4 to close any zone. This queryAircon() call is required to find out the number of zone open before deciding whether to close the zone or not.

This logic was built in to ensure that at least one zone is open at all time even if users through Siri (without the visual display) accidentally close the only open zone.

The next HomeKit "Get" in WoRm2 will return whatever state Cmd4 knows of it, Which could have been previously "Set". A polled "Get" will always return what's your cached file has in the last two minutes.

Whenever a valid Set command is received and executed, the cache file will be deleted immediately because the cache file no longer reflects the state of the Aircon. The next queryAirCon() call either from Cmd4 Get command or AdvAir.sh makes an internal queryAirCon() call will re-created the cache file and the 90s cache cycle will start again.

The cache file, if present, will always reflect the latest state of the aircon, unless the aircon state was changed using other means other than Homekit.

Now, the issue is when the Aircon system receives a valid Set command, it will take up to 4 seconds to have the change reflected in the json. Within the 4 seconds, and a getSystemData will return {}, an empty json body. I quoted below the sentence from AdvanatageAir API on this regards.

Note: Receive a valid change request, we transmit empty "{}" until the state change is confirmed by our hardware and reflected in the json, this can take up to 4 seconds.

Hope this helps to clarify.

ztalbot2000 commented 2 years ago

i have not tried this, but here is my thought, in your config.json for your GargeDoorOpener put: "props": { "currentDoorState": { "maxValue": 0, "minValue": 4, "minStep": -1 }, "targetDoorState": { "maxValue": 0, "minValue": 1, "minStep": -1 } }, "state_cmd": whatever

i've never redefined two properties before. i've only did one and i do not know how homekit will behave. You will need to restart homebridge. it is a shot. These will reverse the order of open to close.

ttyl, john

On Tue, Feb 22, 2022 at 12:01 AM John Talbot @.***> wrote:

Hi,

I still don't understand the problem and how it can happen. If Cmd4 executes a "Set" command, there is no possible "Get" by Cmd4, before or after; because of WoRm and "WoRm2"

The next HomeKit "Get" in WoRm will be directly from the device. Your Cached file though returns possibly old data of the last 2 minutes. The next HomeKit "Get" in WoRm2 will return whatever state Cmd4 knows of it, Which could have been previously "Set". A polled "Get" will always return what's your cached file has in the last two minutes.

It seems that following all "Set" commands, you should update the systemData so the next "Set" or "Get" has correct data to work with. Or even better, All "Set" commands need to do a getSystemData before and after. No 4 seconds required.

I will see what your latest solutions entail.

Ttyl, John

On Sun, Feb 20, 2022 at 10:43 PM uswong @.***> wrote:


name: Pull Request about: Resolve an issue to homebridge-cmd4-AdvantageAir. title: "Fix a bug on simultaneous multiple zones closing" labels: pull-request assignees: mitch7391

Fix a bug on simultaneous multiple zones closing and one other minor bug

Is your pull request related to a problem or a new feature? Please describe:

1.

Bug related to a problem: The problem was related a close succession of Get/Set commands when closing multiple zones at the same time via automations, scenes or Siri command. A Get command proceed a Set command because the number of zones open (zoneOpen) needs to be known before deciding whether it is safe to close the zone or not. Based on the API, when the aircon is set, the json file will only be available for retrieval in 4s. As such, if we have a Get/Set command and followed closely by another Get/Set command (probably < 1s), the json file retrieved will be empty, therefore the subsequent Set command will assume zoneOpen to be its default value of 0 which in turn trigger the opening of the constant zone and set it to 100%. 2.

one other minor bug: The age of file should be measured from the time of its last changed rather than the time of its creation. 3.

general streamlining of the script

Describe the solution you'd have implemented:

1.

The solution is to keep the number of zoneOpen in a temporary file when the first Get/Set command is run and the subsequent Get/Set command can access the file to see the number of zones open from the temporary file if it fails to retrieve the info the system. 2.

extract the time of last changed rather than the time of creation of the file

Do your changes pass local testing:

  • Yes Tested on my E-zone system, passed shellcheck and unit tests
  • No

Additional context:

You can view, comment on, or merge this pull request online at:

https://github.com/mitch7391/homebridge-cmd4-AdvantageAir/pull/48 Commit Summary

File Changes

(1 file https://github.com/mitch7391/homebridge-cmd4-AdvantageAir/pull/48/files )

Patch Links:

- https://github.com/mitch7391/homebridge-cmd4-AdvantageAir/pull/48.patch

https://github.com/mitch7391/homebridge-cmd4-AdvantageAir/pull/48.diff

— Reply to this email directly, view it on GitHub https://github.com/mitch7391/homebridge-cmd4-AdvantageAir/pull/48, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABSBCX6BO4HGE5LNCJSP67LU4GYHVANCNFSM5O5NBKAA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

uswong commented 2 years ago

To that end I created a branch b_zBeta in my repo that solves this. If we had more unit tests against the other features, I could test against it.

What is nice about getting the systen]mData on every "SeT, which really does not happen often, is that the data is now correct for "Gets" and no need to create a separate ID file.

Hey John,

I really like the new queryAirConUsingIteration() function and the updated unit tests to reflect the function change and I have tested those in my script and it works brilliantly with just some very minor edits on SetOn.bats and SetTargetHeadingCoolingTemperature.bats.

However, I do not wish to have any additional Get command unless it is absolutely necessary.

A Get command before a Set command is not required except in only one occasion when we are setting a zone or zones to close. If we are setting multiple zones to close within a very short time, the subsequent Get commands after the first will fail anyway and I already have a fix for it.

The aircon system will take up to 4 seconds to re-populate the json body with the new settings. My testing shows that if I make a queryAirCon call immediately after a Set command, the return code is still a 0 but the json body is empty with only {} in it.

This up to 4 seconds delay is documented in AdvantageAir API:

https://advantageair.proboards.com/thread/2/sticky-aircon-api-document

QUOTE Note: Receive a valid change request, we transmit empty "{}" until the state change is confirmed by our hardware and reflected in the json, this can take up to 4 seconds. UNQUOTE

Leave that with me, I will extract all the goodies out of your modified script. I will test each change with the real systems here. The issue with unit tests is that it does not have the realistic responses from the real system. I have the unit tests and the real systems to test with.

Thanks for the many good ideas you brought to the table.

ztalbot2000 commented 2 years ago

Hi,

Got It!

Now, the issue is when the Aircon system receives a valid Set command, it will take up to 4 seconds to have the change reflected in the json. Within the 4 seconds, and a getSystemData will return {}, an empty json body. I quoted below the sentence from AdvanatageAir API on this regards.

This paragraph is what made me understand the delema. The "Set" and "Get" is one Write Once transaction to Cmd4, but the AirCon seems to be be unstable after the "Set" for 4 seconds.

The b_zBeta branch I created does a "Get" before any "Set" command. This might help a little bit, but I think I've stumbled upon a system problem, related to what your delema is.

Previously queryAirConAndParseWithJq detected the bad data, because the parse failed. It would then retry to get the whole data again.

By splitting queryAirConAndParseWithJq into its two parts, we lost that data validity check. Waiting 4 seconds is a sign of a bad system design. In my branch b_zBeta I noticed this when the unit tests were not retrying.

AdvAir.sh has a bad mix right now of doing things by sometimes calling queryAirConAndParseWithJq and some times doing the split of the routines. The Unit tests do not check any of the new functionality and so when my b_zBeta branch tests the old functionality the way your new stuff is written, I see the failures. This implies that if there were unit tests for the new capabilities, it would fail in a similar fashion.

I will update my b_zBeta branch to fix this system issue. This would resolve the 4 second wait as well, being that the "Get" would retry, as it should, for valid data.

I was also working on a unit test server that would allow AdvAir.sh to actually do the same curl command in production, except to a local host. This would significantly improve our test coverage. It is actually easier than I thought.

I would appreciate if you would write test cases for your new stuff. Follow GetBrightness.bats as an example since it does not compare against the old scripts as that functionality did not exit then. All your really doing in the unit tests is: AdvAir.sh Get My_light Brightness 192.168.2.1 z01 and then checking the output to see that it is what you expect.

Lately while I'm in bed I have been thinking more of our design than actually caring to write the code. Sorry about that. Really not myself lately.

ttyl, John

On Tue, Feb 22, 2022 at 9:30 PM uswong @.***> wrote:

Hey John,

Sorry for the super late reply. I was out all day yesterday.

If Cmd4 executes a "Set" command, there is no possible "Get" by Cmd4, before or after; because of WoRm and "WoRm2"

In this case, the "Get" command is not by Cmd4, it is a queryAirCon() call within AdvAir.sh before the actual Set command from Cmd4 to close any zone. This queryAircon() call is required to find out the number of zone open before deciding whether to close the zone or not.

This logic was built in to ensure that at least one zone is open at all time even if users through Siri (without the visual display) accidentally close the only open zone.

The next HomeKit "Get" in WoRm2 will return whatever state Cmd4 knows of it, Which could have been previously "Set". A polled "Get" will always return what's your cached file has in the last two minutes.

Whenever a valid Set command is received and executed, the cache file will be deleted immediately because the cache file no longer reflects the state of the Aircon. The next queryAirCon() call either from Cmd4 Get command or AdvAir.sh makes an internal queryAirCon() call will re-created the cache file and the 90s cache cycle will start again.

The cache file, if present, will always reflect the latest state of the aircon, unless the aircon state was changed using other means other than Homekit.

Now, the issue is when the Aircon system receives a valid Set command, it will take up to 4 seconds to have the change reflected in the json. Within the 4 seconds, and a getSystemData will return {}, an empty json body. I quoted below the sentence from AdvanatageAir API on this regards.

Note: Receive a valid change request, we transmit empty "{}" until the state change is confirmed by our hardware and reflected in the json, this can take up to 4 seconds.

Hope this helps to clarify.

— Reply to this email directly, view it on GitHub https://github.com/mitch7391/homebridge-cmd4-AdvantageAir/pull/48#issuecomment-1048396970, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABSBCXZYCEC6A7AHN5Y2RMDU4RBDRANCNFSM5O5NBKAA . You are receiving this because you were mentioned.Message ID: @.***>

uswong commented 2 years ago

Hey John,

the "Get" would retry, as it should, for valid data.

In fact, I do have that valid file check within the queryAirCon() function but it is for the cached file and will retry on second Get command.

Your comments has put forward the idea of checking whether it is a valid file immediately after the Get command even if the return code is 0. A valid file check should then be done within the function queryAirConWithIterations(), if the file is not valid, wait for 1.2 s and iterate. If you like, you can leave the coding with me.

uswong commented 2 years ago

Hey John,

I have put in extra 1.5 lines of code within the function queryAirConWithIterations() to check for the fSize of $MY_AIRDATA_FILE and only accept file with size > 2000 byte. File with size > 2000 bytes cannot be empty hence most likely a valid file:

  function queryAirConWithIterations()
  {
     local url="$1"
     local exitOnFail="0"
     # Try 5 times, the last returning the error found.
     for i in 0 1 2 3 4
     do
        if [ "$selfTest" = "TEST_ON" ]; then
           echo "Try $i"
        fi
        if [ "$i" = "4" ]; then
           exitOnFail="1"
        fi
        # Updates global variable myAirData
        queryAirCon "$url" "$exitOnFail" "$i"
        getFileStatDtFsize "$MY_AIRDATA_FILE"
        if [ "$rc" = "0" ] && [ "$fSize" -gt 2000 ]; then
           break
        else
           sleep 1.5
        fi
     done
  }