jmccrohan / pysolarmanv5

A python module to interact with Solarman Data Logging Sticks
MIT License
132 stars 28 forks source link

KeyError: 7 #71

Closed SamirHafez closed 1 month ago

SamirHafez commented 2 months ago

Hey, I started getting this error when trying to write to a Sofar HYD: [KeyError: 7]

I’m trying to write: service_write_multiple_holding_registers: 4628, values: [3000]

I’ve noticed it happening from about Sept 2nd.

Any guidance on how to debug what’s happening here?

jmccrohan commented 2 months ago

Can you post a debug log please?

SamirHafez commented 2 months ago

Can you post a debug log please?

For sure, how would I go about doing that?

githubDante commented 2 months ago

service_write_multiple_holding_registers is not a PysolarmanV5 function. Which integration are you using ?

SamirHafez commented 2 months ago

service_write_multiple_holding_registers is not a PysolarmanV5 function. Which integration are you using ?

I’m using ha-solarman, but the source code calls pysolarmanv5 directly: https://github.com/davidrapan/ha-solarman/blob/a4c77d6837aeecfb426d51591ef274afddd5bd27/custom_components/solarman/api.py#L89

it just wraps it with a retry mechanism

davidrapan commented 2 months ago

service_* method is at: https://github.com/davidrapan/ha-solarman/blob/a4c77d6837aeecfb426d51591ef274afddd5bd27/custom_components/solarman/api.py#L390 and as he said it's just wrapped to retry failed attempts.

Could you maybe provide whole debug log from the HA? solarman_enable_debug You can do it by clicking on the above button then using the services and after that using the button once more and it will download a log file. 😉

SamirHafez commented 2 months ago

@davidrapan thank you for the instructions, this seems like the relevant portion, other than this is retries with same result and other reading calls that succeed. Do let me know if you’d prefer the whole log file:

[custom_components.solarman.api] [1758281395] service_write_multiple_holding_registers: 4628, values: [3000]
2024-09-05 21:37:08.276 DEBUG (MainThread) [custom_components.solarman.api] [1758281395] SENT: a5 1a 00 10 45 51 00 b3 3e cd 68 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 10 12 14 00 01 02 0b b8 90 07 7c 15
2024-09-05 21:37:08.436 DEBUG (MainThread) [custom_components.solarman.api] [1758281395] RECD: a5 13 00 10 15 51 20 b3 3e cd 68 02 01 50 f4 1f 00 ad 0f 00 00 23 22 ba 66 01 90 07 0d c2 bd 15
2024-09-05 21:37:08.437 WARNING (MainThread) [custom_components.solarman.api] [1758281395] service_write_multiple_holding_registers: 4628, values: [3000] failed, attempts left: 4. [KeyError: 7]
davidrapan commented 2 months ago

@githubDante? Do I read it correctly that it's an exception code 07?

The slave cannot perform the programming request sent by the master.

Edit: @SamirHafez it shows KeyError: 7 cause we have somewhat incomplete exception handling but in short the device returns an exception with the description above.

SamirHafez commented 2 months ago

@githubDante? Do I read it correctly that it's an exception code 07?

The slave cannot perform the programming request sent by the master.

Edit: @SamirHafez it shows KeyError: 7 cause we have somewhat incomplete exception handling but in short the device returns an exception with the description above.

So this means the inverter is “acting up”? Strange that this just start happening out of nowhere 🤔

I’ll need to dig deeper in that case. It’s curious that all the other writing registers I use continue to work, just not this specific one.

githubDante commented 2 months ago

@githubDante? Do I read it correctly that it's an exception code 07?

The slave cannot perform the programming request sent by the master.

Edit: @SamirHafez it shows KeyError: 7 cause we have somewhat incomplete exception handling but in short the device returns an exception with the description above.

@davidrapan You are right, it's an exception. But to be honest this is the first valid/complete exception response from a datalogger that I've seen :smiley: The funny thing is that the code 0x07 is not supported by umodbus, therefore a KeyError is raised.

>>> from umodbus.client.serial.rtu import parse_response_adu
>>> from umodbus.client.serial.redundancy_check import get_crc
>>> req = bytes.fromhex('01 10 12 14 00 01 02 0b b8 90 07')
>>> resp = bytes.fromhex('01 90 07 0d c2')
>>> try:
...     parse_response_adu(resp, req)
... except Exception as e:
...     print(e.__class__.__name__, e)
... 
KeyError 7
>>> 
>>> resp_supported = bytearray(resp)
>>> resp_supported[2] = 6 
>>> resp_supported = resp_supported[:-2] + get_crc(resp_supported[:-2])
>>> try:
...     parse_response_adu(resp_supported, req)
... except Exception as e:
...     print(e.__class__.__name__, e)
... 
ServerDeviceBusyError  The server is engaged in a long-duration program command. 
SamirHafez commented 2 months ago

@githubDante is there any hint in this investigation that could point me in the direction of finding out why the inverter is refusing to accept this command, all of a sudden?

davidrapan commented 2 months ago

Damn! 😆 At this point I wouldn't be surprised if solarman spontaneously spawns new universe!

The funny thing is that the code 0x07 is not supported by umodbus, therefore a KeyError is raised.

So it's even one more layer above.. well, I see that they don't bother with the error descriptions either. 😄

@githubDante is there any hint in this investigation that could point me in the direction of finding out why the inverter is refusing to accept this command, all of a sudden?

The thing is we can't trust much of anything what sticks are saying as they don't really follow the rules, lol...

What register it is anyway?

SamirHafez commented 2 months ago

@davidrapan its:

Register: 0x1214
Function: Passive Mode/Charging
Var Type: Uint16
Range: 0-3000
Unit :Watt

from the Sofar HYD ES modbus-RTU protocol.

I’m going to try and restart the whole inverter today to check it didn’t enter a funky state.

SamirHafez commented 2 months ago

@davidrapan So restarting the inverter did nothing... But restarting the attached batteries resolved this.

I guess they must have gotten into a weird state and the inverter refused to carry out the "charge" instruction.

This is resolved now, but if I had to leave a suggestion would be to improve error handling to at least be able to tell the difference between "your command was sent/received OK, but it was an invalid one" vs "your command broke something in the codebase"

davidrapan commented 2 months ago

As @githubDante already said this particular exception is not supported in the umodbus lib which we are using that's why it gave this non descriptive message. The other thing is that it happened for the first time. 😆 At least what we know of... so I guess we can do something about it...

The majority of the implementation is just a guessing work though and working with the solarman protocol is really painful. The act of writing registers can for example fail 3 times in a row and then will succeed in the fourth try with no apparent reason for any of the two outcomes (That's for example my experience for the most part).

I had for example this one today: IllegalDataValueError: The value contained in the request data field is not an allowable value for the server. 6 times in a row and then it went through for the seventh try... 😆

Even with all the mitigation I did in my integration (retry attempts, don't read and write at the same time, etc) it's way far from perfect...

SamirHafez commented 2 months ago

As @githubDante already said this particular exception is not supported in the umodbus lib which we are using that's why it gave this non descriptive message. The other thing is that it happened for the first time. 😆 At least what we know of... so I guess we can do something about it...

The majority of the implementation is just a guessing work though and working with the solarman protocol is really painful. The act of writing registers can for example fail 3 times in a row and then will succeed in the fourth try with no apparent reason for any of the two outcomes (That's for example my experience for the most part).

I had for example this one today: IllegalDataValueError: The value contained in the request data field is not an allowable value for the server. 6 times in a row and then it went through for the seventh try... 😆

Even with all the mitigation I did in my integration (retry attempts, don't read and write at the same time, etc) it's way far from perfect...

I’ve had that exact one before and it turned out because I was writing to a register that needed another register to be activated beforehand, for example:

  1. tell the inverter to switch to passive mode: response is okay
  2. Tell the inverter to start charging: fails with the error you described 3x then succeeds.

What I found in this situation is that it’s not enough for step 1 to succeed before you do step 2, I had to keep running step 1 in a loop by writing to it, and then reading the current value. Only when the current reading value changed, would I then be able to execute 2 without errors.

davidrapan commented 2 months ago

Yeah I also saw that some registers do depend on each other but in the case I was describing it was really just enough to try again.

githubDante commented 1 month ago

Unless the ACK error is added to umodbus (which seems highly unlikely) I think there is nothing that can be done here.