Closed: bdraco closed this issue 1 year ago
But the signature is different for every traversal of the loop:

```python
for signature, parser in self.telegram_specification['objects'].items():
    pattern = re.compile(signature, re.DOTALL)
    matches = pattern.findall(telegram_data)
```
As this is done again for every new telegram offered to the parser, you may well be right that there is some unnecessary overhead here.
I noticed on bulk processing that the parser is very slow. Not sure how much improvement this will bring. Could be an interesting experiment.
A quick fix would be an `lru_cache` around the `re.compile`:

```python
_cached_re_compile = lru_cache(maxsize=512)(re.compile)
```

(the max size may be too high)

```python
for signature, parser in self.telegram_specification['objects'].items():
    pattern = _cached_re_compile(signature, re.DOTALL)
    matches = pattern.findall(telegram_data)
```
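For illustration (not from the thread), a quick self-contained check that the `lru_cache` wrapper behaves as intended: `functools.lru_cache` keys on the call arguments, so repeated calls with the same signature and flags return the identical compiled pattern object. The signature used here is made up.

```python
import re
from functools import lru_cache

# Wrap re.compile in an LRU cache so each distinct signature is compiled
# (or looked up) at most once per cache slot.
_cached_re_compile = lru_cache(maxsize=512)(re.compile)

# Hypothetical DSMR-style signature, for illustration only.
first = _cached_re_compile(r"\d-\d:96\.1\.1.+?\r\n", re.DOTALL)
second = _cached_re_compile(r"\d-\d:96\.1\.1.+?\r\n", re.DOTALL)

print(first is second)  # the cache returns the identical pattern object
```

Worth noting: CPython's `re` module already keeps an internal cache of compiled patterns (512 entries by default), so in practice the wrapper mostly avoids the Python-level call overhead of `re.compile` itself rather than a full recompilation.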
A better fix would be to store the `re.compile` objects for each signature.

Another option could be to do it in the init.
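A minimal sketch of that approach, assuming a simplified parser (the class and attribute names here are illustrative, not the actual dsmr_parser code): compile every signature once in `__init__` and look the patterns up in the hot loop.

```python
import re


class TelegramParser:
    """Illustrative parser that precompiles one regex per signature."""

    def __init__(self, telegram_specification):
        self.telegram_specification = telegram_specification
        # Compile each signature once, up front, instead of per telegram.
        self._compiled = {
            signature: re.compile(signature, re.DOTALL)
            for signature in telegram_specification['objects']
        }

    def parse(self, telegram_data):
        matches = {}
        for signature, parser in self.telegram_specification['objects'].items():
            pattern = self._compiled[signature]  # no re.compile in the loop
            matches[signature] = pattern.findall(telegram_data)
        return matches
```

Compared to the `lru_cache` variant, this keeps the lifetime of the compiled patterns tied to the parser instance and avoids any cache-lookup work per call.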
@bdraco I tested it and it seems to improve the performance by about 5-6%. Do you have a preference for the chosen solution? I'm impartial and also fine with the LRU approach.
https://github.com/ndokter/dsmr_parser/pull/131/files
I suspect there is a big(ger) gain to be made in `TelegramBuffer.get_all`. It's a method that is called for every x bytes received, after which it performs a regex to see if a full telegram has been received. Maybe it can be optimized to more smartly look at the end of a telegram, which often is something like `!56DD\r\n`.
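One way that idea could look, as a sketch rather than library code: before running the full telegram regex over the buffer, do a cheap check for the terminator characters (a `!` followed somewhere by `\r\n`) and skip the regex entirely when they are absent. The function name and regex constant below are illustrative; the pattern itself is the one used later in this thread.

```python
import re

_FIND_TELEGRAMS_REGEX = re.compile(r"\/[^\/]+?\![A-F0-9]{0,4}\0?\r\n", re.DOTALL)


def get_all_telegrams(buffer):
    """Return complete telegrams, skipping the regex when no terminator is present."""
    # Cheap pre-check: a complete telegram always contains a '!' followed by
    # a line ending, so most partial buffers can bail out without a regex scan.
    if "!" not in buffer or "\r\n" not in buffer:
        return []
    return _FIND_TELEGRAMS_REGEX.findall(buffer)
```

Since `get_all` runs on every chunk of received bytes, most calls see an incomplete telegram, so the fast path dominates.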
Building them ahead of time and storing them is cleaner than using an LRU 👍
> I suspect there is a big(ger) gain to be made in `TelegramBuffer.get_all`. It's a method that is called for every x bytes received, after which it performs a regex to see if a full telegram has been received. Maybe it can be optimized to more smartly look at the end of a telegram, which often is something like `!56DD\r\n`.
```python
import re

# - Match all characters after start of telegram except for the start
#   itself again '[^\/]+', which eliminates incomplete preceding telegrams.
# - Do non greedy match using '?' so start is matched up to the first
#   checksum that's found.
# - The checksum is optional '{0,4}' because not all telegram versions
#   support it.
_FIND_TELEGRAMS_REGEX = re.compile(r"\/[^\/]+?\![A-F0-9]{0,4}\0?\r\n", re.DOTALL)


class TelegramBuffer(object):
    """
    Used as a buffer for a stream of telegram data. Constructs full telegram
    strings from the buffered data and returns it.
    """

    def __init__(self):
        self._buffer = ""

    def get_all(self):
        """
        Remove complete telegrams from buffer and yield them.

        :rtype generator:
        """
        for telegram in _FIND_TELEGRAMS_REGEX.findall(self._buffer):
            self._remove(telegram)
            yield telegram

    def append(self, data):
        """
        Add telegram data to buffer.

        :param str data: chars, lines or full telegram strings of telegram data
        """
        self._buffer += data

    def _remove(self, telegram):
        """
        Remove telegram from buffer and incomplete data preceding it. This
        is easier than validating the data before adding it to the buffer.

        :param str telegram:
        """
        # Remove data leading up to the telegram and the telegram itself.
        index = self._buffer.index(telegram) + len(telegram)
        self._buffer = self._buffer[index:]
```
Maybe something like this.
I don't actually use this lib so I can't test it. I was only submitting this on behalf of a user whom I was helping with a performance issue on their system.
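To show how the buffer class is driven, here is a condensed, self-contained copy with a usage run: chunks are appended as they arrive (e.g. from a serial port) and a complete telegram pops out only once the terminator has been received. The telegram content and checksum are made up.

```python
import re

_FIND_TELEGRAMS_REGEX = re.compile(r"\/[^\/]+?\![A-F0-9]{0,4}\0?\r\n", re.DOTALL)


class TelegramBuffer:
    """Condensed version of the buffer class above, for demonstration only."""

    def __init__(self):
        self._buffer = ""

    def append(self, data):
        self._buffer += data

    def get_all(self):
        for telegram in _FIND_TELEGRAMS_REGEX.findall(self._buffer):
            # Drop the telegram and any incomplete data preceding it.
            self._buffer = self._buffer[self._buffer.index(telegram) + len(telegram):]
            yield telegram


buffer = TelegramBuffer()
buffer.append("noise/ISK5\r\n1-0:1.8.1(0")      # partial telegram, made-up data
assert list(buffer.get_all()) == []             # nothing complete yet
buffer.append("00042)\r\n!5B2F\r\n")            # terminator arrives
print(list(buffer.get_all()))                   # one full telegram, noise dropped
```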
Yup, that user is me ;) Thank you all for looking into this!
Thank you, I have put these performance improvements in version 1.2.2.
Great work! Would it be possible to merge the update into Home Assistant?
I don't know how to test the changes in HA. It seems that the version listed here is quite old as well (v0.33): https://github.com/home-assistant/core/blob/dev/homeassistant/components/dsmr/manifest.json
https://github.com/home-assistant/core/pull/84097 looks like it's waiting for a reviewer that can test this
I've made a HACS integration for it now, as the PR keeps hanging in HA: https://github.com/dupondje/homeassistant-dsmr
Cool, thanks! I'll give it a try today/tomorrow. I'm curious to see the impact of the efficiency improvement on the CPU utilization.
By the way, since https://github.com/home-assistant/core/pull/84097 is from Dec 2022, I guess the performance improvement of version 1.2.2 is not in there. Would it be an idea to add the latest improvement to it?
> Cool, thanks! I'll give it a try today/tomorrow. I'm curious to see the impact of the efficiency improvement on the CPU utilization.
I've tested with the new version and compared the performance with the vanilla DSMR integration of Home Assistant. From a performance perspective, I don't see a clear pattern. It might be that vanilla Home Assistant DSMR performs slightly better, but the difference is too small to tell from the graphs. I've done two runs and rescaled the y-axis for both runs.
https://github.com/ndokter/dsmr_parser/blob/de167c89b6b4c01f166cbf745ae431665b6443e7/dsmr_parser/parsers.py#L83
It would be nice if this cached the compiled regex and didn't recompile it on every loop iteration.