Open marcel-kanter opened 4 years ago
Another one. The dlc values were missing.
test/data/example_data.py
TEST_MESSAGES_REMOTE_FRAMES = sort_messages( [ Message( arbitration_id=0xDADADA, is_extended_id=True, is_remote_frame=True, dlc=1, timestamp=TEST_TIME + 0.165, ), Message( arbitration_id=0x123, is_extended_id=False, is_remote_frame=True, dlc=1, timestamp=TEST_TIME + 0.365, ), Message( arbitration_id=0x768, is_extended_id=False, is_remote_frame=True, dlc=1, timestamp=TEST_TIME + 3.165, ), Message( arbitration_id=0xABCDEF, is_extended_id=True, is_remote_frame=True, dlc=1, timestamp=TEST_TIME + 7858.67, ), ] )
Am I missing something? Is this intended?
The BLF files are borrowed from the C++ library’s unit tests since I don’t have a Vector tool to generate new ones with. The DLC change makes sense. Could you provide a pull request?
I'm working on a fix already. It seems there are more bugs in the reader/writers, they hide behind the defacto non-existent check of the properties of Message, a behavior I will remove, since it makes no sense to create messages that are corrupt and (physically on the can bus) not possible.
Leaving is as it leads in the end to a situation in which no-one can rely on the properties of a Message instance. This means you have to implement the checks in the interfaces. That will definitvely produce a lot of duplicate error-prone code.
I believe the checks were disabled by default due to performance reasons. The interfaces and log formats should be implemented so that they do not produce incorrect data. If there for some reason are incorrect data then the application should continue to function and not raise an exception.
We could make the check method public so that the application can check any incoming message if desired.
Hmm. Well not so good idea, for reasons i stated before. One other thought: Is it possible, that the data files for the blf are negative examples? Found another incorrect Message. is_fd and is_remote_frame are not allowed together. Remote frames have been dropped in can FD due to the arbitration problems when simultaneous RTR are transmitted.
Ok. ASC Reader/Writer work now. I will produce some test data sets and add them to the test code.
Bugs in the log formats code should be caught by unit tests, not during runtime. That’s why I think we can call the check method explicitly in the unit tests to make sure they contain valid data. Forcing runtime checks and the overhead it implies when the data usually can be assumed to be valid is probably not what people want.
The BLF files are not negative tests. The C++ library doesn’t assume anything about what a valid message is so the data is just some random values. If you could generate new files with valid data that would be very much appreciated.
The file test_CAnMesage.blf is kind of corrupt. It matches the format description vector provides, but their tools wont import them. Basically because the DLC of 51 is not valid. At least manipulating the file with an hex editor to valid values let canalyzer import them. I'll need the full licenced version from work to generate more valid sample files. The demo version encrypts the blf files.
I believe the checks were disabled by default due to performance reasons. The interfaces and log formats should be implemented so that they do not produce incorrect data. If there for some reason are incorrect data then the application should continue to function and not raise an exception.
We could make the check method public so that the application can check any incoming message if desired.
The point is that there are more than 6485 lines of code in this package and check is only used in the test of the Message class itself (which is very incomplete btw.). No-one used this yet elsewhere, so making the method optional will not propably improve the quality. And the performance penalty will be there anyway, since no interface and no io can trust the objects it gets and must check them anyway.
So before I continue to fix this, is there any chance that the always enabled check is going into this repo?
I would like someone else, maybe @hardbyte or @felixdivo, to chime in on that. In my opinion I think checking of our internal sources should be optional, i.e. there should be a check() method the application can call to verify it is correct. Otherwise we just pass the messages as is however the source generated it.
Hi folks,
the last two major discussions on the message classes happened in #413 (adds Message.check()
) and #497 (cleanups to that method). We did not directly discuss if check=True
should be the default in the constructor, but I think we did that somewhere.
I'm favoring the automatic checking by default in the constructor (make check=True
the default), as it allows bus implementations and users to trust that messages are well-formed. If anyone knows what he's doing and wants to tweak performance, he could pass check=False
to the constructor do disable the check. We could optionally do that in buses, but should probably not trust users with creating valid messages.
If you could generate new files with valid data that would be very much appreciated.
Yes! Generally, I think that the internal tests should contain valid messages, as some might use it for reference and should find proper code.
[I was in vacation for a few days, thus the late answer.]
Hi, I think this was also discussed as part of https://github.com/hardbyte/python-can/pull/537
At the time the concern that this might cause performance problems on Raspberry Pi was raised.
As there are multiple versions of the Pi I suggest we decide to test on a specific version and see what the actual impact is. Generally I am in favor of switching on checking by default.
To summarize: Check on init of Message optional (default on) Check on every interface/io write/read. Because leaving the check optional on creation will not change anything. Every interface has to expect invalid messages.
Well then it makes more sense to make the properties "private" and check always (init and access) This way each message is always valid.
Check on every interface/io write/read. Because leaving the check optional on creation will not change anything. Every interface has to expect invalid messages.
I disagree and would argue that assuming validity is a good way to keep the amount of checks (because of performance and conciseness of code) somewhat limited. And if someone disables checks, he/she has to ensure the invariants/constraints of messages are are always fulfilled. The responsibility changes from the library to the user, but it could still be assumed. Else we do not need to make the check optional in the first place.
Hmm. I dont get it. Lets take the example blf files and the reader. They contain invalid messages. When do you want to make sure that these Messages dont come out of the reader? Do you assume that all files the reader gets contain valid messages? Do you want the reader to yield the invalid messages? Or should each reader implement the checks by its own? If you let the reader yield the invalid messages, then what happens with them? They get its way to the garbage collector when no-one uses them (the unusual best case) or they will make their way into another file (via an writer) or to an interface. Should the writer simply dump them into the output file(s)? Then you will have two files containing invalid messages. Or should the writer check the messages?
I'd do it this way:
can.Message
should be valid. check=True
is the default for assuring that invariant.check=False
. This might include buses and readers, when it is more convenient/efficient to do checks there. But the choice to implement custom checking and/or disabling the default checks is deliberate, and the caller needs to make sure the messages are still valid. This might for example be used when a message known to be valid is copied.In the case of the reader: It should not be allowed to emit invalid messages. At some point the BlfReader
would create messages. At that point, it would call the constructor Message(data=..., ...)
and either
check=True
(or pass nothing), to enable the check, orcheck=False
and do the checking manually, e.g. because some checks have already been done while parsing and only very few checks are left.How would a user disable the internal checks if for example a log includes one or two messages that python-can considers invalid but you still want to continue parsing the rest of messages in the log? With this change the parser or Notifier will just die and stop processing more messages.
That’s why I think an explicit call to .check() is better because the user can choose what to do if an exception is raised.
We could make the reader ignore that or drop the messages, as an opt-out of the checks.
And if they want to display a warning or something instead? All I’m saying is don’t take the power away from the developers or automatically assume they don’t know how to create valid messages. It’s also controversial to say that all APIs like SocketCAN or Vector are most likely super buggy and will produce faulty messages or that Canalyzer will produce corrupt log files. We don’t need to hide or point out problems that we are not responsible for.
I did a quick test with a standard BLF file and the parser got ~25% slower. We've already had valid complaints that our log parsers are very slow as they are.
And if they want to display a warning or something instead?
Catch the exception raised during check by the reader?
from the developers or automatically assume they don’t know how to create valid messages.
5 contributors didn't see the obvious, 10 contributors, if you count the ones from the blf io
The blf log files for testing the parser has always been known to be invalid but that has been perfectly fine because we only wanted to test that we get the expected data, not that the data in the files are actually valid.
Of course it would be better to have something generated from a third party tool but it was the best I could get at the moment.
Don’t take the test files too seriously, they are not representable for the type of data that comes from CAN APIs and log files.
When you catch the exception while iterating a reader it will already be too late. The generator function will have stopped executing so you can’t continue reading any more messages.
When you catch the exception while iterating a reader it will already be too late.
That's right. We'd have to add some more explicit handling to the reader interfaces. Sounds like a bit of work ...
This endeavor seems to be more work like I initially thought. I'd generally rather like to see stuff crash and make developers fix their broken code and tools instead of hiding such stuff. But in this case it seems like we'd need to touch too much and break too much. We'd slightly change the message class, need to change how the IO module handles invalid files (which in itself would already be cool but quite some work) and we'd probably need to do changes in the buses and notifiers/listeners.
And the chance that a real-world log file contains invalid messages or a CAN interface emits messages that are impossible to appear on a bus is highly theoretical. Are we really doing the developers a favor when adding a significant amount of overhead, potentially crashing their applications and complicating the API based on something that never happens?
Messages that are created outside of python-can are probably not completely random either and I’m pretty sure the developer will notice one way or another that the message can’t be sent.
Have a look at
logformats_test
Even with an FD frame and treading dlc as length, this an invalid message. But code seems not to expect an invalid message.