Open n-jay opened 11 months ago
Fingerprint values seem to be changing with each subsequent Github Actions run.
1st. Thanks a lot for the PR. We are busy with other topics right now, but we will review this for the next INET release. Until then:
As other fingerprints are stable, this indicates that the code execution is not repeatable. In my experience this might be caused by:
It is very easy to create a hashed map or set and forget about the fact that its iteration order may depend on memory layout. We had some similar issues in early versions of INET and OMNeT++.
I just had a quick look into the code (not a proper review, but some comments).
packetType == 5
does not mean anything to the code reader. It's an implementation detail that the value is currently 5.Noted @rhornig, thank you! I'm still very much a novice when it comes to C++ so this was extremely helpful. I'll look into the points you made and do the necessary changes.
Can those changes be committed separately beyond this point (with detailed commit messages), or should I squash them to the earlier commits?
just go on with new commits. Once we decide to merge, we will squash it.
I see an include
in LeachBS.cc. Check that file carefully whether the does not depend of that set's order.
@rhornig I looked into the use of unordered collections in the LeachBS.cc
file and it turned out to be an unused #includes
from my original code that had no implementation. I've since removed it in https://github.com/inet-framework/inet/pull/929/commits/298eff7e475b8d35a7ab857ff7b55f957e9b165d.
I've gone over the rest of my .cc
and .h
files and verified that there aren't any more unordered collections, only vectors. However the problem still seems to persist.
Regarding the failing fingerprint tests: If there is no obviously visible parts where unordered collections are used then I would check also for memory issues. Running valgrind memcheck may provide some tips. I would also check whether release and debug versions produce the same fingerprint. If they do, I would try to somehow reproduce the different fingerprints reliably locally (having a different fingerprint only during the github ci test makes catching this bug very hard).
Once you can reproduce the fingerprints, you can write out an event log file for both runs and compare those and see where the first difference happens. That would at least give some hints what is happening.
These are the ugliest type of errors one can find in a simulation. That's the main reason we have implemented fingerprinting, to at least catch them early.
We will need some documentation about the routing protocol implementation in the Leach.ned file.:
For the interface name parameter: We have similar code patterns (i.e. for selecting the name of the interface to be used) in Dymo and GPSR. The parameter is called interfaces
and you can use a cPatternMatcher
to iterate and match the selected interface (see the related .cc files). This approach gives the advantage that you can use pattern characters in the name of the interface and potentially also allows to support binding to more than one interface. If your implementation support only a single interface, I would still name it interfaces
and would comment there that only a single interface is allowed (as it is an implementation restriction). You can still throw errors if more than one interface name matches.
https://github.com/inet-framework/inet/blob/master/src/inet/routing/dymo/Dymo.cc#L1294
Regarding the failing fingerprint tests: If there is no obviously visible parts where unordered collections are used then I would check also for memory issues. Running valgrind memcheck may provide some tips. I would also check whether release and debug versions produce the same fingerprint. If they do, I would try to somehow reproduce the different fingerprints reliably locally (having a different fingerprint only during the github ci test makes catching this bug very hard).
Will look into this @rhornig I also have a hunch on what might be causing this. So will check it out, run the simulation and get back to you.
We will need some documentation about the routing protocol implementation in the Leach.ned file.:
Noted, will add that. Would documentation that looks like the Dymo.ned and Gpsr.ned be sufficient?
For the interface name parameter: We have similar code patterns (i.e. for selecting the name of the interface to be used) in Dymo and GPSR.
Noted, will study them and see how the code can be modified. My initial code was based on the Dsdv implementation which I think is a bit dated. https://github.com/inet-framework/inet/blob/3a440549217a5eb84d259e1e99fbeb46742edb7e/src/inet/routing/dsdv/Dsdv.cc#L72-L83
Once this is merged, I can work on updating that to match Dymo and Gpsr if its ok.
Documentation: keep it short. Refer to the original specification and mention only differences / limitations compared to that and parameters and behavior specific to this current implementation (i.e. mention possible statistics gathered, limitation (like only a single wireless interface supported etc). Think about a NEW user who is otherwise already familiar with the leach protocol (by reading the spec).
Hi @rhornig Thanks for suggesting Valgrind Memcheck, ran it via the Eclipse plugin and turns out the fingerprint anomaly was due to an uninitialized variable which I've fixed in https://github.com/inet-framework/inet/pull/929/commits/57c36ed7279cd4217baafa52ec743ee7f37fe56f.
I also updated the network interface configuration in https://github.com/inet-framework/inet/pull/929/commits/9c05df0704f0f4c751ed6b97ca43bb8f2cf055b2. It seems the current DSDV implementation has been slightly tweaked from the older version I used in INET 4.2.5, so took inspiration from that + the code pattern you mentioned in https://github.com/inet-framework/inet/pull/929#issuecomment-1880960597.
The docs are also included in the NED file in https://github.com/inet-framework/inet/pull/929/commits/6947423ec1ba8f7c635624dcdd005f9bd0978bc7. Hope this is sufficient.
Hi @rhornig, kind follow-up on this.
Finally we did our homework and looked into the code-base (apart from my previous formal comments) and also went through the referenced paper.
My most important question: What is the reason you want to integrate this PR into INET? There are valid reasons to add new code to INET, but we must be aware of the goal before we go further.
At the moment, the code is not generic enough to be used by an INET user and lacks several essential features described in the paper.
Hi @rhornig, I appreciate this feedback very much. The answers to your questions are as follows:
What is the reason you want to integrate this PR into INET?
LEACH is a popular and seminal routing protocol that is still being cited despite the original protocol being proposed over 20 years ago as seen from its many variants. While researching at the start of the project, I found a distinct lack of open-source LEACH simulation models that were up to date with recent versions of their respective simulation tools, if the code was made available at all that is. An example is Solar-LEACH from the OMNET model catalog that was submitted in 2006 for OMNET v3. This has also been mentioned by others in the community as per the responses I've received in the Google Group. Several people who were having the same problem and were searching for this code also reached out to me on my personal repo.
Since INET already has several multi-hop routing protocols and the community requirement I've observed about a LEACH simulation model, I thought it would be beneficial to contribute LEACH directly to the INET framework which can be further refined with feedback from the project Maintainers rather than just existing as a subpar community model.
To make sure, that it works correctly, it must be validated either against the paper's results (by reproducing their results) or against an other open-source implementation (carrying out the same experiments).
As mentioned above, I've struggled against this myself as I couldn't find a suitable existing simulation hence why I wrote this OMNET model from scratch. There seems to be a community-developed MATLAB project, however that seems be a partial implementation.
At the moment, the code is not generic enough to be used by an INET user and lacks several essential features described in the paper.
Noted. Is it possible to develop these features in the same PR? I would very much like to continue working on this and add the remainder with your feedback when it comes to aspects like the data aggregation module and statistic collection.
Great. If your goal is to create a model that could be used by other community members and you are willing to put the effort into this, we could proceed with this PR. I can assist you to achieve this (though obviously I cannot guarantee the end result). We kind of went the wrong way starting this PR by discussing formal stuff and implementation details first, but first general concepts should be discussed, so here is my recommendation:
So I start with 1. Here is my recommendation: Pick a version of the article (you already did this) and set a goal to reproduce the results of that article (this is in itself a worthy and publishable goal). So in this case, we want to reproduce all charts and statistics the original paper provided. This can be seen as a way to validate the model and also could be used as a baseline for other people who want to develop new protocols. It would also give guidance about necessary statistics, configuration parameters etc. Let's say we want to compare direct NODE to BaseStation communication's to Leach. Compare the energy usage and the battery depletion distribution over time / space. The current paper already defines that.
What do you think?
Sounds good @rhornig! I consider this a research project, so I'm perfectly fine with this result-based approach with your mentorship. 😄
So in this case, we want to reproduce all charts and statistics the original paper provided. This can be seen as a way to validate the model and also could be used as a baseline for other people who want to develop new protocols.
Understood. I assume this implementation is to be done with the inclusion of the generic model aspects you mentioned in the earlier comment (user installable application etc), yes? I will quickly resolve the formal implementation fixes you suggested and then move into adding the generic components and statistic gathering.
but first general concepts should be discussed, so here is my recommendation:
Noted. Additionally, do you think it would be best to maintain separate correspondence via a dedicated mail thread perhaps? I felt so as this project has sort of expanded beyond just this PR, so adding all replies here could end up spamming this PR discussion, which can be reserved solely when code changes are involved.
Great. I'm busy with omnetpp 6.0.3 release right now, but will look into the code whether I still have some formal recommendations to improve the current codebase.
As for the communication platform, I would prefer staying here in the PR, for a few reasons:
i.e. it must work with any UDP application already available in INET)
@rhornig, I'm currently working on this and need a bit of clarification. Is it required that all packet types (control, acknowledgment, etc.) used in LEACH be UDP packets or just the packets with sensed data?
I feel like the former (all packets UDP) is the way to go because otherwise, it's going to get confusing when filtering packets in the processMessage
method.
I can drop the setPacketType
use and instead use the setKind
to perform the filtering.
Let me know if this is the ideal approach.
@rhornig, I implemented UDP packets in https://github.com/inet-framework/inet/pull/929/commits/6008a9a3634e3c9dc9ba18ac357da8dc9adec5ec. Appreciate some feedback on whether this approach is correct so I can replicate it for all packet transmissions https://github.com/inet-framework/inet/blob/6008a9a3634e3c9dc9ba18ac357da8dc9adec5ec/src/inet/routing/leach/Leach.cc#L282 https://github.com/inet-framework/inet/blob/6008a9a3634e3c9dc9ba18ac357da8dc9adec5ec/src/inet/routing/leach/Leach.cc#L358
Hi @rhornig, I started finalizing the UDP packet support, but an issue with my test simulation parameters is currently holding me back. Instead of just writing it here, I've created a new question in the discussion section to benefit others: https://github.com/inet-framework/inet/discussions/976.
I would really appreciate some feedback on this so I can quickly finalize the rest of the protocol.
Hi @rhornig
I started finalizing the UDP packet support, but an issue with my test simulation parameters is currently holding me back. Instead of just writing it here, I've created a new question in the discussion section to benefit others: https://github.com/inet-framework/inet/discussions/976.
I made multiple attempts to solve the simulation issue based on the showcases, tutorials and examples in INET but had no luck. I even checked out to my code to one of earliest commits (https://github.com/inet-framework/inet/pull/929/commits/67c97be16b8eaf83466d63ab1ff8e4fb54cb9cfa) assuming the issue must be resulting from one of the subsequent changes I've made based on the review comments, but it made no difference. This same codebase works fine in INET 4.2.5 (on OMNET v5).
I set debug points and identified that the CH election in the protocol works as expected, reaching the packet transmission point without any exceptions or issues: https://github.com/n-jay/inet/blob/f4a4040557663d453a91fb43b81282c27cef8166/src/inet/routing/leach/Leach.cc#L307
But the issue is that the Non-CH nodes seem to be receiving the sent packets, and the debug points set within the processMessage
method are not triggered.
https://github.com/n-jay/inet/blob/f4a4040557663d453a91fb43b81282c27cef8166/src/inet/routing/leach/Leach.cc#L130
Would greatly appreciate some feedback on this because this is a major blocker preventing me from finalizing the protocol.
Would greatly appreciate some feedback on this because this is a major blocker preventing me from finalizing the protocol.
@rhornig, happy to inform that the issue is resolved with commit https://github.com/inet-framework/inet/pull/929/commits/162a92b215562d49d140714e8c58fd2d0664939c! Described the fix in the discussion comment: https://github.com/inet-framework/inet/discussions/976#discussioncomment-9955669. Will now proceed to finalize the rest of the protocol quickly and update you here.
Adds the LEACH protocol to existing routing protocol selection.
Addresses issue https://github.com/inet-framework/inet/issues/928