openthread / ot-ns

OpenThread Network Simulator, a discrete event simulator and visualization tool for Thread networks.
https://openthread.io
BSD 3-Clause "New" or "Revised" License
61 stars 23 forks source link

Task: integrating the new OTNS2 code into OTNS main branch #528

Open EskoDijk opened 3 months ago

EskoDijk commented 3 months ago

This issue is created to coordinate the inclusion of the new OTNS2 code (from https://github.com/EskoDijk/ot-ns) into the OTNS project main branch.

Possible methods to update: (nr 2 looks best perhaps, nr 1 is also fine)

  1. PR that is squash-merged into main (and loose all the detailed commits that led from OTNS to OTNS2)
  2. PR that is normal-merged into main (keep all the detailed commits history - even though some are a bit sloppy)
  3. push new code directly into main branch

Important goals of first inclusion of new code:

Less important goals for first inclusion -- these could be addressed later with PRs and fixes if needed.

Important design aspects of OTNS2 (to be evaluated here if okay to include!):

  1. OTNS2 supports simulation of Thread 1.1, 1.2, 1.3 legacy nodes by means of including as subproject a few repository-branches that have the right code for this. This means the number of subprojects now increased from 1 to 4. The nice benefit of this design (using separate subprojects) is that as part of CI of OpenThread, some simulations can be added where the new OT node has to interoperate with legacy OpenThread nodes.
  2. The project now includes C/C++ code with a custom platform for RF simulations. This is in the directory ot-rfsim. This code is not intended to replace, or compete with, the simulation platform code in the openthread repo. In the future some merge/combine might be considered, but it's rather complex to do this already now. Note that the co-development of OTNS simulator code and C/C++ platform code in single repo, and single PRs/commits, has been very beneficial so far - usually some simulator features requires some handling code on the platform side and vice versa.
EskoDijk commented 3 months ago

@jwhui Here's the kick-off to integrating OTNS2 into the OTNS repo in OpenThread. It would be nice if you could check the above proposal, and let me know if anything is missing!

jwhui commented 2 months ago

@EskoDijk , thanks for continuing to drive this forward.

I am fine with option 2 using a merge commit and retaining the individual detailed commits.

I think it also makes sense to have submodule references for different Thread versions / legacy implementations.

EskoDijk commented 2 months ago

@jwhui I've now added the draft PR as #530 . Note that this includes also all of the recent commits into main of OTNS, which were manually inserted as a commit. So effectively the PR is "rebased" to latest main.

The submodules I referred to are currently hosted in my repo EskoDijk/openthread as branches. If needed we can move those refs to branches within the openthread/openthread repo but since these don't exist there yet I've left it as pointing to my own branches. Can also be updated at some later time!

EskoDijk commented 2 months ago

There is a test PR to test the OpenThread CI "calling" OTNS2 via otns.yml. This is to check that the PR code is able to perform its CI task. See https://github.com/EskoDijk/openthread/pull/6

jwhui commented 2 months ago

The submodules I referred to are currently hosted in my repo EskoDijk/openthread as branches. If needed we can move those refs to branches within the openthread/openthread repo but since these don't exist there yet I've left it as pointing to my own branches. Can also be updated at some later time!

@EskoDijk , can we simply point the submodules to specific commits in the main openthread/openthread repo?

EskoDijk commented 2 months ago

@jwhui We could do that, but the specific commit does require a minor patch for all cases (1.1, 1.2, 1.3). The patch is shown here for example for 1.3: https://github.com/EskoDijk/openthread/commit/f271e322265de9ac19be2f5e047980c4cf8af952

What the legacy-versions build script could do is just apply the patch via a small .patch file. In that case we don't need to maintain any branches. I'm okay with either a branch solution or a patch! Maybe there's another workaround for the problem that the patch solves? (This is: newer Apple Clang compiler is very picky about its code and generates more warnings. The older code fails on that.)

Btw the PR with the CI task seems to work for otns.yml. The only "skipped" steps are coverage-uploads, which for some reason always fail (no security key set?).

EskoDijk commented 2 months ago

The question was raised by @wgtdkp on PR #530 if we could split the single PR into features, to allow a more meaningful code review. This is certainly possible but raises some questions:

  1. Splitting won't reduce the total volume of code (in the end) - is it still feasible to review this volume of code? (We could allow a longer time period for this like 1 year to get all the PRs in, if that helps)
  2. Do we want to have a working set of (unit) tests after each PR? (I assume yes)
  3. Do we want a working OpenThread CI (via otns.yml) after each PR is applied? Or is it fine if OpenThread CI otns.yml fails, we discover it post-merge, and fix as we go? Latter would be easier to do.

Another alternative is looking at individual commits - some of these are single squash commits coming from a PR with a particular feature. But there are also a lot of intermediate "work in progress" commits, which are easy to review/follow, but only make sense in combination with other commits that work on the same feature. And this includes also code that is later on removed again. So the alternative approach would have a larger code volume even due to many refactorings that happened along the way.

Finally, there's the approach of doing a review at a glance of everything and just accepting as is.

wgtdkp commented 2 months ago

Finally, there's the approach of doing a review at a glance of everything and just accepting as is.

I am okay with this if you could add tests for the new features in OTNS2:

Note: this is version 2.x of OTNS. It offers additional features compared to version 1:

- Support for more accurate RF simulation of OpenThread nodes. This uses the OpenThread platform `ot-rfsim`, which specifically supports RF simulation for OT nodes. This C code is included.
- Selectable radio (RF propagation) models with tunable RF parameters.
- Runtime tunable radio parameters on each individual OT node. For example, CSL parameters or Rx sensitivity.
- Control of logging display from OT-node, using `log` and `watch` CLI commands. Logging to file per OT-node. The logging output can include any enabled OT-node log items.
- Detailed logging options for RF operations (at log-level 'trace') performed in the simulated radio, at 1 us resolution.
- See packets in flight: animations in the GUI with a duration scaled to the actual time duration of a packet in flight (works at low simulation speed only).
- Support for easily adding various Thread node types (1.1, 1.2, 1.3, 1.4, 1.4 Border Router).
- New graphical displays for overall node type statistics, and energy usage (beta - contribution by [Vinggui](https://github.com/Vinggui)).
- Extended set of Python scripts for unit testing, examples, and case studies.
- Key Performance Indicators (KPI) module that tracks counters and statistics for all nodes.
- Loading/saving of network topologies in YAML files.
- Custom startup scripts with OT CLI commands, defined in a YAML file.
- Additional ["WPAN-TAP"](https://exegin.com/wp-content/uploads/ieee802154_tap.pdf) PCAP format that captures channel information.
- Various UI look & feel improvements.

So that I can review the tests only.

Going forward, I think we shouldn't develop in such a way that it diverges so much and comes back with a few hundreds commits. We suffered this in ot-commissioner where the multi-network functionality is added to ot-commissioner cli in a single CL with more than 10K code. I gave up doing code review for it so we got something like this merged into ot-commissioner: #define public private.

EskoDijk commented 2 months ago

@wgtdkp Sounds like a good idea to start with the test code. All the current feature tests are in the scripts in pylibs/unittests. I can check if everything is there. Notably, I know there are no tests for the "energy usage" functions that I integrated in order not to lose the code forever, but never got around to testing it and documenting it. I've marked this feature as "Beta". If needed we can do in a next PR an update to mark the feature with a warning or so, until it's properly tested.

There's also a test file test_ccm_commissioning.py which is not invoked currently, but is present as a placeholder to add this feature in the future. In script/test you can see the invoked Python scripts.

define public private

Ouch! At least it should be safer than #define private public ;-)

wgtdkp commented 2 months ago

Ouch! At least it should be safer than #define private public ;-)

Oh, it is actually really #define private public. See here https://github.com/openthread/ot-commissioner/blob/64928aea7c2fdcd6efe646bdcd6638758390c59b/src/app/cli/job_manager_test.cpp#L45

EskoDijk commented 4 weeks ago

@wgtdkp Are you okay with using / reviewing the current set of tests for the new features as included already in this PR? (And leave any remaining tests for the "energy" feature for a future PR.)

@jwhui I recently reconsidered the proposal (see earlier comments) of using a .patch file to make small changes to specific historic code commits used to build 1.1, 1.2, 1.3 etc nodes. This has the disadvantages that patching brings the submodules into a Git "dirty" state which is normally unwanted. A branch is much cleaner in this respect. So is it ok to use a branch for these particular submodules?