Open DavidBakerEffendi opened 10 months ago
Thanks for opening this discussion! Minor points I'd like to add from discord:
@bbrehm Sounds like some kind of Joern Enterprise solution is on its way soon
To me, it sounds like anything outside joern.io, private or public, could soon become overwhelming for a contributor to consider. A test-API that companies can subscribe to could be one solution for signalling a breaking change, but it would also cripple the speed of changes.
We could also look at a develop and release branch. This adds complexity but may help automate major/minor releases, as well as give users of Joern the option to choose the cadence/change rate they're willing to keep up with.
(An extension of the above) Automating a changelog from the descriptions of commits since the last version may help users depending on Joern debug an API change or new protocol.
This is an interesting topic, as this may also extend past JVM languages. In terms of the schema, we've had things such as cpg.proto.
It would be good to define a level of compatibility features we're willing to maintain, and others we don't.
So much DevOps in the future of Joern...
(An extension of the above) Automating a changelog from the descriptions of commits since the last version may help users depending on Joern debug an API change or new protocol.
There is a qualitative difference between breaks that require adaptations on the minor refactoring level (example: function X has been renamed), and breaks that require major rewrites of entire components (example: it used to be possible to do low-latency writes to the graph via legacy tinkerpop odb APIs -- yeah, nope, in the medium future you need to use the cpgpass infra and batch updates or face a 1000x slowdown)
In other words, we need both a changelog for blog-post length descriptions of architectural changes, and a changelog for minor stuff. Such a thing cannot be auto-generated and we must prevent noisy minor stuff almost nobody should care about from leading people to miss the big things.
Due to the noise issue, I am quite against auto-generated change-logs. Commit messages fulfill this function already.
In other words, we need both a changelog for blog-post length descriptions of architectural change
On that note, the new Hugo-based blog component of the Joern website is nearly ready, and should make it easier to publish updates in a more official forum.
Due to the noise issue, I am quite against auto-generated change-logs. Commit messages fulfill this function already.
You make a good point. ScalaDocs could also provide some overview of API changes across versions - and can synchronize with minor releases.
Another aspect: how do we want to agree on changes to stable APIs? I feel rigorous documentation is less important there than discussions beforehand, so that no stakeholders get a bad surprise.
E.g. we could agree to announce potentially breaking changes in a dedicated discord channel, and only merge when no objections have been raised within some time frame that should be short enough to allow us to get things done but long enough that everybody has a chance to review how much they'll be affected.
About 2 weeks have now passed since the issue was raised, and I think these are reasonably the next actionable steps:
docs.joern.io along with a protocol for deprecating a stable API.
The above can be aided with a script that parses git diff to detect if a class on a stable API is being modified. Each action below could ping the discord channel and link the PR, so that stakeholders can comment directly on it.
master GitHub action that can do a minor version bump if some tag is detected in the commit title, suggesting a breaking change in the API, e.g. 2.0.100 → 2.1.0
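A minimal sketch of the version-bump rule described above, assuming versions follow MAJOR.MINOR.PATCH. The `[breaking]` marker and the patch bump for ordinary commits are assumptions for illustration; the thread only says "some tag" in the commit title and does not specify either.

```python
import re

# Hypothetical marker: the thread only mentions "some tag" in the commit title.
BREAKING_TAG = "[breaking]"


def next_version(current: str, commit_title: str) -> str:
    """Return the next version for a commit landing on master."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", current)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {current!r}")
    major, minor, patch = (int(part) for part in match.groups())

    # A tagged commit signals a breaking API change: bump minor, reset patch.
    if BREAKING_TAG in commit_title.lower():
        return f"{major}.{minor + 1}.0"

    # Assumption: every other commit gets an ordinary patch bump.
    return f"{major}.{minor}.{patch + 1}"
```

With these assumptions, `next_version("2.0.100", "[breaking] rename a query step")` gives `2.1.0`, matching the example above, while an untagged commit moves 2.0.100 to 2.0.101.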
Problem
Joern does not have a fixed protocol for minor/major releases, nor is it explicit about which APIs are core/stable. As a result, users who develop with Joern as an external dependency have no reliable guideline for maintaining their projects against the nightly release schedule.
The problem is not that APIs are added, but rather which APIs users should consider reliable.
Solution
We require both freedom to develop and modify code in Joern, and guidelines for touching code considered "stable". This issue is the beginning of an open dialogue on what that should look like, and a way to get some hands on the work.
Stable APIs
Here is a list of paths containing classes exposing public methods/functions that I think should be considered "stable". @bbrehm suggested we whitelist these, instead of blacklist, i.e., we have a set of stable APIs, and the rest should either be considered internal or experimental.
I will update this list as the discussion proceeds.
Passes
io.joern.x2cpg.[base|callgraph|controlflow|typerelations]
io.joern.x2cpg.frontend.[TypeNodePass|MetaDataPass|Dereference] - I consider the rest experimental/unstable
io.joern.dataflowengineoss.passes.reachingdef.*
X extends X2CpgFrontend[Config], any frontend entry-point class name.
Utilities
io.joern.x2cpg.* (excluding passes, see above)
io.shiftleft.semanticcpg.[layers|utils]
io.shiftleft.semanticcpg.Overlays
Query Steps
This commit has been reported as largely breaking by some users of Joern, and as a regression in terms of interoperability with other JVM languages. It would be good to define some stable query items.
ExtendedCfgNode.[ddgIn|ddgInPathElem|reachableBy|reachableByFlows]
io.shiftleft.semanticcpg.language.[android|bindingextension|callgraphextension|dotgenerator|nodemethods|operatorextensions|types]
Suggested Protocol
Ideally, when a stable API is modified or removed, it should first be marked deprecated, with advice on how to migrate to the new API. This can then be followed up with a minor version bump, with removal scheduled for the next major version bump.
There is already an effort to maintain descriptions of these changes in the CHANGELOG.md but it has no formal protocol.
Documentation
Once this issue is concluded with actionable steps, we should move the results to the documentation to publicly declare the guidelines. A nice-to-have would be a CI/CD step to flag when a stable API is touched, but this may be a bit more complicated.
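As a rough sketch of such a CI step, the check could compare the files changed in a PR against a whitelist of stable package directories. The directory fragments below are derived from a few of the package names listed under "Stable APIs" and are illustrative only; the function names, the `origin/master` base ref, and the example file path in the note are hypothetical.

```python
import subprocess

# Assumption: stable packages map to these directory fragments in the source tree.
# Only a few entries from the "Stable APIs" list are shown here.
STABLE_PACKAGE_DIRS = (
    "io/joern/dataflowengineoss/passes/reachingdef/",
    "io/shiftleft/semanticcpg/layers/",
    "io/shiftleft/semanticcpg/utils/",
)


def touched_stable_files(changed_paths):
    """Return the changed paths that fall inside a whitelisted stable package."""
    return [
        path
        for path in changed_paths
        if any(stable_dir in path for stable_dir in STABLE_PACKAGE_DIRS)
    ]


def changed_paths_against(base_ref="origin/master"):
    """List files changed between base_ref and HEAD using git diff --name-only."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base_ref}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

A CI job could call `touched_stable_files(changed_paths_against())` and, if the result is non-empty, label the PR or ping the discord channel. For example, a hypothetical change to `semanticcpg/src/main/scala/io/shiftleft/semanticcpg/utils/Foo.scala` would be flagged, while a change to `README.md` would not. This only detects touched files, not whether a public signature actually changed.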
Inviting the following to the discussion, however anyone is welcome to provide input: @fabsx00 @ml86 @max-leuthaeuser @pandurangpatil @tuxology @bbrehm @mpollmeier