merks / simrel-maven

A copy of the simrel report generated by the Oomph incubator
Eclipse Public License 2.0
0 stars 0 forks source link

Build a repository with direct-from-maven dependencies for which OSGi metadata needs to be generated #5

Open merks opened 1 year ago

HannesWell commented 1 year ago

That would be great.

Is currently Oomph's MavenBND.target meant to be that?

In general I think it would be more suitable to place such repo in the eclipse-orbit organization.

Btw. what do you think about always generating a special Header into wrapped artifacts like?

Eclipse-Wrapped-Bundle: ${mvnGroupId}.${mvnArtifactId}:${version}
merks commented 1 year ago

Do you see Eclipse-Wrapped-Bundle: ${mvnGroupId}.${mvnArtifactId}:${version} as just like a comment or might it be used in some semantic way, e.g., to specify a required capability for example?

merks commented 1 year ago

@HannesWell

There's a new version of Lucene for wrapping so I'm wondering about this header's purpose?

Bundle-Name: Bundle org.apache.lucene : lucene-analysis-common
Bundle-SymbolicName: org.apache.lucene.lucene-analysis-common
Bundle-Version: 9.6.0
Eclipse-Wrapped-Bundle: org.apache.lucene.lucene-analysis-common:9.6.0

It contains the same information as the Bundle-SymbolicName and Bundle-Version. Is that idea that multiple different-mapped jars could be tracked back to their original Maven coordinates? I guess an artifact ID cannot contains a '.' such that one could always determine the original Maven coordinates. If it can contain a '.', wouldn't it be better to use a ':' to separate the group and the artifact?

merks commented 1 year ago

@HannesWell

No one has complained about what's generated but this surely looks very wrong:

image

That's not a valid package name.

I wonder what the services are for?

image

This one has services and versions:

image

But I don't know how those things end up on the classpath properly.

I don't get a warm fuzzy feeling that these are actually correct, yet the Mylyn folks seems happy with them...

HannesWell commented 1 year ago

Do you see Eclipse-Wrapped-Bundle: ${mvnGroupId}.${mvnArtifactId}:${version} as just like a comment or might it be used in some semantic way, e.g., to specify a required capability for example?

It contains the same information as the Bundle-SymbolicName and Bundle-Version. Is that idea that multiple different-mapped jars could be tracked back to their original Maven coordinates? I guess an artifact ID cannot contains a '.' such that one could always determine the original Maven coordinates. If it can contain a '.', wouldn't it be better to use a ':' to separate the group and the artifact?

Sorry I somehow forgot half of my comment. The idea was that M2E respectively Tycho always generate such header regardless of the specified instructions, not that users would have to do it. And yes it would be better to always use a colon as separator.

Yes the idea is that you can determine reliably if a bundle was wrapped and to find the original artifact of a wrapped bundle. This could be used to avoid multiple wrappings of the same artifacts and if we want that we could implement some checks for SimRel contributions based on that. Maybe if projects would be not permitted to contribute own wrapped bundles and must only use the managed ones, the header could be used to check that or if they can use them we could check that the wrapped bundles are not contained in features and only referenced via Import-Package so that the resolver can reduce it to one wrapping or a after-the-fact analysis can check if multiple wrappings of the same artifact made it to the SimRel.

That's not a valid package name.

I wonder what the services are for?

I have not yet looked into that in depth (I can do that tomorrow), but my first guess is that those are ServiceLoader files.

No one has complained about what's generated but this surely looks very wrong:

Yes that looks wrong, I can have a look at it tomorrow.

merks commented 1 year ago

The mockito-inline was actually correct, just very odd and only needed for old versions of mockito. Also I don't think lucene-queries and lucene-sandbox are actually used or needed by Mylyn...

That leaves me to compare the Orbit versions of these remaining three things with these wrapped versions. They actually have the same odd structure (services and versions). The complex recipes really makes me concerned though that the simple recipe being used might be wrong, might need to change, and at that point it will be hard to change without a version increment:

image

I definitely don't get a warm fuzzy feeling...

HannesWell commented 1 year ago

The mockito-inline was actually correct, just very odd and only needed for old versions of mockito. Also I don't think lucene-queries and lucene-sandbox are actually used or needed by Mylyn...

Yes, since Mockito 5 the inline-mock is the default and delivered as part of the main artifact. They now have even discontinued that sub-project. So yes I agree that this should just be removed. The main mockito artifact has proper OSGi metadata.

That leaves me to compare the Orbit versions of these remaining three things with these wrapped versions. They actually have the same odd structure (services and versions). The complex recipes really makes me concerned though that the simple recipe being used might be wrong, might need to change, and at that point it will be hard to change without a version increment:

I'll look into this tomorrow/today in detail (this time really, we have public holiday 😅) and check the results.

The services entries seem to contain ServiceLoader files, so we either have to add corresponding ServiceLoader Medaitor entires or squash all the jars into one module by making all jars for example fragments of lucene-core.

What I just have noticed is that org.apache.lucene.lucene-analyzers-common is now org.apache.lucene:lucene-analysis-common according to https://github.com/apache/lucene/blob/0c6e8aec67c375f9a2f934d28ac3f4ceb5f0ebb9/lucene/MIGRATE.md#rename-of-binary-artifacts-from--analyzers--to--analysis--lucene-9562

But when already investigating the lucene project, I think it would be worth trying to make it proper bundles, as requested in

But from a quick look the maintainers seem to be reluctant and this project is build with Gradle (😒) .

merks commented 1 year ago

@ruspl-afed

FYI, I am looking to rework the recipes for all these Lucene bundles. In particular, I have modified the recipe so that they will have the same bundle symbolic names as they had in Orbit making it easier to update to them. I was trying to switch to use package imports yesterday for Mylyn and for Data Tools, but with all these different Lucene versions floating around, that is simply exercise in frustration and futility; it's great in principle but in this case, it's unworkable. So instead I want to have bundles that are an easier drop-in replacement for the Orbit bundles. I will have to switch Mylyn to use them. I am also be jar-signing these things.

I have also worked on a generator that will allow me to maintain version qualifiers for these things such that if the recipe needs to change a new qualifier version can be built that will allow updates to the new version of the artifact; more details on that when it's polished...

merks commented 1 year ago

Note that the build opened a whole whack of reviews now that the dash tool is properly kicking in again:

https://gitlab.eclipse.org/eclipsefdn/emo-team/iplab/-/issues

Some may be issues with the name not mapping directly to a Maven coordinate or with the version qualifier. It will need to get ironed out...

merks commented 1 year ago

@HannesWell

If you want to look at any of the details and try things out, all the builds and generators can be run locally from this project setup:

image

The Analyzer maintains these two files and there is a launcher to run it:

image

In particular, it maintains the SHA1 value by digesting the BND instructions (normalized and without the underlined version qualifier) along with the dependency coordinates. If the digest changes, e.g, if the instructions are changed, it updates the digest and the version qualifier. This way I can be more reassured that it's possible to build new artifacts if some important BND instruction is overlooked....

HannesWell commented 1 year ago

If you want to look at any of the details and try things out, all the builds and generators can be run locally from this project setup:

Yes I already wanted to ask, what you want me to help with in detail because I have the impression that you are already making good progress. :) I'm about to try it out.

Btw. have you already found out how to handle the ServiceLoader config well, i.e. if we need ServiceLoader-Mediator entires or use fragments?

merks commented 1 year ago

My sense is that the current equivalent Orbit artifacts have exactly the same kind of "strange" structure and no one has complained, so I'm inclined to ignore it for the time being. I've also done all these experiments with the 9.5.0 versions even though there are 9.6.0 versions. And I added a bunch of things to deal with older Orbit artifacts that are stuck at the 7.x version...

I produced this repository with jar-signed artifacts:

https://ci.eclipse.org/oomph/job/simrel/job/simrel-maven-bnd/lastBuild/artifact/maven-bnd/site/target/repository/

I wonder if we can move the platform to use these new versions? And more importantly too, I wonder if we can get the Platform to stop including a specific version because that's a nightmare for downstream clients who want newer versions of Lucene than what the Platform is strictly requiring and redistributing...

HannesWell commented 1 year ago

Looking at the code I wondered if it would be useful to use a DOM XML parser to load the model and operate on it directly instead of using RegEx Patterns?

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        var arguments = new ArrayList<>(Arrays.asList(args));
        Analyzer analyzer = new Analyzer(arguments);
        Document document = parseDocument(analyzer.target);
        Element target = (Element) document.getElementsByTagName("target").item(0);
        for (Element location : children(target, "location")) {
            Iterable<Element> features = children(location, "features");
            ...
        }
    }

    private org.w3c.dom.Document parseDocument(Path file) throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = dbf.newDocumentBuilder();
        return builder.parse(target.toFile());
    }

    private Iterable<Element> children(Element parent, String string) {
        NodeList list = parent.getElementsByTagName(string);
        return () -> IntStream.range(0, list.getLength()).mapToObj(list::item).filter(i -> i.getParentNode() == parent)
                .filter(Element.class::isInstance).map(Element.class::cast).iterator();
    }

If you set the tycho-p2-repository-plugin:includeAllSources property to true in the maven-bnd_site/pom.xml you don't have to include the source feature in the repo. The generated source feature would not be included, but the sources from all Bundles. Instead of computing a sha/date-based qualifier a simple running number would also do the job. Of course one would have to increment that manually. Besides that the generator looks good.

I wonder if we can move the platform to use these new versions? And more importantly too, I wonder if we can get the Platform to stop including a specific version because that's a nightmare for downstream clients who want newer versions of Lucene than what the Platform is strictly requiring and redistributing...

In my platform/equinox workspace only org.eclipse.help.base and org.eclipse.ua.tests seem to use it and both use proper version ranges, the test only has a lower bound, but for a Test I think this is fine and o.e.help base has org.apache.lucene.core;bundle-version="[9.4.2,10.0.0)" which is reasonable for SemVer. I only see the org.eclipse.help Feature that includes a lucene bundle, so it would probably be sufficient to remove it there. PDE and JDT seem not to use lucene. If lucene follows the SemVer rules I see no reason to require a specific version.

HannesWell commented 1 year ago

My sense is that the current equivalent Orbit artifacts have exactly the same kind of "strange" structure and no one has complained, so I'm inclined to ignore it for the time being. I've also done all these experiments with the 9.5.0 versions even though there are 9.6.0 versions. And I added a bunch of things to deal with older Orbit artifacts that are stuck at the 7.x version...

I have to look again into the bundles in details, but after inspecting the orbit receips I'm not sure if we should add the mandatory attributes to the split packages. This is definitely a difference, but the question is, if it is relevant.

Looking at the lucene recipes, I really don't understand why others are often so convinced that with Maven targets we will get too many different version, even in the current Orbit we have three versions of lucene or two versions of Guava. 😰 In the current 2023-06-M2 repo, we even have four versions of guava, three of them from Orbit 🙉 Of course multiple versions are a problem, sometimes small, sometimes big and it should be tackled, but I more and more believe that it mainly depends on projects being active and eager to adapt to/prepare for new versions and that a single one-stop repo does not automagically solve that problem. 😞

merks commented 1 year ago

Yes, it's easier to read with a DOM, but serializing and preserving format is hit and miss at best (as I see repeatedly when I edit a *.target file with the PDE editor). So I'd rather do some textual editing. ( If someone else wants to do all the work, and manage it going forward, they are of course welcome to do it as they wish. 😜 )

Here's the part of the platform that's of concern:

image

This locks in one version and most products will ship with help with that locked-in version.

We keep taking about everyone using the latest, so that must apply for the Platform too I assume...

HannesWell commented 1 year ago

So I'd rather do some textual editing. ( If someone else wants to do all the work, and manage it going forward, they are of course welcome to do it as they wish. 😜 )

Understand and agree, I think I will skip this offer for participation this time 😅

This locks in one version and most products will ship with help with that locked-in version.

We keep taking about everyone using the latest, so that must apply for the Platform too I assume...

Yes, I would be in favour of that as well. :) Bundles should define a version-range as wide as possible and as small as necessary and should not lock the version through Features. Otherwise we lose the advantages OSGi offers.

Furthermore I belive the version lock-in was not the main reason to include lucene in the Feature, probably it was only done to pull in the sources. 😅

HannesWell commented 1 year ago

I have to look again into the bundles in details, but after inspecting the orbit receips I'm not sure if we should add the mandatory attributes to the split packages. This is definitely a difference, but the question is, if it is relevant.

Thinking about it again, the attributes are not necessary if only require-Bundle used, but if one uses Import-Package, this is nessesary to distinguish the packages since they have split them.

If done in a smart way we can probably encode that in the package-exports in one instructions set with a few macros. I can try to work that out tomorrow.

merks commented 1 year ago

Actually, I think the 9.5 structure does not have split packages. One course it's hard to tell for sure, so I worked on a packages view in the analyzer editor but still could find no split packages. I did find these in the SimRel aggregation:

image

Maybe I don't understand the definition of "split package" or maybe I've overlooked something?

merks commented 1 year ago

@HannesWell

I've been experimenting to understand BND better. I installed BND tools because it really helps to see the generated jar. Here I've modified the recipe so that the package imports of other lucene packages have version ranges:

image

Of course I need to do a BND maven build to get to this "review" point, so @jonahgraham comment about iteratively editing the recipe to see what's generated does seem quite relevant. The cycle time here is relatively long. Also, the feedback on BND recipe errors is poor.

Then there's Christoph's comment about "it was never meant to be used this way". In the end though, when I use the other dependabot like *.target updater, this is very easily changed to version 9.6.0 to build new wrappers so that at least is easier.

Anyway, I assume it's a good idea to generate ranges on the packages! I have no way to know about provider/consumer relations here though...

HannesWell commented 1 year ago

Actually, I think the 9.5 structure does not have split packages.

Looks like the reworked their bundle structure with lucene 9: https://github.com/apache/lucene/blob/main/lucene/MIGRATE.md#migration-from-lucene-8x-to-lucene-90

I also checked all the lucene artifacts in your MavenBND.target manually and also find no Java package that occurs in more than one jar. Great 👍🏽 Then the Orbit receipts are probably unnecessarily complicated.

One course it's hard to tell for sure, so I worked on a packages view in the analyzer editor but still could find no split packages. I did find these in the SimRel aggregation:

image

That's very handy! And yes the split-packages in platform are known and intended, although Tom Watson mentioned recently that the expected advantages of this split were never fulfilled.

Then there's Christoph's comment about "it was never meant to be used this way". In the end though, when I use the other dependabot like *.target updater, this is very easily changed to version 9.6.0 to build new wrappers so that at least is easier.

Personally I'm torn if we should use the Maven-Targets as Orbit replacement. Using maven+bnd-maven-plugin is definitively more powerful, but often that power is not needed (as long as one only needs to generate a Manifest, the Manve-Targets can do everything) and I find Maven-targets simpler to set up and use. Furthermore I believe it also eases the migration to real OSGi artifacts (given that the original jars become OSGi-ready one day), because one already has the same artifact structure (Orbit used to merge and split artifacts). Since the lucene devs seemed to have solved the split-package problem already, the ServiceLoader usage is the only specialty I'm aware of that one should consider when generating OSGi metadata.

Anyway, I assume it's a good idea to generate ranges on the packages! I have no way to know about provider/consumer relations here though...

Yes definitely, but as far as I know is Christoph is working on generating them automatically. Btw. I'm currently investigating why changed BND-instructions are (often) not taken into account in m2e in the workspace. If this is resolved, this should also reduce your turn-around times.

In the meantime I suggest you to use the range macro to compute the Import-Package version range: version-range: ${range;[==,+);${version}} If you inline the version-range you can maybe even simplify it, since BND injects the exported package version(the ${@} mentioned in the doc): Import-Package: org.apache.lucene.*;version="${range;[==,+)}" But it could be that this does not work in this case, from my experience the injection does also not work for packages exported from bundles in the same reactor.