Don't Add Components To Classpath

greenwoodma commented 7 years ago

Scanner code currently assumes that component metadata will be accessed via the classpath (both the GATE and UIMA scanners do this) but this means that the JAR files containing the components have to be on the classpath. This might be fine when used as part of a Maven build of the component, but makes no sense when used as part of the registry. It can be difficult to remove JAR files from the classpath once added and we don't need the extra overhead.

I would suggest the scanners take a URL to the JAR file instead, and then locate the relevant files from within the JAR (by treating it as a zip file) rather than via the classpath

reckart commented 7 years ago

This is a maven plugin, not code for the registry. I don't see atm why this is necessary.

reckart commented 7 years ago

Actually, the code should be able to scan ZIP (jar) files by providing a jar:file:... url.

greenwoodma commented 7 years ago

If I remember the discussion correctly it's not just about being used as a Maven plugin though. We need code that can be used by the registry to scan components that may or may not have the OMTD descriptors in them already. I thought we'd agreed to use the same code in both a Maven plugin and the registry and if that's the case then using the classpath is dangerous (as is the UIMA scanner as it has state so each successive call will return all components scanned so far). But I could be wrong.

Certainly adding the ability to scan JAR files as well will be useful, so for now at least changing the label makes sense

reckart commented 7 years ago

Actually, this should already work:

    new UimaComponentScanner.scan("jar:file:/my/home/library.jar!**/*.xml");

reckart commented 7 years ago

If that worked, would it resolve your request?

greenwoodma commented 7 years ago

Yes an no :)

I've started to realise that in some cases we might need access to two files in parallel when scanning. In the case of GATE not all the information is in the creole.xml file some of it is only in the pom.xml (version info, developers etc.). So when scanning ideally the process should be

scanner is passed a JAR URL
scanner extracts pom.xml and analyses information common to all components in the JAR
scanner extracts creole.xml and for each component duplicates the common info adding the component specific details
scanner returns completed list of component info objects

I can't really see how you'd do that sensibly with the current scanning approach (obvious if you are inside a maven plugin as you would already have the project model to use).

reckart commented 7 years ago

It is not the intended use of the scanner to incorporate the POM information. That is what the MavenProjectAnalyzer does.

Again: this is a Maven plugin. If the registry developers later want to adopt/adapt the code from here into the registry, then some changes may be required. But IMHO these are then issues regarding the registry code.

reckart commented 7 years ago

Correcting myself: this code is presently being developed with a focus on the Maven plugin use-case and I want to get this working as much as possible. If this should also be used from the registry, then it may require some changes or alternative implementations. E.g. the current MavenProjectAnalyzer expects a proper Maven project object which contains all resolved informations about parent POMs, etc etc. If you are in the registry code, then you have to consider how to provide such information. It could be again as a Maven project object; it could be by using something to create an effective POM; or some other approach. I don't know how the registry fetches POM information. But the way that it fetches it is IMHO a concern of the registry, just like the way that the Maven plugin fetches the information is a concern of the Maven plugin.

greenwoodma commented 7 years ago

Yes the MavenProjectAnalyzer is responsible for scanning a maven pom, but how would you go about combining the information from it with that found by the GATE or UIMA scanner, they are currently separate?

Don't get me wrong, I understand your point about this code being separate from the registry and only intended for use (for now at least) within the maven plugin, but I think we have to think longer term. I believe this is now the third set of code developed in OMTD for getting component info from metadata and or classes. Surely if this version is providing a richer approach (i.e. able to extract more info), which I think is the case, then we should be proactively looking at using it throughout the project... once we actually have it working. So thinking about how it might be used and not coding ourselves into a corner seems sensible to me, even at this early stage.

reckart commented 7 years ago

First the Gate or Uima scanner creates an initial OMTD-SHARE component descriptor.
Second, the OmtdShareAnalyzer runs and adds any information that might be present in OMTD-SHARE Java annotations - it should not override info already provided by the initial descriptor, only augment it
Third, the MavenProjectAnalyzer runs and adds information that might be present in the POM - it should not override information already provided by the previous steps

reckart commented 7 years ago

I believe it's the second implementation. The other is the proof-of-concept that is part of the component overview code and which was never meant to be used in production. Also, it was based on OMTD-SHARE 1 whereas this one is aiming at OMTD-SHARE 2.

greenwoodma commented 7 years ago

ah yes, somehow, I'd got confused about when the ComponentInfo instance was created and hadn't spotted that there wasn't a Maven scanner only an analyser. So that chaining works nicely for pulling info from numerous sources :+1:

reckart commented 7 years ago

The "scanner pipeline" is currently set up in the Maven plugin code and only very crudely. That needs improvement.

Lets try to keep separate things separate. I assume that the ability to access descriptors from within JAR files (cf. issue title) is covered by

    new UimaComponentScanner.scan("jar:file:/my/home/library.jar!**/*.xml");

Let's open separate issues for other improvements.

openminted / omtd-share-annotations

Don't Add Components To Classpath #6