gsautter / goldengate-imagine

Automatically exported from code.google.com/p/goldengate-imagine

provenance of GoldenGATE Imagine library dependencies #926

Open jhpoelen opened 2 years ago

jhpoelen commented 2 years ago

hey @gsautter -

As I was trying to better understand GoldenGATE Imagine, I found the following library dependencies:

    goldengate-imagine$ ls -1 lib/
    BibRefUtils.jar
    EasyIO.jar
    GamtaFeedbackAPI.jar
    GamtaImagingAPI.jar
    Gamta.jar
    GoldenGATE.jar
    HtmlXmlUtil.jar
    icepdf-core.jar
    ImageMarkup.bin.jar
    ImageMarkup.jar
    ImageMarkupOCR.bin.jar
    ImageMarkupOCR.jar
    ImageMarkupPDF.bin.jar
    ImageMarkupPDF.jar
    mail.jar
    StringUtils.jar

Can you help me understand the origin (or provenance) and versions of these Java ARchives (JARs) or libraries?

Which ones are under your control, and which ones are managed elsewhere?
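One way to start answering the version question mechanically is to inspect the JARs themselves. The following is a minimal Java sketch (the class name JarProvenance is hypothetical and not part of GoldenGATE Imagine) that lists every *.jar in a directory with its manifest's Implementation-Version and Implementation-Vendor attributes, plus a SHA-256 fingerprint of the file. It assumes nothing about how these particular JARs were built; many JARs carry no version attributes at all, which shows up as n/a, so the fingerprint is often the more reliable way to tell two builds apart.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of GoldenGATE Imagine: reports manifest
// version/vendor and a SHA-256 fingerprint for every JAR in a directory.
public class JarProvenance {

    // Hex-encoded SHA-256 of the file's bytes: identifies the exact build.
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Main-section manifest attribute, or "n/a" if there is no manifest or no such entry.
    static String manifestValue(Path jar, Attributes.Name key) throws IOException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            Manifest mf = jf.getManifest();
            String value = (mf == null) ? null : mf.getMainAttributes().getValue(key);
            return (value == null) ? "n/a" : value;
        }
    }

    // One tab-separated line per *.jar, sorted by file name:
    // name, Implementation-Version, Implementation-Vendor, SHA-256.
    static List<String> report(Path libDir) throws IOException, NoSuchAlgorithmException {
        List<Path> jars = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path p : ds) {
                jars.add(p);
            }
        }
        jars.sort(Comparator.comparing(p -> p.getFileName().toString()));
        List<String> lines = new ArrayList<>();
        for (Path jar : jars) {
            lines.add(jar.getFileName() + "\t"
                    + manifestValue(jar, Attributes.Name.IMPLEMENTATION_VERSION) + "\t"
                    + manifestValue(jar, Attributes.Name.IMPLEMENTATION_VENDOR) + "\t"
                    + sha256(jar));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        Path lib = Paths.get(args.length > 0 ? args[0] : "lib");
        for (String line : report(lib)) {
            System.out.println(line);
        }
    }
}
```

Compiled with javac JarProvenance.java, it would be run from the goldengate-imagine folder as java JarProvenance lib. Recording the fingerprints once makes it possible to check later whether a JAR in a download matches one built from a given repo.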

gsautter commented 2 years ago

Sure ... most of them are our own, actually:

This project hierarchy somewhat reflects Plazi's overall software architecture, at least the desktop application side of it, as the idaho- repos also form the basis for our server back-end and web front-end.

This will give you a blank GoldenGATE Imagine (display facilities, document IO, PDF decoder); most of the functionality comes from plug-ins (from https://github.com/gsautter/goldengate-plugins and https://github.com/gsautter/goldengate-imagine-plugins) and the resources they provide (a subset of the available plug-ins and resources is dubbed a configuration for GoldenGATE Imagine), which ensures extensibility. You can download the whole package, as used throughout Plazi, from https://tb.plazi.org/GgServer/Downloads/GgImagine-Default.imagine.zip and it is ready to go as soon as you extract it into a folder: no installation required apart from a JRE (1.8 recommended).
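To make the split between the blank core and the plug-ins concrete, here is a minimal Java sketch under stated assumptions: the names Plugin, SimplePlugin, Core and the example plug-in names are all hypothetical and do not reflect GoldenGATE's actual plug-in API. It only models the idea described above: the core knows which plug-ins exist, and a configuration selects a subset of them, which in turn determines the functions on offer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of "blank core + plug-ins + configuration"; not GoldenGATE's actual API.
public class PluginConfigSketch {

    // A plug-in contributes named functions; the core does not know them in advance.
    interface Plugin {
        String name();
        List<String> functions();
    }

    static final class SimplePlugin implements Plugin {
        private final String name;
        private final List<String> functions;

        SimplePlugin(String name, List<String> functions) {
            this.name = name;
            this.functions = Collections.unmodifiableList(new ArrayList<>(functions));
        }

        public String name() { return name; }
        public List<String> functions() { return functions; }
    }

    // The "blank" core: display, IO and decoding would live here; features come from plug-ins.
    static final class Core {
        private final Map<String, Plugin> available = new LinkedHashMap<>();

        void register(Plugin plugin) {
            available.put(plugin.name(), plugin);
        }

        // A configuration is modelled as a set of plug-in names; returns the functions
        // it enables, in plug-in registration order.
        List<String> functionsFor(Set<String> configuration) {
            List<String> enabled = new ArrayList<>();
            for (Plugin plugin : available.values()) {
                if (configuration.contains(plugin.name())) {
                    enabled.addAll(plugin.functions());
                }
            }
            return enabled;
        }
    }

    public static void main(String[] args) {
        Core core = new Core();
        core.register(new SimplePlugin("TableDetector", Arrays.asList("Mark Tables")));
        core.register(new SimplePlugin("TaxonNameTagger", Arrays.asList("Tag Taxonomic Names")));
        core.register(new SimplePlugin("BibRefParser", Arrays.asList("Parse References")));

        Set<String> configuration = new HashSet<>(Arrays.asList("TaxonNameTagger", "BibRefParser"));
        System.out.println(core.functionsFor(configuration)); // [Tag Taxonomic Names, Parse References]
    }
}
```

The point of the design is that adding a feature means registering another plug-in and listing it in a configuration, without touching the core.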

jhpoelen commented 2 years ago

@gsautter very neat! Thanks for the detailed overview. Your overview helps me better understand the beating (robot) heart of Plazi. Naturally, it is the humans that keep these robots alive.

Would it be fair to assume that a command-line (or headless) version of GoldenGATE Imagine is used to do batch processing?

PS: I am now used to Maven, and seeing your use of Ant was like meeting an old friend ;)

gsautter commented 2 years ago

> Would it be fair to assume that a command-line (or headless) version of GoldenGATE Imagine is used to do batch processing?

There is ... simply call GgImagineBatch.jar (also included in the download).

> PS: I am now used to Maven, and seeing your use of Ant was like meeting an old friend ;)

Well, yeah, I've been accused of being "so yesterday" occasionally, but to me, Ant is simply more transparent (no transitive dependency resolution), and it can build more than one JAR from a single project ... without the latter, I would need far more than those two dozen or so GitHub repos to achieve the same build granularity, and that granularity is important for flexibility ... after all, the idaho- repos are the basis of a lot more than just GoldenGATE Imagine.

jhpoelen commented 2 years ago

I appreciate that you stick to your preferred tools: Ant, make, and other tools are trusty workhorses. I agree that Maven's dependency management can be a bit much.

Neat to see that you are using the same application to do batch processing.

Btw, for some reason I was unable to open a locally saved PDF from https://europeanjournaloftaxonomy.eu/index.php/ejt/article/download/1809/6955/ in GoldenGATE Imagine: I could not select it from "File" > "Open Document".

Have you ever run GoldenGATE Imagine on Ubuntu/Linux?

[Screenshots from 2022-06-08 12-51-50 and 12-56-25]

gsautter commented 2 years ago

> I appreciate that you stick to your preferred tools: Ant, make, and other tools are trusty workhorses. I agree that Maven's dependency management can be a bit much.

Exactly ... I like what works, and in a controllable fashion, not what's fancy and new ...

> Neat to see that you are using the same application to do batch processing.

Well, the logic gets handed a document and does its thing, without caring where that document comes from ... that's why you separate the UI and document IO from the processing logic ... it's way easier to debug this way, too, as you can test the logic in a visual environment, inspect the results, and be sure it does the very same thing when run without the UI, i.e., in batch mode.
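The separation described here can be illustrated with a small Java sketch. It assumes a hypothetical Step interface that only ever sees a document (no UI, no file IO); because of that, the same steps can be run back to back as a batch or applied one at a time, as a user clicking through a menu would, and both paths give the same result. Doc, Step, runBatch and applyOne are illustrative names, not GoldenGATE Imagine classes, and the two toy steps merely stand in for real processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of UI-independent processing logic; not GoldenGATE code.
public class HeadlessPipelineSketch {

    // Stand-in for a document: a mutable copy of some text tokens.
    static final class Doc {
        final List<String> tokens;
        Doc(List<String> tokens) { this.tokens = new ArrayList<>(tokens); }
    }

    // Processing logic sees only the document: no UI, no file IO.
    interface Step {
        void process(Doc doc);
    }

    // Batch path: run all steps in a fixed top-down order.
    static Doc runBatch(Doc doc, List<Step> steps) {
        for (Step step : steps) {
            step.process(doc);
        }
        return doc;
    }

    // Interactive path: a UI would call this for the one step the user picked, then repaint.
    static void applyOne(Doc doc, Step step) {
        step.process(doc);
    }

    public static void main(String[] args) {
        Step trim = d -> d.tokens.replaceAll(String::trim);
        Step dropEmpty = d -> d.tokens.removeIf(String::isEmpty);
        List<Step> batch = Arrays.asList(trim, dropEmpty);
        List<String> input = Arrays.asList(" Indonemoura ", "", " sp. nov. ");

        Doc viaBatch = runBatch(new Doc(input), batch);
        Doc viaUi = new Doc(input);
        for (Step step : batch) {
            applyOne(viaUi, step); // same steps, applied one by one
        }
        System.out.println(viaBatch.tokens);                      // [Indonemoura, sp. nov.]
        System.out.println(viaBatch.tokens.equals(viaUi.tokens)); // true
    }
}
```

Because neither path knows where the document came from, whatever was verified interactively carries over unchanged to the headless run.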

> Btw, for some reason I was unable to open a locally saved PDF from https://europeanjournaloftaxonomy.eu/index.php/ejt/article/download/1809/6955/ in GoldenGATE Imagine: I could not select it from "File" > "Open Document".

> Have you ever run GoldenGATE Imagine on Ubuntu/Linux?

Not so far ... I run Win7 on my machine, others use Win10 or Mac OS X, and one even WinXP, but I've never seen the file dialogs on Linux except for one brief test on Arch ... maybe it's as simple as the file type filter, though ... just drag the file dialog a good bit taller, as those PDF decoding options on the right take up some vertical space and might push the file type filter down out of sight.

jhpoelen commented 2 years ago

I enlarged the dialog box and found the drop-down you referred to.

Thanks for pointing this out!

I was able to import https://europeanjournaloftaxonomy.eu/index.php/ejt/article/download/1809/6955/ (see attached PDF) in about 5-10 minutes.

1809-Article Text-7919-1-10-20220606.pdf

[Screenshots from 2022-06-08 15-35-49, 15-36-04, and 15-44-08]

gsautter commented 2 years ago

This looks pretty good for starters ... the only thing that jumps out at me is the downward offset of "Indonemoura" in the title ... that should have little influence on processing, though. The top-most dozen or so options in the "Tools" menu pretty much represent the processing batch, in top-down order. Apart from that, feel free to click and edit around. You can select either a rectangle or a sequence of words, and the context menu comes up as soon as you release the mouse button. Be aware: the functions the context menu offers depend on what you have set to display (the checkboxes down the right edge of the window), as a means to adapt to what you are working on while keeping the context menu from getting overcrowded.
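The display-dependent context menu can be pictured as a simple filter. In this minimal Java sketch, each (hypothetical) Action optionally names the markup type that must be switched on for display, and the menu offers only the actions whose type is currently displayed, plus those with no requirement. The action labels and type names are made up for illustration; this is not how GoldenGATE Imagine actually implements its menu.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of a context menu filtered by display settings.
public class ContextMenuFilterSketch {

    // A context-menu action that may only make sense for one displayed markup type.
    static final class Action {
        final String label;
        final String requiredDisplay; // null means "always offered"

        Action(String label, String requiredDisplay) {
            this.label = label;
            this.requiredDisplay = requiredDisplay;
        }
    }

    // Offers only actions whose required markup type is currently switched on for display.
    static List<String> menuFor(List<Action> all, Set<String> displayed) {
        List<String> labels = new ArrayList<>();
        for (Action action : all) {
            if (action.requiredDisplay == null || displayed.contains(action.requiredDisplay)) {
                labels.add(action.label);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Action> actions = Arrays.asList(
                new Action("Annotate Selection", null),
                new Action("Edit Table", "table"),
                new Action("Edit Taxonomic Name", "taxonomicName"));
        Set<String> displayed = new HashSet<>(Arrays.asList("taxonomicName"));
        System.out.println(menuFor(actions, displayed)); // [Annotate Selection, Edit Taxonomic Name]
    }
}
```

Switching a markup type's display checkbox on or off would then be enough to add or remove its related functions from the menu.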