gsautter / goldengate-imagine

Automatically exported from code.google.com/p/goldengate-imagine

article does not process: zootaxa zootaxa.4346.1.1 #343

Open myrmoteras opened 6 years ago

myrmoteras commented 6 years ago

This monograph stopped here: https://drive.google.com/a/plazi.org/file/d/13PBDcGRLwjTK9z-MBv_TMTAlpeSnjg94/view?usp=sharing

GgImagineBatch.20171109-1342.err.log

The log is too big to paste here; the original file is 28 MB.

I ran it twice with the same result. I stopped it after 1.5 hours of batch processing, the longest run.

Assembling taxonomic names Expanding abbreviated genera and species Linking taxonomic names with non-catalog genera to families Linking taxonomic names to catalog data Getting taxonomic names Adding document authority for original names Adding or resetting verbatim authorities Bucketizing taxonomic names Merging equal buckets Merging compatible buckets Handling new combinations and status changes Transferring attributes already present in document Sorting out done-with buckets Loading authority data for 100 taxon names Storing document data Storing page data Storing word data Storing region data Storing annotation data Storing font data Storing page images Storing page image data Storing supplement data Storing supplements Document stored to temporary folder Running Image Markup Tool 'Clean Table Annotations' Storing document data Storing page data Storing word data Storing region data Storing annotation data Storing font data Storing page images Storing page image data Storing supplement data Storing supplements Document stored to temporary folder Running Image Markup Tool 'Remove Duplicate Annotations' Storing document data Storing page data Storing word data Storing region data Storing annotation data Storing font data Storing page images Storing page image data Storing supplement data Storing supplements Document stored to temporary folder Running Image Markup Tool 'Mark Treatments (Headings Only)' Wrapping document Checking document Loading document processor Storing document data Storing page data Storing word data Storing region data Storing annotation data Storing font data Storing page images Storing page image data Storing supplement data Storing supplements Document stored to temporary folder Running Image Markup Tool 'Extract Materials Citations' Wrapping document Checking document Loading document processor Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting 
materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting 
materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations Setting materials citation attributes Detecting materials citation paragraphs Getting document person names Marking materials citations

D:\GoldenGateImagine20170823>java -jar -Xmx10240m GgImagineBatch.jar "DATA=E:\diglib\zootaxa\temp" DT=D CACHE=./BatchCache FM=R
Loading parameters
GoldenGATE Imagine core created, configuration is Default.imagine
Image Markup Tool 'StructureDetector' loaded
Image Markup Tool 'MetaDataAdder' loaded
Image Markup Tool 'KeyHandler' loaded
Image Markup Tool 'ParseBibliography.imTool' loaded
Image Markup Tool 'MarkBibRefCitations.imTool' loaded
Image Markup Tool 'MarkTaxonNames.imTool' loaded
Image Markup Tool 'TableAnnotCleaner' loaded
Image Markup Tool 'RemoveDuplicateAnnots' loaded
Image Markup Tool 'TreatmentTaggerStyled.imTool' loaded
Image Markup Tool 'ExtractMaterialsCitations.imTool' loaded
Image Markup Tool 'CheckAnnotNesting' loaded
Processing document 'E:\diglib\zootaxa\temp\zootaxa.4346.1.1.pdf'

D:\GoldenGateImagine20170823>

gsautter commented 6 years ago

Looks like a hangup to me, as the error log is empty ... will check.

gsautter commented 6 years ago

Could you maybe truncate the log down to the last MB or so and upload that? Maybe from where it says "document restored from previous batch run"?

That would be vastly helpful, as I wouldn't have to run the whole batch process on this pretty large PDF ...
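
For a log too large to open in an editor, the truncation requested above can be done without loading the whole file. The following is a minimal sketch, not part of GoldenGATE Imagine: the function name `tail_bytes` and the file names in the usage note are made up for illustration, and it assumes the marker text appears in the log exactly as quoted above.

```python
import os


def tail_bytes(path, out_path, max_bytes=1024 * 1024, marker=None):
    """Copy the last max_bytes of path to out_path and return the bytes written.

    If marker is given and occurs inside that tail, the copy starts at the
    last occurrence of the marker instead (hypothetical helper, see lead-in).
    """
    size = os.path.getsize(path)
    with open(path, "rb") as src:
        # Seek close to the end so only the tail is ever read into memory.
        src.seek(max(0, size - max_bytes))
        chunk = src.read()
    if marker is not None:
        pos = chunk.rfind(marker.encode("utf-8"))
        if pos != -1:
            chunk = chunk[pos:]
    with open(out_path, "wb") as dst:
        dst.write(chunk)
    return len(chunk)
```

Usage might look like `tail_bytes("batch.log", "batch.tail.log", marker="document restored from previous batch run")`, where both file names are placeholders. If the marker lies further back than the last MB, raising `max_bytes` widens the search window at the cost of more memory.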

myrmoteras commented 6 years ago

It's 5.921.212 KB. How can I open this? Neither Notepad nor Notepad++ would open it. Any other ideas?

gsautter commented 6 years ago

OK, that's bad ... please at least hold on to it so I can take a look when we meet up.

myrmoteras commented 6 years ago

I'm also keeping the 25 GB ones ...

gsautter commented 6 years ago

Looking forward to seeing those ... Once I see what exactly produces so much log output, I'll try to reduce it.
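
One way to see which messages dominate a multi-gigabyte log is to stream it line by line and count identical lines. This is a rough sketch under assumptions: `most_common_lines` is a made-up helper name, the log is assumed to be line-oriented text, and the counter only stays small if the bulk of the file consists of a limited set of repeated messages.

```python
from collections import Counter


def most_common_lines(path, top=10):
    """Return the `top` most frequent non-empty lines of a text log.

    The file is read one line at a time, so its size on disk does not
    matter; memory use grows with the number of distinct lines only.
    """
    counts = Counter()
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            line = line.strip()
            if line:
                counts[line] += 1
    return counts.most_common(top)
```

Calling `most_common_lines("batch.log", top=20)` (file name is a placeholder) returns a list of `(line, count)` pairs, most frequent first.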

myrmoteras commented 6 years ago

Can't you just run this in batch mode and see whether you can reconstruct this megaoutput file?

gsautter commented 6 years ago

That might well run me out of disk space before getting to that size, I'm afraid ...
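
If the bulk of the output goes to the console (an assumption; the batch tool may well write its log files itself, in which case this does not apply), a rerun could keep only the most recent lines in memory instead of filling the disk. A minimal sketch, with `run_keep_tail` as a made-up helper name:

```python
import subprocess
from collections import deque


def run_keep_tail(command, max_lines=10000):
    """Run command and keep only the last max_lines of its combined output.

    Returns (exit_code, lines). Assumes the process writes its messages to
    stdout or stderr; nothing is written to disk by this helper.
    """
    tail = deque(maxlen=max_lines)  # older lines are dropped automatically
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        errors="replace",
    )
    for line in proc.stdout:
        tail.append(line.rstrip("\n"))
    proc.stdout.close()
    return proc.wait(), list(tail)
```

The command would be passed as a list, for example the `java -jar ... GgImagineBatch.jar ...` invocation shown earlier in this thread split into its arguments. A process that hangs without exiting still has to be stopped by hand, so this only bounds the output, not the run time.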

gsautter commented 6 years ago

Another idea: you put the IMF without the MCs on the server, I load it from there, and then start at the MCs.

myrmoteras commented 6 years ago

So a dessert for Bern...

gsautter commented 6 years ago

Of sorts, yes ... that kind of late evening activity of mine ...