gsautter / goldengate-imagine

Automatically exported from code.google.com/p/goldengate-imagine

upload warning #27

Closed myrmoteras closed 7 years ago

myrmoteras commented 7 years ago

@gsautter here is a 250-page MS that did not get processed:

Zootaxa 2456: 1–243 (2010) www.mapress.com/zootaxa/ Monograph ZOOTAXA Review of Gonatocerus (Hymenoptera: Mymaridae) in the Neotropical region, with description of eleven new species

myrmoteras commented 7 years ago

OK - this does not upload. [screenshot]

what do I have to do?

gsautter commented 7 years ago

Try again now, I just restarted Jetty and paused Figure Repair to free up some resources.

myrmoteras commented 7 years ago

[screenshot]

restart again

gsautter commented 7 years ago

I do see an update coming in on the console ...

myrmoteras commented 7 years ago

but the source file doesn't make it.

gsautter commented 7 years ago

Not sure that is on the server end of things, though, as I see nothing like it in the console ...

I suggest you store the IMF, ideally to an "Image Markup Directory", as that is easiest on your local resources; then close and re-open GGI, re-open the document, and try the upload again.

myrmoteras commented 7 years ago

[screenshot]

myrmoteras commented 7 years ago

Using the Image Markup Directory leads to the same result:

[screenshot]

gsautter commented 7 years ago

See #28 ... I suspect this is a local memory issue rather than a server side one. Does increasing maximum RAM on your end change things?

myrmoteras commented 7 years ago

OK - normally I assign 10GB, but since I had not adjusted this when I downloaded the new version, it was ca. 1GB. Let's see what happens. I had been astonished at the sluggish behavior... Now it is back on 10GB; at least it no longer stops... but it is using 97% CPU for several minutes to get from 77% to 78%.

gsautter commented 7 years ago

There's just shy of half a million words in this one, so adding the words to the document, loading all those attributes, and chaining this extremely long text stream does take a little while ... it's just a monstrous document.

myrmoteras commented 7 years ago

So it ran through, but now there is nothing to see anymore, and the GGI window, blank as it is, is frozen.

gsautter commented 7 years ago

As long as your CPU load stays up, that means there is something going on ... however, displaying these 500+ pages is a pretty steep call as well. I guess once the first page shows, the last thing you should do is scroll or zoom around.

myrmoteras commented 7 years ago

Moving around is not the problem. But editing it - that is, using the "treatment structure" tool - takes a long time to open. I'm letting it run to see.

myrmoteras commented 7 years ago

How do I know that something is still ongoing and worth waiting for? [screenshot]

gsautter commented 7 years ago

Well, running analysis logic requires wrapping the document as XML in most cases (the exceptions being "Detect Document Structure" and "Document Metadata"), and creating that wrapper takes some time as well, as it involves walking all along the text streams, the longest of which is nearly half a million words in this document ...

Bottom line: as long as there is CPU load, there's something going on. Something that might be taking fractions of a second in a normal article, but is just a lot more effort in such a huge monograph.
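To make the cost tangible: a minimal toy sketch in Python (my own simplification, not GGI's actual data model or API) of exposing a token stream as XML shows why building the wrapper is at least one full pass over the document - for a ~500,000-word stream, that walk alone dominates.

```python
from xml.sax.saxutils import escape

def wrap_as_xml(tokens):
    """Toy stand-in for an XML wrapper over a token stream.

    Building the XML view has to visit every token once, so the
    cost grows linearly with document length. A real wrapper (as
    described above) additionally carries annotations and supports
    write-through, which adds further per-token work.
    """
    parts = ["<document>"]
    for tok in tokens:  # one unavoidable pass over the whole stream
        parts.append("<w>%s</w>" % escape(tok))
    parts.append("</document>")
    return "".join(parts)

print(wrap_as_xml(["Gonatocerus", "sp.", "nov."]))
# → <document><w>Gonatocerus</w><w>sp.</w><w>nov.</w></document>
```

For a short article this pass finishes in a fraction of a second; at half a million tokens the same linear work becomes the minutes-long delay described above.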

myrmoteras commented 7 years ago

The treatments are tagged at genus level, but not at species level.

It took ca. 10 minutes to open the treatment structure dialog window, but it is open now.

gsautter commented 7 years ago

Squeezing half a million words into that XML wrapper, likely along with tens of thousands of annotations, just takes a while ... it is complicated enough to represent a page- and text-stream-based document to an Analyzer as the very XML it is used to, with write-through capability on top (that wrapper alone took me a few weeks to build and get right), but doing so for a document this size is a different issue altogether.

myrmoteras commented 7 years ago

The heading does not work, since it is not marked up - it would have needed a fourth level...

[screenshot]

But we need to get this document done - too much relevant information inside, such as host plants, treatment citations, distribution data...

[screenshot]

I guess I need a day or so to get it done.

gsautter commented 7 years ago

For documents like this, it might be worthwhile to provide a custom treatment heading selection step ... working on properties like (short) paragraph length and a bold taxonomic name (at least at the start) occupying at least the majority of the paragraph.

If treatments are numbered, we have that now, with an additional check on both sequential numbering and numbering-sequence completeness as the safety nets that make it applicable in batch processing. But without the numbering, there is very little to go on, as taxon names in bold alone are just not distinctive enough to work reliably in the general case. However, it would be quite possible to create a treatment heading tagger based upon exactly those rules and provide it at the user's disposal for cases in which the user decides such rules are applicable. What do you think?
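The two safety nets described above, plus the proposed bold-short-paragraph rule, can be sketched roughly like this in Python (hypothetical function names and thresholds of my own choosing, not GGI's actual implementation):

```python
def numbering_is_safe(treatment_numbers):
    """Safety nets for batch tagging of numbered treatments:
    the numbers must be in ascending order (sequential numbering)
    and form a gapless run (numbering-sequence completeness)."""
    if not treatment_numbers:
        return False
    # sequential: each number strictly greater than its predecessor
    sequential = all(a < b for a, b in
                     zip(treatment_numbers, treatment_numbers[1:]))
    # complete: no gaps between first and last number
    complete = treatment_numbers == list(
        range(treatment_numbers[0], treatment_numbers[-1] + 1))
    return sequential and complete

def looks_like_treatment_heading(word_count, leading_bold_count):
    """Proposed heuristic for unnumbered treatments: a short
    paragraph whose leading bold (taxonomic) name covers at
    least the majority of it. The 12-word cap is an assumed
    threshold for illustration only."""
    return word_count <= 12 and 2 * leading_bold_count >= word_count

print(numbering_is_safe([1, 2, 3]))  # → True
print(numbering_is_safe([1, 2, 4]))  # → False (gap: 3 is missing)
```

Because the bold-name rule alone is not distinctive enough, a real tagger would apply it only when the user opts in, as suggested above.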

gsautter commented 7 years ago

I guess this one has been lingering all too long for a ticket about a very specific (and peculiar) document ... time to lay it to rest.