gsautter / goldengate-imagine

Automatically exported from code.google.com/p/goldengate-imagine

[Template] Missing parameters? Document Metadata > Document Reference #841

Open mguidoti opened 4 years ago

mguidoti commented 4 years ago

Hi Guido,

Two questions here too:

  1. Document reference would be the full reference for the given paper, like in the screenshot below? image

  2. The option Document Metadata > Document Reference on the template creator has more parameters on my machine than on the other four machines, and the 'last modified' date seems very similar among the computers.

See the screenshots below:

image

On my machine: image

On their machines: image

As you can see, some important parameters are missing on their machines, like the 'area' parameters.

What do you think?

gsautter commented 4 years ago

Regarding (1): You assume correctly. Please note that to activate this, you also have to tick the respective checkbox in the root of "Document Metadata".

gsautter commented 4 years ago

Regarding (2): Unfortunately, the screenshot with "MINE" in it doesn't show the timestamp of the local configuration in the bottom configuration selector ... the list of template parameters is a simple text file, which is part of the configuration, so there has to be some discrepancy. In particular, this file is <GgiRoot>/Configurations/Default.imagine/Plugins/DocumentStyleManagerData/styleParameters.cnfg ... please check for differences between the machines.
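
For anyone doing this check, the comparison could be scripted roughly as below. This is only a sketch: it assumes styleParameters.cnfg holds one parameter per line (the thread only says it is "a simple text file"), and the function names `read_parameter_lines` and `compare_parameter_lists` are made up for illustration and are not part of GGI.

```python
from pathlib import Path


def read_parameter_lines(path):
    """Return the set of non-empty, stripped lines of a parameter list file.

    Assumption: one parameter entry per line.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare_parameter_lists(mine, theirs, prefix=""):
    """Return (only_mine, only_theirs) as sorted lists.

    Only entries starting with `prefix` are reported; an empty prefix
    reports every difference.
    """
    only_mine = sorted(p for p in mine - theirs if p.startswith(prefix))
    only_theirs = sorted(p for p in theirs - mine if p.startswith(prefix))
    return only_mine, only_theirs
```

Reading the file from each machine with `read_parameter_lines` and passing both sets to `compare_parameter_lists` with a prefix such as `"docMeta.docRef."` would list the entries that exist on one machine only.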

gsautter commented 4 years ago

In general, please refrain from asking two things in one ticket ... this gets confusing if tickets get longer.

mguidoti commented 4 years ago

I tried to group questions on the same theme together, but I can post in different issues. No problem.

gsautter commented 4 years ago

Well, even though they came up together, (1) is about some style parameters, while (2) is about an update issue ... hardly related, I'm afraid.

mguidoti commented 4 years ago

The relationship is the type of parameter/section of the template maker. But then again, I'll follow your instructions, and feel free to tell me if I make this mistake again, or any other one!

mguidoti commented 4 years ago

> Regarding (2): Unfortunately, the screenshot with "MINE" in it doesn't show the timestamp of the local configuration in the bottom configuration selector ... the list of template parameters is a simple text file, which is part of the configuration, so there has to be some discrepancy. In particular, this file is <GgiRoot>/Configurations/Default.imagine/Plugins/DocumentStyleManagerData/styleParameters.cnfg ... please check for differences between the machines.

Because you asked about the timestamp for the local configs:

Mine: image

Theirs: image

But when we start GGI we always choose the Default.imagine from the server, not the local one. Not sure if this is important?

gsautter commented 4 years ago

Choosing the one from the server and then agreeing to make it a local configuration is exactly the way to go.

gsautter commented 4 years ago

What baffles me is the timestamp difference of exactly one hour between the two machines ... is there a respective difference in the Windows clock?

Also, did you check the files? What might have happened is that styleParameters.cnfg was modified locally on the second machine, in which case the local file would be preferred in an update. Simply replace it with the one from your machine, and you should be all set.

Background: The list of all available style parameters "learns" about the individual parameters as their consumers ask for them, whether present in a specific template or not. This way, GGI can assemble the list dynamically without any explicit registration or submission of parameters, which is a great advantage with regard to software architecture. On shutdown, GGI saves the list if it was modified, which may have incurred a local modification on the machine in question.
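
The "learning" behaviour described in the background can be pictured with the small sketch below. It is not GGI's actual code (GGI is not written in Python); the class `LearnedParameterList`, its methods, and the one-name-per-line file format are assumptions made purely to illustrate the idea.

```python
class LearnedParameterList:
    """Illustrative registry that records every parameter name a consumer asks for."""

    def __init__(self, known=()):
        self._names = set(known)   # names loaded from the stored list
        self.modified = False      # becomes True once a new name is learned

    def get(self, template, name, default=None):
        """Look up `name` in a template (a plain dict here), learning the name as a side effect."""
        if name not in self._names:
            # The name is recorded even if this template has no value for it.
            self._names.add(name)
            self.modified = True
        return template.get(name, default)

    def names(self):
        """Return all known parameter names in sorted order."""
        return sorted(self._names)

    def save_if_modified(self, path):
        """Write the list (one name per line, an assumed format) only if it changed."""
        if not self.modified:
            return False
        with open(path, "w", encoding="utf-8") as out:
            for name in self.names():
                out.write(name + "\n")
        self.modified = False
        return True
```

In this picture, a single lookup of a name that is not yet on the list is enough to mark the list as modified, so the next shutdown would rewrite the file on that machine.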

mguidoti commented 4 years ago

No, the clocks are in sync (just checked).

I performed a diff on two styleParameters.cnfg files (mine and one of theirs), and I'm attaching the results here as a PDF. Check lines 28-33 (from your left). The parameters for docMeta.docRef. are the same, although there are some differences between the two compared files (~highlighted in green~ indicated with blue letters on the side throughout the file).

Do you want me to copy/paste this file anyway? styleParameters-DIFF.pdf

gsautter commented 4 years ago

I just checked the metadata handler, which also extracts the document reference, and the two parameter lists in the DIFF PDF both have all the parameters said handler uses. So the conundrum shifts to the question where all the other parameters are coming from on your machine ... did you do something like "Use Selection >" on the document reference? Could be your own machine "learned" a few parameters via a stretch of generic extraction code that are not actually used for the document reference ...

mguidoti commented 4 years ago

Answering you: I don't recall using the Use Selection > for the document reference on my machine...

If you need anything else from us here, to go deeper into this, just let me know!

One question: what's the correct/most updated set of parameters? Mine or theirs?

Thanks for putting time into it.

gsautter commented 4 years ago

Use Selection > or any of the other Use ... > items in the context menu might have done this, and frankly, I have no idea how this might have happened otherwise ... in any case, you might want to revert to the styleParameters.cnfg from one of the other machines, but the next update would replace your file anyway, as the latest always wins in an update.

mguidoti commented 4 years ago

Ok,

But their version is lacking the parameters that define the area, for instance. Isn't this type of parameter something that we would want to keep?

gsautter commented 4 years ago

Nope ... document reference extraction doesn't use an area.

And here is why: more often than not, the document reference is below the abstract and thus can float between the lower half of the first page and all of the second page (depending on title length, number of authors and respective affiliations, and length of the abstract). This makes an area useless as a constraint, simply because the document reference can be anywhere on the page it is on. Instead, extraction works as follows:

  1. Find paragraphs whose font size is within the configured bounds.
  2. Score each paragraph based upon how many tokens it has in common with all of the remaining document metadata (title, authors, journal name, year, DOI, etc.) and how many other tokens it contains (pretty much vanilla recall and precision).
  3. Use any highest scoring paragraph that scores above some 75% in precision * recall.

This is much more sensible and robust than trying to nail a paragraph that can and does move around to some specific position, especially if the document reference is at the end of an article rather than somewhere near the article head.
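
The three steps can be sketched roughly as follows. This is an illustration under assumptions, not the actual handler: it reads "recall" as the share of metadata tokens found in the paragraph and "precision" as the share of paragraph tokens that are metadata tokens, uses simple lowercase word tokens, and the names `tokenize`, `score_paragraph` and `find_document_reference` are invented here.

```python
import re


def tokenize(text):
    """Split text into a set of lowercase word tokens (a simplifying assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def score_paragraph(paragraph_text, metadata_values):
    """Return precision * recall of the paragraph against the other metadata."""
    para_tokens = tokenize(paragraph_text)
    meta_tokens = set()
    for value in metadata_values:
        meta_tokens |= tokenize(value)
    if not para_tokens or not meta_tokens:
        return 0.0
    common = para_tokens & meta_tokens
    precision = len(common) / len(para_tokens)  # few "other" tokens -> high
    recall = len(common) / len(meta_tokens)     # most metadata covered -> high
    return precision * recall


def find_document_reference(paragraphs, metadata_values,
                            min_font_size, max_font_size, threshold=0.75):
    """Pick the best-scoring paragraph within the font size bounds.

    `paragraphs` is a list of (text, font_size) pairs. Returns None if no
    paragraph scores above `threshold`.
    """
    best_text, best_score = None, threshold
    for text, font_size in paragraphs:
        if not (min_font_size <= font_size <= max_font_size):
            continue  # step 1: font size filter
        score = score_paragraph(text, metadata_values)  # step 2: scoring
        if score > best_score:  # step 3: best score above the threshold
            best_text, best_score = text, score
    return best_text
```

Note that no page position enters the computation anywhere, which is why an area parameter would have nothing to constrain.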

mguidoti commented 4 years ago

Perfect.

Thanks!