Strange fragments generated

LiliKotlerman commented 10 years ago

Putting this here so I don't forget it before Vivi is back.

I see some strange fragments generated while running ExperimentNICE (most of the fragments are OK). The fragments are below. I spotted similar ones when checking transitive closure edges I added to WP2 datasets.

I must be using gold standard fragment graphs, so maybe this is due to problems in GS, or GS modifiers are not removed correctly when constructing FGs. Could you take a look?

i've missed my c h moonport (complete statement: i've missed my connection through moonport)

why are we being penalized by having to make a choice s yet (complete statement: why are we being penalized by having to make a choice now when you guys haven't even open up the flights yet)

quality of food at t ar are very low standard food at t ar are very low standard (complete statement: quality of food at the bar are very low standard)

LiliKotlerman commented 9 years ago

OK, so after all the changes we still have this problem, let's try to resolve it :)

Here are the problematic fragments I have spotted so far (for ENG data):

EMAIL0001: (A trolley serving) snacks would be a we economy section (228464.txt_3.xml.graphf3output)

EMAIL0030: Food is more ame every journey (445533.txt_2.xml.graphf2output)

EMAIL0140: possibility to close the lights morning (3.txt_2.xml.graphf2output - here maybe something is not OK in the FG xml)

I'm adding this one here as well: in EMAIL0320 an expected fragment is not generated. "Improve the space at Frowntown" is there, but "Improve the space" is not (473384.txt_2.xml.graphf2output)

If I spot more, I'll put them here. It might be that re-annotation activities harmed the FG xmls somehow, but I don't see that the above xmls are problematic.

vnastase commented 9 years ago

I think this is caused by the way the CAS is built -- within the CAS, all annotations should be relative to the SOFA, i.e. the begin and end positions of every annotation should be offsets into the SOFA. The problem in the flagged files is that the modifier annotations are positioned relative to the annotated fragments instead. Here is the example from file 228464.txt_3...:

The SOFA is "A trolley serving coffee tea and snacks would be a welcome addition to the economy section . "

The (non-contiguous) annotated fragment is "A trolley serving snacks would be a welcome addition to the economy section "

There are two modifier annotations, one with start-end positions 0-17, the other 53-75. The 53-75 position is relative to the annotated fragment, but it should be (in my opinion) relative to the SOFA, and thus 68-90: the skipped text " coffee tea and" is 15 characters long, so both offsets shift by 15.
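To make the arithmetic concrete, here is a minimal sketch of the offset conversion, not the actual FragmentGraphGenerator or data converter code. It assumes the non-contiguous fragment is described by a list of (begin, end) spans in the SOFA; the `SEGMENTS` values and the function names `_map_offset` and `to_sofa_span` are hypothetical, derived only from the two strings quoted above.

```python
# SOFA text as quoted above (including the trailing space).
SOFA = ("A trolley serving coffee tea and snacks would be a welcome "
        "addition to the economy section . ")

# Assumed SOFA spans of the non-contiguous fragment: the text before the gap
# " coffee tea and", and the text after it.
SEGMENTS = [(0, 17), (32, 91)]


def _map_offset(offset, segments, is_end):
    """Map one fragment-relative offset to a SOFA-relative offset."""
    consumed = 0  # fragment characters covered by the segments seen so far
    for begin, end in segments:
        length = end - begin
        # An end offset may sit exactly on a segment's right edge;
        # a begin offset on that edge belongs to the next segment.
        inside = offset <= consumed + length if is_end else offset < consumed + length
        if inside:
            return begin + (offset - consumed)
        consumed += length
    raise ValueError(f"offset {offset} is outside the fragment")


def to_sofa_span(begin, end, segments):
    """Convert a fragment-relative (begin, end) span to SOFA-relative offsets."""
    return (_map_offset(begin, segments, is_end=False),
            _map_offset(end, segments, is_end=True))


# Rebuild the fragment text from the segments to check the assumption.
fragment = "".join(SOFA[b:e] for b, e in SEGMENTS)

print(to_sofa_span(53, 75, SEGMENTS))  # (68, 90)
```

With these spans, `fragment[53:75]` and `SOFA[68:90]` both give "to the economy section", while the first modifier (0-17) is unchanged because it lies before the gap.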

If you think it is correct to have modifier annotations relative to the fragment annotation and not relative to the CAS's SOFA, then I can change the FragmentGraphGenerator, but I think we should change the way these annotations are added to the CAS (so I would change a little bit the WP2 data converter).

LiliKotlerman commented 9 years ago

Vivi, thanks! Indeed, I think changing the data converter is better. If I understand correctly, we have 2 different situations: (1) when the input is interactions, what is currently done is OK, and (2) when the input is fragments, something should be changed. So I guess only the part that generates the per-fragment xmis should be changed, right?

vnastase commented 9 years ago

Hi Lili

I think it's the other way around -- when we have per-fragment XMIs, the text (SOFA) is the text of the fragment, so the modifiers are annotated relative to that text and everything is fine. When we do a per-interaction XMI, the text (SOFA) is the text of the interaction, and the modifiers should be annotated relative to that.

Vivi

LiliKotlerman commented 9 years ago

Vivi, but I am only using perFragment xmis as input and then I get those fragments. Or maybe I THINK that I'm using perFragment ones, while I'm actually using perInteraction ones... strange, I generated them myself and I believe I was using the perFragment method from the data converter... I will only be able to check this on Monday, sorry

vnastase commented 9 years ago

Then maybe for the per-fragment XMIs the SOFA is still the interaction text. If we change the way these XMIs are built so that the SOFA is the fragment text, all should be fine. Alternatively, we can change the modifier annotations. Either should work.

Vivi

vnastase commented 9 years ago

I was a little bit wrong about the cause of the wrong positioning of the modifiers. They are positioned taking into account where the fragment begins in the original text, but when the fragment has a gap (as in the example from interaction 228464.txt, fragment 3), the length of the gap was not taken into account. That is now fixed.

hltfbk / Excitement-Transduction-Layer

Strange fragments generated #242