pelagios / recogito2

Semantic Annotation Without the Pointy Brackets

Apache License 2.0

Make TEI anchors more robust #634

Closed hcayless closed 4 years ago

hcayless commented 5 years ago

Given a from anchor like /tei/text/body/div[@id='edition-text']/div[@id='part1']/p[@id='p1']/seg[@id='seg-1.1']::146, it would be worth getting the XPath "TEI-correct", meaning getting the steps to follow TEI naming conventions. So it would be /TEI/text/body/div[@id='edition-text']/div[@id='part1']/p[@id='p1']/seg[@id='seg-1.1']::146.

CETEIcean will store the original, properly-cased, element name in a @data-teiname attribute on each element. If you can point me at the spot where you're generating the anchors, I could probably patch it to use that instead. And that would mean you won't have to paper over the case inconsistencies.
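
A minimal sketch of that idea, assuming each rendered element exposes its original name in a data-teiname attribute as described above. The helper names (teiStepName, teiStep, toTeiAnchor), the "tei-" prefix fallback, and the root-first list of elements are hypothetical and are not taken from pathUtils.js:

```javascript
// Hypothetical helper: pick the properly-cased TEI name for one element.
// Assumes the element carries a data-teiname attribute (as described above).
// Fallback (an assumption): lower-cased DOM tag name without a "tei-" prefix.
function teiStepName(el) {
  const original = el.getAttribute('data-teiname');
  if (original) return original;
  return el.tagName.toLowerCase().replace(/^tei-/, '');
}

// Build one XPath step, adding an [@id='...'] predicate when the element has an id.
function teiStep(el) {
  const id = el.getAttribute('id');
  const name = teiStepName(el);
  return id ? `${name}[@id='${id}']` : name;
}

// Build the full anchor from an ordered list of elements (root first)
// plus a character offset, in the "path::offset" form shown above.
function toTeiAnchor(elements, offset) {
  return '/' + elements.map(teiStep).join('/') + '::' + offset;
}
```

Because the functions only call getAttribute and read tagName, they can be exercised with plain stand-in objects before being wired into the real selection code.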

rsimon commented 5 years ago

Hi @hcayless,

ah great, thanks. The code is here:

https://github.com/pelagios/recogito2/blob/master/app/assets/javascripts/document/annotation/tei/selection/pathUtils.js

However: could this cause issues with the production instance? Unless there's a solution that can read & render both the old and the new XPaths, would I need to migrate the existing annotations that are already in our index?

Cheers, Rainer

rsimon commented 5 years ago

And this is the part where the DOM location is restored by parsing the XPath:

https://github.com/pelagios/recogito2/blob/master/app/assets/javascripts/document/annotation/tei/selection/highlighter.js#L18-L21
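
For readers who don't want to open the file, the first step of that restoration can be sketched roughly as follows. This only splits an anchor into its XPath and its character offset, based on the "path::offset" form of the example anchor in this thread. The function name parseAnchor is hypothetical, and how highlighter.js actually resolves the path against the DOM is not shown here:

```javascript
// Split an anchor such as "/tei/text/body/p[@id='p1']::146" into its
// XPath part and its numeric character offset. The "::" separator is
// taken from the example anchor in this thread.
function parseAnchor(anchor) {
  // Use the last "::" so the trailing offset is always the part split off.
  const sep = anchor.lastIndexOf('::');
  if (sep < 0) {
    throw new Error('Anchor has no "::" offset separator: ' + anchor);
  }
  const offset = Number.parseInt(anchor.slice(sep + 2), 10);
  if (Number.isNaN(offset)) {
    throw new Error('Anchor offset is not a number: ' + anchor);
  }
  return { path: anchor.slice(0, sep), offset };
}
```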

hcayless commented 5 years ago

If I'm reading the code right, you're translating the path here: https://github.com/pelagios/recogito2/blob/b91d8c0cd29734d360dfdc6bd8a04cafdd60d82a/app/controllers/HasTEISnippets.scala#L19-L29. I guess it would be up to you whether to update the index or just keep doing the replacements. They'd be no-ops for correct paths, so it would do no harm apart from costing you a few milliseconds.
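
The no-op behaviour for already-correct paths can be illustrated with a small sketch. This is not the Scala code linked above. It is a hypothetical JavaScript equivalent, and the names TEI_CASE and normalizeAnchor are made up for illustration. Only the tei to TEI mapping comes from the example in this thread; a real table would need an entry for every mixed-case TEI element that can occur in a path:

```javascript
// Hypothetical mapping from lower-cased step names to TEI-correct names.
// Only "tei" -> "TEI" is taken from the example in this thread.
const TEI_CASE = new Map([['tei', 'TEI']]);

// Rewrite each step name of an anchor to its TEI-correct casing.
// Predicates ([@id='...']) and the "::offset" suffix are left untouched.
// Limitation: assumes ids contain no "/" characters.
function normalizeAnchor(anchor) {
  const sep = anchor.lastIndexOf('::');
  const path = sep < 0 ? anchor : anchor.slice(0, sep);
  const suffix = sep < 0 ? '' : anchor.slice(sep);
  const steps = path.split('/').map((step) => {
    const bracket = step.indexOf('[');
    const name = bracket < 0 ? step : step.slice(0, bracket);
    const rest = bracket < 0 ? '' : step.slice(bracket);
    // Unknown names (including already-correct ones) pass through unchanged.
    return (TEI_CASE.get(name) || name) + rest;
  });
  return steps.join('/') + suffix;
}
```

Since correctly cased names are not keys in the table, running the function on a new-style anchor returns it unchanged, which is what lets old and new anchors coexist during a transition period.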

rsimon commented 5 years ago

Just to confirm: we're talking exclusively about the proper TEI capitalization here, right?

I'm inclined to leave the replaceAll statements in, for now. As you say, it would only add milliseconds for correct paths.

In the long run, yes, I'm all for updating the stored XPaths in the production instance, too. However, I'd need some spare time to write the script that scrolls through Elasticsearch, rewrites the paths, and updates the records (and the courage to hit "run" on that script in production ;-)). In other words, a solution that allows for a transition period would be better ;-)

hcayless commented 5 years ago

Yes, that's right. And the transition period can be as long as you want if you leave the replacement code in. From my perspective, I just want to make it easier for external software to deal with exported annotations—they don't currently get rewritten during export.

rsimon commented 5 years ago

Ah, yes - good point! Let's do this then :-)

rsimon commented 5 years ago

Hi @hcayless, just pinging about this - I think you already added this in one of your last pull requests, right? If so, we can close the issue.

Cheers, R