pelagios / recogito2

Semantic Annotation Without the Pointy Brackets
Apache License 2.0

Customized external authority lists from Perseus #363

Open ChiaraPalladino opened 7 years ago

ChiaraPalladino commented 7 years ago

We have a number of place names and a very good amount of personal names stored as an authority list in Perseus in RDF format. Those data are pretty much unused and lying there at the moment. For personal names in particular, it would be nice to be able to associate them to the annotations in Recogito. Would this sort of integration be conceivable?

rsimon commented 7 years ago

Not only conceivable but highly desired - and planned in principle! At the moment though, I don't have any free resources to work on Recogito, so I can't say when there will be time to address this in practice. Can you post the link to the RDF data?

ChiaraPalladino commented 7 years ago

Sorry, it's actually XML at the moment. The data comes from originally printed indexes to various authors - like this one for Sallust.

http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:2007.01.0026

We have 23 similar indexes, some already in Perseus and some elsewhere. Do you think they could be of use? What further work would be needed to make them useful as authority lists (even just for specific authors)?

rsimon commented 7 years ago

The format of the data doesn't really matter. As long as there are names and URIs, accommodating different serializations/standards is pretty trivial. It's the additional functionality that needs to be built into Recogito that requires effort: i) a mechanism to upload & index the name authority list; ii) the plumbing & additional UI components needed to do the index lookup and allow the user to confirm or change a match. If the name lists are supposed to be cross-referenced to other authorities (like the gazetteers, which link to each other), a bit of extra work would be needed, too.

None of this is a massive amount of work, especially since for i) the existing gazetteer infrastructure can be re-used. But it still requires work & all my time is currently filled with the Peripleo redesign :-(

rsimon commented 7 years ago

In the meantime: how about the following workaround idea: you could use ordinary tags in conjunction with the person name. E.g. something like: perseus2007.01.0026:Q.-Fabius-Maxumus-ahlberg-1 or any "shortcode" that would let us reliably reconstruct the URL. (It's a pity they don't have simpler IDs though.) That way, it could work a bit like Flickr machine tagging (and we could add resolvable links to Recogito in the same way as Flickr did.)
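To illustrate the workaround, here is a minimal sketch of how such a shortcode tag could be recognised and turned back into a link. The tag layout (`perseus` + document ID + `:` + entry ID) is only inferred from the example above, and the names `SHORTCODE`, `MachineTag`, `parse_shortcode` and `document_url` are hypothetical, not part of Recogito. The base URL follows the Sallust link posted earlier; how the entry ID maps onto a deeper Perseus URL is not stated in the thread, so the sketch only rebuilds the document URL.

```python
import re
from typing import NamedTuple, Optional

# Assumed shortcode layout: "<prefix><document id>:<entry id>".
# Only the "perseus" prefix and the example ID come from the thread.
SHORTCODE = re.compile(
    r"^(?P<prefix>perseus)(?P<doc>\d{4}\.\d{2}\.\d{4}):(?P<entry>\S+)$"
)

# URL pattern taken from the Sallust index link posted earlier in the thread.
PERSEUS_TEXT_URL = "http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:{doc}"


class MachineTag(NamedTuple):
    prefix: str
    doc: str
    entry: str


def parse_shortcode(tag: str) -> Optional[MachineTag]:
    """Return the parsed shortcode, or None if the tag is an ordinary freeform tag."""
    match = SHORTCODE.match(tag.strip())
    if match is None:
        return None
    return MachineTag(match["prefix"], match["doc"], match["entry"])


def document_url(tag: MachineTag) -> str:
    """Rebuild the URL of the Perseus document the shortcode points into."""
    return PERSEUS_TEXT_URL.format(doc=tag.doc)
```

A tag that does not match the pattern simply stays an ordinary tag, so existing freeform tags would be unaffected.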

ChiaraPalladino commented 7 years ago

This seems to me a very good idea. We could use CTS references, and it would also be a nice slow start towards some kind of CTS adaptation. Some of these documents are not CTS-compliant yet, but if this is possible then we could try to modify them accordingly.

ChiaraPalladino commented 7 years ago

Update: we are thinking about doing the same with personal names linking to the LGPN. Should we use tags or comments for the interlinking? Maybe it's more flexible to use comments, but it's up to you.

rsimon commented 7 years ago

If the point is to tag with URIs, and we'd use a shortcode a la Flickr machine tags, I think I'd prefer this to be done via tags. We could also make these tags clickable, and then it would IMO be somewhat cleaner if Recogito would check whether a tag is a shortcode vs. whether a comment "includes" one. Or, put differently, I'd consider comments to be something completely freeform, whereas a tag is by definition a bit more of a structured thing.

ChiaraPalladino commented 7 years ago

Great - we'll do it via tags. It would be very cool if we could click on the tags and see the information that they contain.

rsimon commented 6 years ago

Hi @ChiaraPalladino,

picking this up again (after almost a year...):

the upcoming Recogito backend upgrade (coming in late Feb/early March) would allow us to do two things more easily:

  1. work with other types of authorities, not just places (i.e. people)
  2. work with external authority "endpoints", i.e. services that provide their own HTTP lookup API (in contrast to our previous requirement of having data indexed in Recogito directly)

Is there interest, perhaps, to provide such a lookup API for people URIs (and/or places) via Perseus?

Alternatively, we could still look into indexing a dump of Perseus data into Recogito. (The new backend also makes it easier to delete authority lists, even if they were already used in annotations, in case we need to roll back.) But given that Perseus is a) a stable service and b) likely to constantly update their data, an API-based integration will IMO make more sense.

ChiaraPalladino commented 6 years ago

Tagging @jtauber here, because I think he may have a better answer than me. Although I'm sure you met before, I'll introduce you virtually anyway: Rainer, James Tauber is the developer who is in charge of the new version of the Perseus Library coming up in March. James, Rainer Simon is the developer of Recogito, the annotation tool of Pelagios.

I don't think we have people data in the texts hosted in Perseus, but we have dictionaries which we can use as external authorities. James is working to integrate all the different data provided through Perseus-related projects into the new reading viewer, so this may be worth talking about.

In terms of integration, it would also be nice to be able to refer to Pelagios for place references. For that, there is a bunch of annotated texts in English translation that already refer to Pleiades, e.g. http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1999.01.0160

eltonteb commented 6 years ago

+1 to seeing a Perseus-Pelagios integration, so that places in Perseus texts could direct folks to other info. Perhaps Bridget had already made a start on this?

Can't wait to see the new Perseus interface...

elton


--

Dr Elton Barker | COMMUNITY DIRECTOR, PELAGIOS COMMONS

http://commons.pelagios.org/ | @Pelagiosproject

jtauber commented 6 years ago

We have some place data in, I think, some English translations (e.g. Herodotus) and we could link out based on that.

I've also been thinking of a generalised API (perhaps based on W3C web annotations) for services to be informed about and queried on references whether they be people, places, or passages.

In the case of passages, the way it would work is to either be able to say "hey, I have said something about [CTS URN] over here" or "hey, what do you have to say about [CTS URN]?" but exactly the same model could apply to anything we have URNs for including places and people (and vocabulary items, etc). This is the way Greek vocabulary glosses and morphology work (we have a vocabulary service, for example, that can be queried based on CTS URNs)
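As a rough illustration of the two calls described above, the following sketch expresses "I have said something about [CTS URN]" as a W3C Web Annotation and answers "what do you have to say about [CTS URN]?" from a local list. The `@context`, `type`, `body`, `target`, `source` and `purpose` keys come from the Web Annotation Data Model; the function names `announce` and `about`, and the idea of keeping annotations in a plain list, are assumptions made for the example and not a description of any existing Perseus or Recogito API.

```python
from typing import Dict, List


def announce(annotation_id: str, target_urn: str, body_uri: str) -> Dict:
    """Build a Web Annotation stating that body_uri says something about target_urn."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        # The body points at the external resource (person, place, lexicon entry).
        "body": {
            "type": "SpecificResource",
            "source": body_uri,
            "purpose": "identifying",
        },
        # The target is whatever has a URN: a passage, a place, a person.
        "target": target_urn,
    }


def about(annotations: List[Dict], target_urn: str) -> List[Dict]:
    """Return every known annotation whose target is exactly target_urn."""
    return [a for a in annotations if a["target"] == target_urn]
```

A real service would put these two operations behind HTTP endpoints; the sketch only shows the shape of the data that would be exchanged.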

The new Perseus interface repo is https://github.com/scaife-viewer/scaife-viewer/ and the staging server is at https://lk353.eu1.eldarioncloud.com

Neither address has been broadcast yet but I don't think it's a problem sharing them here :-)

The Greek vocabulary tool (which is partly Perseus work and partly my own work) is staged at https://gu658.us1.eldarioncloud.com (and that's what's also offering the API used by the Scaife Viewer)

eltonteb commented 6 years ago

We have some place data in, I think, some English translations (e.g. Herodotus) and we could link out based on that.

If that's the case, given the fact that Recogito now handles TEI texts, I'm wondering whether it might be worth inviting people to provide Pelagios annotations for texts of their choice, to then hand back to Perseus for general consumption? We could even make it a thing, as a way of advertising both resources....

The new Perseus interface repo is https://github.com/scaife-viewer/scaife-viewer/ and the staging server is at https://lk353.eu1.eldarioncloud.com

Thanks!


jtauber commented 6 years ago

So the English of Herodotus is marked up for people and places (e.g. https://lk353.eu1.eldarioncloud.com/reader/urn:cts:greekLit:tlg0016.tlg001.perseus-eng2:2.3) but all we're currently doing is styling the named entities differently. We can do something with them as the XML has identifying information: https://perseus-cts.eu1.eldarioncloud.com/api/cts?request=GetPassage&urn=urn:cts:greekLit:tlg0016.tlg001.perseus-eng2:2.3

But the other thing that can be done is individual words (or ranges of words) can be referred to for standoff annotation, e.g. https://lk353.eu1.eldarioncloud.com/reader/urn:cts:greekLit:tlg0016.tlg001.perseus-grc2:2.3?highlight=%40Ἡλιοπολῖται%5B1%5D

rsimon commented 6 years ago

Hi all,

just to separate two different scenarios here: one is about places and people. Here I'd be interested not in marked-up text from Perseus, but first of all in the dictionaries mentioned by @ChiaraPalladino above. These could then be used to mark up text in Recogito.

The other use case is about opening a Perseus text in Recogito. As Elton says, building this functionality should be within reach now that the Recogito UI can handle TEI text content. However, that's another issue: #437.

I guess once both these things work, we can think about next steps. E.g. navigating seamlessly between markup in Recogito and the Scaife viewer (i.e. @jtauber's links above) should be pretty straightforward to enable?

jtauber commented 6 years ago

Not just linking between the two but the pluggable widget architecture of Scaife means we could have a widget in Scaife that displays more information on named entities in the current passage without navigating away.

rsimon commented 6 years ago

Great! That might be an opportunity to do something with the Peripleo widget (as briefly discussed via Twitter already). But, again, a slightly different matter. (Although thanks to Peripleo running reasonably stable now, it's something we could do, basically, immediately.)

rsimon commented 6 years ago

PS: coming back to authority (place/people) dictionaries: I'm also linking in issue #455 for reference.

ChiaraPalladino commented 6 years ago

Coming back to the dictionaries issue, I think it would be great to have the generalized API that refers back to entries in lexica and glossaries of places and people. For example, we could dig into the Dictionary of Greek and Roman Geography http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1999.04.0064 and assign URNs that come from there to place references in Recogito.

Last year, we briefly talked about making these dictionaries CTS compatible (because a bunch of them still aren't). Once that is done, would it be a problem to use an API to connect them to Recogito? This is the way Bridget set up Plokamos, as far as I remember.

rsimon commented 6 years ago

would it be a problem to use an API to connect them to Recogito?

Nope, that's exactly one of the things that went into the design of the new backend: attaching API-based authority lookups (rather than lookups into Recogito's own union index only). We'd still need some trimmed-down API though I think. I.e. something that can take a search query (along the lines of what you can type into Recogito's gazetteer search box) and which returns results (along the lines of what's listed in Recogito's gazetteer result list).
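To make the "trimmed-down API" a little more concrete, here is one possible shape for it, sketched as a search interface with a toy in-memory implementation. Everything in it is an assumption for illustration: the `AuthorityResult` fields, the `AuthorityLookup` protocol, the `limit` parameter and the substring matching are not taken from Recogito's backend, and a real endpoint would sit behind HTTP.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class AuthorityResult:
    """One row of a result list: a URI plus enough text to pick the right match."""
    uri: str
    title: str
    description: str = ""


class AuthorityLookup(Protocol):
    """What an external authority endpoint would need to offer (hypothetical)."""

    def search(self, query: str, limit: int = 10) -> List[AuthorityResult]: ...


class InMemoryLookup:
    """Toy implementation: case-insensitive substring match on the title."""

    def __init__(self, records: Iterable[AuthorityResult]) -> None:
        self._records = list(records)

    def search(self, query: str, limit: int = 10) -> List[AuthorityResult]:
        needle = query.casefold().strip()
        if not needle:
            return []
        hits = [r for r in self._records if needle in r.title.casefold()]
        return hits[:limit]
```

The point of keeping the interface this small is that a name index built from the Perseus dictionaries and any other external service could be swapped in behind the same `search` call.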


ChiaraPalladino commented 6 years ago

Plokamos does that exactly: https://sites.tufts.edu/perseids/2016/12/16/announcing-plokamos-a-semantic-annotation-tool/ https://github.com/perseids-project/plokamos

I do not know at which stage the backend is, and if they are going to continue working on that (the developer who made it is not working in Boston anymore). But it is a good "inspiration".