open-editions / corpus-joyce-ulysses-tei

James Joyce's novel Ulysses in TEI XML. Work-in-progress.

Automatically identify and categorize place names #49

Open JonathanReeve opened 5 years ago

JonathanReeve commented 5 years ago
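  1. Run named-entity recognition (NER) to find place names, perhaps with spaCy, keeping only place-like entity types (in spaCy's English models these are `GPE`, `LOC`, and `FAC`). See the spaCy documentation on named entities.
  2. Automatically look up each place name in DBpedia, using the DBpedia API and the Python `requests` library.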
  3. Use that data to include latitude, longitude, description, etc. for each (real) place.
  4. Make a heuristic for categorizing place names.
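
The four steps above could be sketched roughly as follows. This is a minimal sketch under several assumptions, not a settled design: it assumes spaCy's `en_core_web_sm` model, DBpedia's public SPARQL endpoint at `https://dbpedia.org/sparql`, and an exact English `rdfs:label` match, which will miss many of the place names in Ulysses. The function names and the `"real"` / `"unresolved"` categories are hypothetical placeholders for whatever heuristic is eventually chosen.

```python
# Labels that spaCy's English models use for place-like entities.
PLACE_LABELS = {"GPE", "LOC", "FAC"}

# Assumption: DBpedia's public SPARQL endpoint.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"

QUERY_TEMPLATE = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?place ?lat ?long ?abstract WHERE {
  ?place rdfs:label "%s"@en ;
         geo:lat ?lat ;
         geo:long ?long .
  OPTIONAL { ?place dbo:abstract ?abstract . FILTER (lang(?abstract) = "en") }
} LIMIT 1
"""


def find_place_names(text, nlp=None):
    """Step 1: return the sorted, unique place-like entity strings in text."""
    if nlp is None:
        import spacy  # imported lazily so the other helpers work without it
        nlp = spacy.load("en_core_web_sm")
    doc = nlp(text)
    return sorted({ent.text for ent in doc.ents if ent.label_ in PLACE_LABELS})


def build_query(name):
    """Build a SPARQL query matching an exact English label."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE % escaped


def parse_place(payload):
    """Step 3: pull URI, coordinates, and description out of a
    SPARQL JSON result; return None when nothing matched."""
    bindings = payload.get("results", {}).get("bindings", [])
    if not bindings:
        return None
    row = bindings[0]
    return {
        "uri": row["place"]["value"],
        "lat": float(row["lat"]["value"]),
        "long": float(row["long"]["value"]),
        "description": row.get("abstract", {}).get("value"),
    }


def lookup_place(name, timeout=10):
    """Step 2: query DBpedia for one place name (needs network access)."""
    import requests  # imported lazily, as with spaCy above
    response = requests.get(
        DBPEDIA_SPARQL,
        params={
            "query": build_query(name),
            "format": "application/sparql-results+json",
        },
        timeout=timeout,
    )
    response.raise_for_status()
    return parse_place(response.json())


def categorize(record):
    """Step 4 (placeholder heuristic): a name that resolves to
    coordinates is treated as a real place."""
    return "real" if record is not None else "unresolved"
```

A run over one episode might then look like `{name: categorize(lookup_place(name)) for name in find_place_names(episode_text)}`, where `episode_text` is the plain text extracted from the TEI. Names that come back `"unresolved"` would still need manual review, since they may be fictional, misspelled, or simply labeled differently in DBpedia.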

JonathanReeve commented 5 years ago

@MatthewKumar, want to give this a shot?

JonathanReeve commented 5 years ago

This is very closely related to #27, and is partially complete in #41.

JonathanReeve commented 5 years ago

See also https://github.com/open-editions/corpus-joyce-portrait-TEI/issues/90. It's best to keep place-name metadata in a separate file, say `places.xml`, and to have each place name in the text reference its entry there by `xml:id`. See the TEI documentation on place names.
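
One way the separate `places.xml` file might be generated is sketched below with Python's standard `xml.etree.ElementTree`. It assumes the standard TEI elements `<listPlace>`, `<place>`, `<placeName>`, `<location>`, and `<geo>` (latitude and longitude separated by a space); the function name `build_list_place`, the dictionary keys, and the ids such as `dublin` are hypothetical, and the coordinates are only illustrative. A mention in the text could then point at an entry with something like `<placeName ref="places.xml#dublin">Dublin</placeName>`.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize TEI elements without an ns0: prefix.
ET.register_namespace("", TEI_NS)


def build_list_place(places):
    """Build a TEI <listPlace> element from a list of dicts.

    Each dict needs "id" and "name"; "lat" and "long" are optional,
    so fictional or unresolved places can be listed without coordinates.
    """
    list_place = ET.Element(f"{{{TEI_NS}}}listPlace")
    for entry in places:
        place = ET.SubElement(
            list_place,
            f"{{{TEI_NS}}}place",
            {f"{{{XML_NS}}}id": entry["id"]},
        )
        ET.SubElement(place, f"{{{TEI_NS}}}placeName").text = entry["name"]
        if "lat" in entry and "long" in entry:
            location = ET.SubElement(place, f"{{{TEI_NS}}}location")
            geo = ET.SubElement(location, f"{{{TEI_NS}}}geo")
            geo.text = f'{entry["lat"]} {entry["long"]}'
    return list_place


if __name__ == "__main__":
    root = build_list_place([
        {"id": "dublin", "name": "Dublin", "lat": 53.35, "long": -6.26},
        {"id": "eccles-street", "name": "Eccles Street"},
    ])
    print(ET.tostring(root, encoding="unicode"))
```

Keeping the metadata in its own file means the lookup results can be regenerated without touching the encoded text, which only ever carries the reference.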

JonathanReeve commented 5 years ago

We should also integrate @muziejus's Wandering Rocks data: https://github.com/muziejus/wandering-rocks/blob/master/data/instances.csv

workshub[bot] commented 2 years ago

A user started working on this issue via WorksHub.

workshub[bot] commented 2 years ago

@Freitas-Mp started working on this issue via WorksHub.

JonathanReeve commented 2 years ago

Hi @Freitas-Mp! Could you tell me what you had in mind for this issue? I may be able to help you think through it.

workshub[bot] commented 2 years ago

@baoduong started working on this issue via WorksHub.

workshub[bot] commented 2 years ago

@bragaji started working on this issue via WorksHub.