Open peterbussch opened 3 years ago
I think it is really awesome that you guys are using find and replace to speed up your markup. Our group was trying to find something similar to make character tagging a little easier in our tales, but alas, we couldn't find anything consistent that marked everything, since characters are referred to by different names. We also began working on our website; it's mostly planning so far, but we're working on it nonetheless. How is your group planning on structuring your project site? Is there a site we've looked at in class that you're planning on using as an example? I highly recommend doing that: going and looking at older projects made our organization questions a lot easier to answer.
@jeepy33 Autotagging text that refers to persons who can be specified in different ways is challenging for exactly the reason you mention: different words may refer to the same person. Peter's strategy is very capable: find a regex that matches the ways we can refer to a person and use that to tag the names. The Leningrad situation (it contains the personal name "Lenin" as a substring) might be managed by tagging Leningrad first, e.g., <place what="city" name="Leningrad">Leningrad</place>, and then restricting the scope of subsequent replacements by using the XPath widget to limit them to <p>, which will implicitly mean that a name won't be tagged if it's in a <place> inside a <p>, since then it would not be a child of <p>.
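For anyone who wants to try the same two-pass idea outside the editor, here is a minimal Python sketch. It is not the XPath widget itself: it approximates "only text that is a direct child of <p>" by splitting the paragraph on already-tagged <place> elements and running the person replacement only on the pieces in between. The function names and the Ленин[ЁёА-я]* pattern are assumptions made up for the illustration, and note that Python writes the whole match as \g<0> in the replacement rather than \0.

```python
import re

# Matches an already-tagged <place> element.
# Assumption: <place> elements are not nested inside each other.
PLACE = re.compile(r'(<place\b[^>]*>.*?</place>)', re.DOTALL)

# Hypothetical patterns for this sketch: the stem plus any Cyrillic ending.
LENINGRAD = re.compile(r'Ленинград[ЁёА-я]*')
LENIN = re.compile(r'Ленин[ЁёА-я]*')


def tag_leningrad(text: str) -> str:
    """Pass 1: tag the city first so it can be protected later."""
    return LENINGRAD.sub(
        r'<place what="city" name="Leningrad">\g<0></place>', text)


def tag_lenin_outside_places(text: str) -> str:
    """Pass 2: tag the person only in text that is not inside <place>."""
    parts = PLACE.split(text)
    # Because PLACE has one capturing group, the odd-indexed parts are the
    # <place> elements themselves; the even-indexed parts are the text
    # between them, which is the only text we touch.
    for i in range(0, len(parts), 2):
        parts[i] = LENIN.sub(r'<person who="Lenin">\g<0></person>', parts[i])
    return ''.join(parts)


if __name__ == '__main__':
    sample = '<p>Ленин жил в Ленинграде.</p>'
    print(tag_lenin_outside_places(tag_leningrad(sample)))
```

The order of the passes is what matters here: if the person pass ran first, the Ленин inside Ленинграде would already be wrapped before the city could be protected.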
find: молот[ЁёА-я]* and replace with: <person who="Molotov">\0</person>
И.стал[ЁёА-я]*
\sстали[ЁёА-я]* (to find instances of Сталин when it's not written И.Сталин)
бухар[ЁёА-я]*
троц[ЁёА-я]*
Дзержинск[ЁёА-я]*
зинов[ЁёА-я]*
рык[ЁёА-я]*
камен[ЁёА-я]*
Томск[ЁёА-я]*
Рудзу[ЁёА-я]*
Лозовск[ЁёА-я]*
Калини[ЁёА-я]*
Ленин.{1,4}\s was a little different, because we have to account for Ленинград, which will be marked up separately with other place names.
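The find-and-replace list above could also be run as a small script. The following is only a sketch under stated assumptions: the Molotov mapping is from the list, but the other who values are hypothetical placeholders, and case-insensitive matching is assumed because the stems are written in lower case while the names in running text are capitalized. The second helper simply reports what the Ленин.{1,4}\s pattern matches as written, which may be worth checking against real text, since "град" is itself four characters long.

```python
import re

# (stem pattern, value for the who attribute)
# Only "Molotov" comes from the thread; the other ids are hypothetical.
PATTERNS = [
    (r'молот[ЁёА-я]*', 'Molotov'),
    (r'бухар[ЁёА-я]*', 'Bukharin'),  # hypothetical id
    (r'троц[ЁёА-я]*', 'Trotsky'),    # hypothetical id
]


def tag_persons(text: str) -> str:
    """Wrap every match of each stem pattern in a <person> element."""
    for pattern, who in PATTERNS:
        # re.IGNORECASE is an assumption (see the note above).
        # \g<0> is Python's spelling of the editor's \0 (the whole match).
        text = re.sub(pattern,
                      rf'<person who="{who}">\g<0></person>',
                      text,
                      flags=re.IGNORECASE)
    return text


# The Lenin pattern exactly as written in the list.
LENIN_AS_WRITTEN = re.compile(r'Ленин.{1,4}\s')


def lenin_hits(text: str) -> list[str]:
    """Return every substring the pattern matches, for manual inspection."""
    return [m.group(0) for m in LENIN_AS_WRITTEN.finditer(text)]


if __name__ == '__main__':
    print(tag_persons('Молотов и Бухарин'))
    print(lenin_hits('Ленину писали из Ленинграда в Ленинград летом'))
```

Running the second helper on a few sample sentences is a quick way to see which forms of the city name still slip through before the place names are tagged, and whether the bare nominative Ленин is being caught. Short stems such as рык and камен can also match ordinary words, so a similar spot check on those may be useful.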