Closed — balmas closed this issue 8 years ago
We need to find a way to meet these requirements that allows us to: 1) take advantage of existing tools, both inside Perseids and out, for the student workflow, with minimal to no additional programming; 2) preserve the data the students create in a way that is not specific to any tool; and 3) aggregate the data and produce a nice publication with the Smith dictionary as its centerpiece.
I think we might be able to do something here by combining the Hypothes.is annotation bookmarklet tool and its API features with the Perseids google spreadsheet ingest. This would require a transformation of the Hypothes.is API output, but the cite annotation module in Perseids is already set up to be easily extended with other models.
If we define rules for the students along the lines of what we did for the timelines, I think we could end up with data that is reusable in the way we want. If the Pundit tool were ready, it might be a better alternative to Hypothes.is because I think it's designed to support semantic annotations more natively, but it's not out yet and we need to work with what is ready. This is a prototype workflow anyway, and we could look at applying it more broadly to other tools later.
So, what I'm envisioning is something like this:
1) Student installs the Hypothes.is bookmarklet in their browser.
2) Student navigates to the section they want to annotate in Smith's in the existing Perseus 4 environment (e.g. http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1999.04.0104:alphabetic+letter=D:entry+group=11:entry=diomedes-bio-1).
3) Student toggles the Hypothes.is bookmarklet on.
4) Student highlights the word or words they want to annotate.
5) For a reference to a text, e.g. in Perseus, they navigate in another window to that text in Perseus, copy the stable URI, and paste it in as the annotation text in Hypothes.is.
6) For a relationship to the topic of the Smith article, we give them a controlled vocabulary to use.
7) For a link to an image, they supply the image URL, etc.
8) When the student is ready for their work to be reviewed, they go to Hypothes.is, make their annotations public, copy the URLs of their annotations into a spreadsheet, and then submit that spreadsheet to Perseids the same way they do for the timelines. [Steps 2-8 are shown in this screencast: http://youtu.be/FlgwFJN1Ilc]
9) The Perseids ingest process retrieves the annotation data from the Hypothes.is API (for sample output see https://hypothes.is/api/annotations/W-tSZzBLSmCDBBbwUkvmjg) and converts it to the OA format that we use, doing some data cleanup on the URLs in the process, applying the SNAP ontology, etc.
10) Each annotation we save contains a link to the original annotation, so that Marie-Claire can see the annotation in context (again, as we did for the timelines). (Replying inline in Hypothes.is is possible, but then all replies are public, so I don't know if we want to use this for review.) [Step 10 is roughly shown here: http://youtu.be/9BXVygbVnWU]
11) Marie-Claire accepts the annotations, and then we have a set of annotations targeting the Smith's dictionary.
12) We use GapVis, with the extensions that were done for Hellespont, to publish and visualize the students' work.
We will have programming work at steps 9 and 12.
Step 9 is, I think, only a day or two of work for me at most. I think the trickiest part will be the fact that the text selectors Hypothes.is gives us are based on the HTML display of Smith's provided by the P4 hopper, and we will have to think a bit about how we want to deal with that, particularly in the long run if we want to merge any of this back into the source TEI for the Smith's.
Step 12 will require transformation of our data to a format compatible with GapVis and likely some enhancements to the GapVis code. The Hellespont work is not yet merged into the mainline of GapVis development and probably won't be as they are currently coded. Initial discussions with the GapVis team were promising -- they may be able to commit some programming resources to this if we can define clear requirements for them.
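To make the step 9 transformation concrete, here is a minimal sketch of mapping a Hypothes.is API annotation to an OA-style structure. The Hypothes.is field names (`uri`, `text`, `tags`, `target`/`selector`) follow the sample API output linked above, but the OA shape, the `perseids:tags` key, and the sample values are illustrative assumptions, not the actual Perseids model:

```python
import json

def to_oa(hyp):
    """Map a Hypothes.is API annotation (a dict) to a minimal
    OA-style annotation. The output shape here is a simplified
    sketch, not the exact Perseids OA model."""
    target = {"@type": "oa:SpecificResource", "oa:hasSource": hyp["uri"]}
    # carry over the text selectors so we can later reconcile them
    # against the P4 HTML display (see the caveat in the comment above)
    selectors = [s for t in hyp.get("target", []) for s in t.get("selector", [])]
    if selectors:
        target["oa:hasSelector"] = selectors
    return {
        "@context": "http://www.w3.org/ns/oa-context-20130208.json",
        "@type": "oa:Annotation",
        "oa:hasTarget": target,
        "oa:hasBody": {"chars": hyp.get("text", "")},
        "perseids:tags": hyp.get("tags", []),  # would map to SNAP terms etc.
    }

# Mocked API response for illustration only -- see the
# hypothes.is/api/annotations/... link above for real output.
sample = {
    "uri": "http://www.perseus.tufts.edu/hopper/text?doc=...",
    "text": "http://data.perseus.org/citations/...",
    "tags": ["SonOf"],
    "target": [{"selector": [{"type": "TextQuoteSelector",
                              "exact": "Apollod. 1.8.5"}]}],
}
print(json.dumps(to_oa(sample), indent=2))
```

The URL cleanup and SNAP mapping mentioned in step 9 would happen where the tags and body are copied over.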
Obviously this is a compromise, and it would be nice to have a seamless workflow that allowed the users to select from an existing ontology, navigate the resources which serve as the contents of the annotation, etc. all in a single environment. But we can't do that right now, and there isn't really any reason we should reinvent what other people are doing.
We have begun gathering samples of the types of data we'd like to bring into GapVis for display at https://github.com/PerseusDL/GapVis-eids
As Hypothes.is does not have formal support for semantics, we need to provide the students with manual instructions on the tags and URIs to use. The page at https://github.com/PerseusDL/perseids_docs/wiki/Data-Guidelines-for-Hypothesis-Annotations will be used to gather this information.
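Since the semantics live only in manually entered tags, the ingest could flag tags that fall outside the controlled vocabulary. A minimal sketch, assuming the vocabulary is published as a plain text file with one keyword per line (the tag names and list contents below are illustrative only):

```python
def load_vocabulary(text):
    """Parse a controlled-vocabulary list into a set,
    assuming one keyword per line; skip blanks and comments."""
    return {line.strip() for line in text.splitlines()
            if line.strip() and not line.startswith("#")}

def invalid_tags(tags, vocab):
    """Return the student tags not found in the controlled vocabulary,
    so a reviewer can send them back for correction."""
    return [t for t in tags if t not in vocab]

# Illustrative contents only -- not the real vocabulary list.
vocab = load_vocabulary("SonOf\nEnemyOf\nWifeOf\n")
print(invalid_tags(["SonOf", "FriendOf"], vocab))  # ['FriendOf']
```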
A workflow question for @Marie-ClaireBeaulieu: an alternative to the google spreadsheet submission step would be to allow the students to submit their individual annotations directly to Perseids. This has both advantages and disadvantages. With the google spreadsheet workflow they have to make their annotations public, copy the annotation URLs to a google spreadsheet, make the spreadsheet sharable, and then copy and submit the spreadsheet URL. The advantage of the alternate approach of submitting the hypothes.is annotation URLs directly is that it would involve fewer steps for the students, i.e. they would just make their annotations public and then submit those URLs directly to Perseids. The disadvantage is that it means many more individual submissions for the reviewers to review, and the annotations themselves would not be grouped in any meaningful way in the review board. E.g. if a student project is to provide a meaningful set of annotations on an individual hero, we lose the concept of the set in the submission and would have to recreate it from the targets of the annotations.
Personally, I think there is value in supporting both, particularly in the long run, but I lean slightly in favor of sticking with the google spreadsheet approach for this semester.
Another request for @Marie-ClaireBeaulieu : we need a complete list of the types of annotations you want the students to make, and particularly for relationships between people, a list of relationships you would have them identify, so that we can come up with the controlled vocabulary for the tags.
Another question for @Marie-ClaireBeaulieu : is there a set of resources other than Perseus that you want the students to use for their annotations? E.g. Wikipedia, a particular site with artwork and images, etc. ?
Answers for @caesarfeta and @balmas:
I agree that we should support both the direct submission of annotations and the spreadsheet approach. However, I think the spreadsheet is the way to go for the class, the vortex, and Sunoikisis, as the point of those workflows is to keep the annotations grouped (thematically or otherwise), and the spreadsheet does that.
@balmas list of types: I will get you this next Thursday. Noted in my to-do list. Do you need this just for the class, or should I also be thinking about the vortex and Sunoikisis?
Perseus will be the primary source, but we can expect other material to come from the MFA, the Met Museum, theoi.com, and various other museum sites. It is difficult to list them in advance, since it depends on where each student takes their work (they will design their own research questions).
@Marie-ClaireBeaulieu I think we need the annotation types for your class as the first priority, but we will need this for the vortex and Sunoikisis too!
@caesarfeta I have added a sample of what a JOTH data file MIGHT look like to the GapVis-Eids repo at https://github.com/PerseusDL/GapVis-eids/tree/master/samples/data/perseids/joth. It's important to be aware that the exact details in terms of ontologies used etc. may still change, but whatever solution we develop for transforming this data for ingest into GapVis should ideally be flexible enough to deal with that. I am hoping that what's in the sample file, along with the timeline sample data, should be enough to get you started on experimenting with the implications of making this data available in GapVis.
@Marie-ClaireBeaulieu @balmas Here's a list of relationships I extracted from the SNAP ontology. Should help in building our restricted keyword vocabulary https://github.com/PerseusDL/GapVis-eids/blob/master/samples/data/min/src/rel.txt
@caesarfeta @balmas Looks good to me! Do you want me to prune it? It looks to me like all of them could be useful at some point, but if it's just for the class I can take some out
@balmas @caesarfeta : Here's the list with pruning. I would add some, if that's possible:
@balmas @caesarfeta
Here's the beginning of a list for places:
Trying to keep this brief, but there might be more
A source for Pleiades lat/long data: https://github.com/ryanfb/pleiades-geojson (which feeds https://ryanfb.github.io/pleiades-static-search/)
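For drawing lat/long from that data, here is a minimal sketch of pulling a point coordinate out of a GeoJSON feature collection. The parsing is standard GeoJSON (coordinates are stored longitude first); the sample feature below is illustrative, and the exact property names in the pleiades-geojson dump may differ:

```python
import json

# Illustrative GeoJSON only -- the "title" property name and the
# coordinate values are assumptions, not taken from the real dump.
athens_geojson = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"title": "Athenae"},
      "geometry": {"type": "Point", "coordinates": [23.72, 37.97]}
    }
  ]
}
"""

def point_latlong(feature_collection):
    """Return (lat, long) of the first Point feature.
    GeoJSON stores coordinates as [longitude, latitude]."""
    for feature in feature_collection["features"]:
        geom = feature.get("geometry") or {}
        if geom.get("type") == "Point":
            lon, lat = geom["coordinates"]
            return lat, lon
    return None

data = json.loads(athens_geojson)
print(point_latlong(data))  # (37.97, 23.72)
```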
Per discussion on 15-Jan, we will not annotate features of places (city, country, etc.) but will instead draw that from the Pleiades data.
Write up of requirements for the dissemination is started here: https://github.com/PerseusDL/perseids_docs/wiki/Dissemination-Stories
NB that all we're doing right now is the Journey of the Hero workflow. I've started listing some user stories for the longer term vision but these need to be further refined and broken down into detailed requirements before attempting to implement them. They are included for context only right now.
@Marie-ClaireBeaulieu @caesarfeta In looking at https://github.com/PerseusDL/GapVis-eids/blob/master/docs/joth_annotate.md I don't see anything about linking the text with image and artifacts. Are we dropping that?
@caesarfeta the template for the students to enter their hypothes.is annotation links for import into Perseids is here: https://docs.google.com/spreadsheet/ccc?key=0AsEF52NLjohvdGo4dDU3RnR6TmZzbTF1aFpWcFY3bmc&usp=sharing . Instructions for using it are here: https://sites.tufts.edu/perseids/instructions/uploading-googlespreadsheet-data-as-annotations/ . These should be added to the student instructions for JOTH. The students should be instructed to copy the abbreviation for their hero from the search box on the Perseus Hopper display (e.g. diomedes-bio-1) and use it as the contents of the first column of the spreadsheet. Note that the instructions for using this abbreviation for their hero also still need to be added to the instructions.
@Marie-ClaireBeaulieu @balmas
What do we do about people's names that are used as chronological milestones? Phrases like this, "in the time of Plutarch", are common. Should we have a "Milestone" relationship keyword?
See...
@Marie-ClaireBeaulieu @balmas
Should there be a "Self" keyword to annotate alternate spellings or nicknames? Or is the same id sufficient?
@caesarfeta my preference would be to keep things simple and leave both of these (name as time milestone and tagging of alternate spellings) out for now. @Marie-ClaireBeaulieu do you agree?
@balmas @caesarfeta as for linking the text with images and artifacts, I think we should keep it, but for the second assignment. They can always use the URNs in the Perseus image browser and stable URLs where they exist.
@caesarfeta I like the "milestone" idea. This will be the purpose of the Times and Places assignment, so yes. Students will be documenting the chronological and geographical spread of documents that talk about their hero or heroine
@balmas @caesarfeta for the "self" keyword, I agree with Bridget: I think it's somewhat confusing, so let's keep it simple
@caesarfeta @Marie-ClaireBeaulieu I am not sure about using "milestone" to tag a phrase as a date, though. Maybe we would do this by assigning two tags: "date" and "period". In the conversion, this would be represented as an oa:Tag, the same way we do actual dates, but with a type of "dcterms:PeriodOfTime" rather than "dc:date".
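To illustrate, the conversion step could type the tag body off the "date"/"period" pair like this. Only the dc:date vs. dcterms:PeriodOfTime distinction comes from the discussion above; the function name, tag strings, and JSON-ish body shape are illustrative assumptions:

```python
def tag_body(tags):
    """Build a typed oa:Tag body from a student's tag list.
    Tags are matched case-insensitively; this output shape is a
    sketch, not the actual Perseids serialization."""
    lowered = {t.lower() for t in tags}
    rdf_type = ["oa:Tag",
                "dcterms:PeriodOfTime" if "period" in lowered else "dc:date"]
    # the remaining tag carries the actual date string or period label
    label = next((t for t in tags if t.lower() not in ("date", "period")), None)
    return {"@type": rdf_type, "chars": label}

print(tag_body(["date", "-0200"]))
print(tag_body(["period", "in the time of Plutarch"]))
```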
Adam, when we add the instructions for tagging dates to the JOTH instructions, please refer to https://github.com/PerseusDL/perseids_docs/wiki/Data-Guidelines-for-Hypothesis-Annotations#dates. Maybe we could ask the students to make the contents of their annotation a date string that adheres to that syntax, and in the case of strings like "in the time of Plutarch" either ask them to supply a real date, or else to just use a "period" tag. I'm not sure what will work best here.
@caesarfeta please make the following changes to the documentation:
@caesarfeta @Marie-ClaireBeaulieu I have removed Self and Milestone from the documentation for now. For Self we agree we aren't doing this. For Milestone, this should not be tagged with the relationships but instead with dates, and we need to include instructions per the previous comment on this.
@balmas @Marie-ClaireBeaulieu Could I get a screenshot of a hypothes.is date annotation? It's still ambiguous and a screen-shot would clarify the formatting for me.
@caesarfeta @Marie-ClaireBeaulieu explained to me that what she wants is for the students to annotate the sources referenced in the Smith text with dates and places. So, e.g., in the attached screenshot, Apollod. 1.8.5 is highlighted and annotated with the date -200 BCE. It would then be highlighted again and annotated with the Pleiades place URI for Athens. This is going to be a little tricky for the students, particularly for the bibliographic references that are already linked to Perseus texts, because the hypothes.is widget seems a little finicky about popping up on links, since clicking activates the link. I was able to get it to work by selecting from the end of the linked text and dragging to the beginning.
Note that for the bibliographic references which aren't already linked, we would want the students to add a 3rd annotation, the link to the Perseus stable URI for the citation. See the attached.
@caesarfeta Clarifying a point about the back-end: the final annotations will NOT be served in real-time from Perseids SoSOL in this phase of the project. We will export the finalized annotations from the git repository, and this is what will be made available to the GapVis UI. We have a number of options for dealing with this. Note that I would like the UI to display the stable URI identifier for each annotation, in the same way we do for the student commentaries in Perseus (served via the Perseus-LD widget). The raw annotations WILL also be deployed at that address (i.e. http://data.perseus.org/collections/urn:cite:perseus:pldjann.xxxxxx). If it makes sense, we could then serve them directly from there to GapVis. Perhaps we could use this to make a start on a RESTful API response for CITE, and implement a simple listing of links to the available annotations at http://data.perseus.org/collections/urn:cite:perseus:pldjann. If this is not possible right now, then I believe we could probably also serve them locally from the filesystem on the server on which the GapVis UI is deployed as a short-term solution, but I would still want to reference the stable URI identifiers and link to them as described above. I can think of a variety of other options that we could discuss if neither of these works.
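The "simple listing of links" option could be as little as a script run over the exported annotation files at deploy time. A minimal sketch, assuming one file per annotation named by its object id within the collection (the naming convention and index shape are guesses, not the actual export layout):

```python
import json
import os
import tempfile

COLLECTION = "urn:cite:perseus:pldjann"
BASE = "http://data.perseus.org/collections/"

def build_index(annotation_dir):
    """List exported annotation files as stable-URI links,
    assuming a file named 1.json corresponds to ...pldjann.1."""
    entries = []
    for name in sorted(os.listdir(annotation_dir)):
        if name.endswith(".json"):
            entries.append(BASE + COLLECTION + "." + name[:-len(".json")])
    return {"collection": BASE + COLLECTION, "annotations": entries}

# demo against a throwaway directory standing in for the git export
with tempfile.TemporaryDirectory() as d:
    for obj_id in ("1", "2"):
        with open(os.path.join(d, obj_id + ".json"), "w") as f:
            json.dump({"id": obj_id}, f)
    print(json.dumps(build_index(d), indent=2))
```

The resulting JSON index could be published alongside the raw files at the collection URL, which would give GapVis a fixed place to discover the available annotations.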
Done ?
Yes, except for full documentation of the final solution. I would like to leave this open until then, but will reassign it to myself.
Original requirements from Marie-Claire:
- Use Smith as an anchor point to produce data on mythological/historical figures.
- That data would consist of: texts and artwork describing/talking about these figures; chronological and geographical info about these figures; relationships between the figures (when applicable).
- In concrete terms, what would the students be doing:
- As we discussed, we should take a copy of Smith and work in a separate place (we can always link back to Perseus anyway).
- Use the data already in Smith to collect texts (some of the texts cited in Smith are already linked to the primary texts in Perseus, others not, so we should have students work on regularizing that).
- Add texts that might be absent from Smith.
- Add links to artwork depicting these figures.
- Organize the texts and artwork in timelines/timemaps using the workflow we had with timemapper. We can then append the timelines directly to the entry in Smith as a visualization, but we can also think of other ways to expose that data and make it available for other users who may want to conduct other types of analysis (e.g. if someone wanted to mine the Smith data, although it will be a long time before there's nearly enough to do that).
- For the relationships, I would like to be able to define relationships among the figures present in Smith (such as "son of", "enemy of", "wife of", etc.) and back them up with direct links to the primary sources in Perseus or other primary evidence such as artwork. We can offer visualizations of that once we choose a toolset and method.
Just as in my regular myth class, the term paper will be centered around an object in the MFA depicting a figure present in Smith. In this case, I will include artwork from later periods (Medieval, Renaissance, etc.), so the datasets created by the students will cover a broader range than the myth ones. These experiments overlap completely with what's intended for the prosopography work in Visible Worlds, so if it all works well we should be all set for that. I told Monica (and I talked about it with Michele back in September), so it's all OK with them.
Hope this helps in giving you a sense of where I'm heading with this.