korpling / ANNIS

ANNIS is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation.
http://corpus-tools.org/annis/
Apache License 2.0

HTML visualization #80

Closed: amir-zeldes closed this issue 11 years ago

amir-zeldes commented 11 years ago

There should be a whole-document visualization (not just the hit) that takes one text (in the case of parallel corpora) and produces the running text of the document from some segmentation level (tokens or otherwise), configured via mappings. It should be possible to format the document based on a mapping between annotation levels and HTML elements or CSS styles. For example, a p (paragraph) element could be introduced whenever a 'p' annotation name or value is present. This requires a simple .ini file to configure the visualization. The file should have two columns: a condition that triggers the formatting, and the formatting to apply. The condition can have the form

annoname

or

annoname="value"

A match on the second form supersedes a match on the first, since it is more specific. The formatting column should have one of the forms:

elementname

or

elementname;style="attr1: val1; attr2: val2"

Some examples:

p                 p
pb                p
lang="foreign"    i
lang="foreign"    span;style="font-style: italic; background-color: red"
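A minimal Python sketch of how the two-column format might be read, assuming the columns are separated by whitespace and that, among equally specific rules, the first one in the file wins (the proposal leaves both points open). `Rule`, `parse_rule`, `select_rule` and `open_tag` are hypothetical names for illustration, not part of ANNIS. The bare `lang` rule in the usage lines is also invented, only to show that a `lang="foreign"` rule supersedes it.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    name: str             # annotation name in the condition column
    value: Optional[str]  # required value, or None for a bare 'annoname'
    element: str          # HTML element to produce
    style: Optional[str]  # contents of style="...", if given


# Assumed line layout: condition, whitespace, formatting.
LINE = re.compile(
    r'^(?P<name>\w+)(?:="(?P<value>[^"]*)")?'                     # annoname or annoname="value"
    r'\s+(?P<element>\w+)(?:;\s*style="(?P<style>[^"]*)")?\s*$'  # elementname[;style="..."]
)


def parse_rule(line: str) -> Rule:
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"cannot parse rule: {line!r}")
    return Rule(m["name"], m["value"], m["element"], m["style"])


def select_rule(rules: List[Rule], name: str, value: str) -> Optional[Rule]:
    """An annoname="value" rule supersedes a bare annoname rule;
    among equally specific rules the first one in the file wins (assumption)."""
    for wanted in (value, None):  # try the specific form first, then the bare form
        for rule in rules:
            if rule.name == name and rule.value == wanted:
                return rule
    return None


def open_tag(rule: Rule) -> str:
    if rule.style:
        return f'<{rule.element} style="{rule.style}">'
    return f"<{rule.element}>"


rules = [parse_rule(line) for line in [
    'p                 p',
    'pb                p',
    'lang              b',  # hypothetical bare rule, added only to show precedence
    'lang="foreign"    span;style="font-style: italic; background-color: red"',
]]
print(open_tag(select_rule(rules, "lang", "foreign")))  # <span style="font-style: italic; background-color: red">
print(open_tag(select_rule(rules, "lang", "native")))   # <b>
```

Looking up the specific form before the bare form is just one way to express "the more specific condition wins"; sorting the rules by specificity once at load time would work equally well.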
amir-zeldes commented 11 years ago

A more detailed suggestion:

The style sheet should have three columns: the triggering condition, the element and styling to produce, and the text to produce within that element.

The triggering condition can be the presence of an annotation name, value or name+value combination.

As the text within the element, it should be possible to choose the annotation name, value or namespace, or an arbitrary string specified in the style sheet.
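One way the three condition forms and the text column could be evaluated, as a rough Python sketch. It assumes an annotation is a (namespace, name, value) triple and that conditions and text specifications are written as in the examples below (`chapter`, `="Gott"`, `lb_rend="italics"`; `value`, `anno`, `"page "`). The proposal does not give a keyword for choosing the namespace, so `namespace` here is a hypothetical choice, as are the names `Annotation`, `matches` and `element_text` and the example annotation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    namespace: str
    name: str
    value: str


def matches(condition: str, anno: Annotation) -> bool:
    """Condition forms: 'name', '="value"' (any name) and 'name="value"'."""
    name, sep, quoted = condition.partition("=")
    if not sep:
        return anno.name == name  # bare annotation name
    if len(quoted) >= 2 and quoted[0] == quoted[-1] == '"':
        quoted = quoted[1:-1]     # drop the surrounding double quotes
    return (name == "" or anno.name == name) and anno.value == quoted


def element_text(spec: Optional[str], anno: Annotation) -> str:
    """Text column: empty, 'value', 'anno', a quoted literal, or 'namespace'
    (the last keyword is hypothetical; the proposal does not name it)."""
    if spec is None:
        return ""
    if spec == "value":
        return anno.value
    if spec == "anno":
        return anno.name
    if spec == "namespace":
        return anno.namespace
    if len(spec) >= 2 and spec[0] == spec[-1] == '"':
        return spec[1:-1]
    raise ValueError(f"unknown text specification: {spec!r}")


a = Annotation("tei", "lb_rend", "italics")
print(matches('lb_rend="italics"', a), matches('="italics"', a), matches("chapter", a))
# True True False
```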

Rules apply in the order in which they appear in the style sheet. The specified element is opened at the beginning of the annotation span (leftmost token) and closed at its end (rightmost token).

If several rules apply to the same annotation, or to multiple annotations covering the same token area, the first rule produces its opening tag first, then the second rule produces its opening tag, and so on. When multiple closing tags are generated at the same position, they appear in reverse order.
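The tag ordering described above can be sketched as a single pass over the tokens. In this Python sketch (all names hypothetical), each matched annotation becomes a `Span` with inclusive start and end token indices plus the index of the rule that produced it, and tokens are assumed to be joined with single spaces. Opening tags are emitted in rule order and closing tags in the reverse of the opening order, which for spans covering the same tokens is exactly the reverse rule order. Because it takes the rule order literally, spans that start together but end at different tokens, or that cross each other, can still yield improperly nested HTML; the proposal does not say how such cases should be resolved.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int    # index of the leftmost token covered by the annotation
    end: int      # index of the rightmost token covered (inclusive)
    rule: int     # position of the triggering rule in the style sheet
    element: str  # HTML element the rule produces


def render(tokens: List[str], spans: List[Span]) -> str:
    pieces = []
    for i, token in enumerate(tokens):
        # opening tags in style-sheet order
        opening = sorted((s for s in spans if s.start == i), key=lambda s: s.rule)
        # closing tags in reverse opening order: later start first, then higher rule index first
        closing = sorted((s for s in spans if s.end == i),
                         key=lambda s: (s.start, s.rule), reverse=True)
        pieces.append("".join(f"<{s.element}>" for s in opening)
                      + token
                      + "".join(f"</{s.element}>" for s in closing))
    return " ".join(pieces)


tokens = ["Im", "Anfang", "war"]
spans = [Span(0, 2, 0, "p"), Span(0, 2, 1, "i"), Span(1, 1, 2, "b")]
print(render(tokens, spans))  # <p><i>Im <b>Anfang</b> war</i></p>
```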

Some examples:

title              b                            value        #prints the title in bold
chapter            p                                         #surrounds each chapter with a p
chapter            i                            "Kapitel: "  #prints the text "Kapitel: " in italics at start of chapter
chapter            i                            value        #prints the value of the chapter annotation in italics
pb                 span; style="color: grey"    "page "      #prints a grey span with the text "page "
pb_n               span; style="color: grey"    value        #prints a grey span with the value of pb_n
pb                 table                                     #surrounds each pb annotation area with a table element
pb                 tr                                        #surrounds each pb annotation area with a tr element, within the table element declared above
column             td                                        #surrounds each column annotation area with a td element
column             span; style="colStyle"                    #applies 'colStyle' to anything inside a column annotation
lb                 p                                         #surrounds each lb annotation area with a p element
lb_rend="italics"  i                                         #applies italic rendering to anything within an annotation lb_rend="italics"
="Gott"            span; style="color: red"     value        #prints and highlights in red the value of any annotation with the value "Gott"
speaker            span; style="color: grey"    anno         #prints the annotation name whenever an annotation 'speaker' occurs
tok                span; style="tokStyle"                    #prints each token
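The three-column lines above mix wide column padding, single spaces inside the element column (`span; style=...`) and trailing `#` comments that may themselves contain quotes, so a parser needs some convention for where a column ends. This Python sketch assumes, since the proposal does not specify it, that columns are separated by two or more spaces or a tab, that whitespace inside double quotes is literal, and that a `#` outside double quotes starts a comment. `split_columns` and `parse_line` are hypothetical names.

```python
from typing import List, Optional, Tuple


def split_columns(line: str) -> List[str]:
    """Assumed conventions: columns are separated by two or more spaces or a
    tab, whitespace inside double quotes is literal, and '#' outside double
    quotes starts a comment running to the end of the line."""
    columns: List[str] = []
    current = ""
    gap = ""            # unquoted whitespace seen since the last character kept
    in_quotes = False
    for ch in line.strip():
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "#" and not in_quotes:
            break       # rest of the line is a comment
        elif ch in " \t" and not in_quotes:
            gap += ch
            continue
        if gap:         # a wide gap ends the column, a single space stays in it
            if len(gap) >= 2 or "\t" in gap:
                columns.append(current)
                current = ""
            else:
                current += gap
            gap = ""
        current += ch
    if current:
        columns.append(current)
    return columns


def parse_line(line: str) -> Optional[Tuple[str, str, Optional[str], Optional[str]]]:
    """Return (condition, element, style, text), or None for blank/comment lines."""
    cols = split_columns(line)
    if not cols:
        return None
    if len(cols) < 2:
        raise ValueError(f"missing element column: {line!r}")
    element, _, style = cols[1].partition(";")
    text = cols[2] if len(cols) > 2 else None
    return cols[0], element.strip(), style.strip() or None, text


print(parse_line('pb                 span; style="color: grey"    "page "      #prints a grey span'))
# ('pb', 'span', 'style="color: grey"', '"page "')
```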
amir-zeldes commented 11 years ago

Using metadata in the visualization: a further expansion requested by the Coptologists is the ability to integrate some of the metadata into the document view (selectively, and without clicking on the 'i' button). If we allow the HTML visualization to access metadata as well, we can answer this request at the same time. The inspiration for this visualization is a site like papyri.info, for example here:

http://papyri.info/ddbdp/p.eirene;1;3?rows=2&start=0&fl=id%2Ctitle&fq=series_led_path%3Ap.eirene%3B1%3B*%3B*&sort=series+asc%2Cvolume+asc%2Citem+asc&p=0&t=32

Notice the metadata at the top and the entire text of the document at the bottom.

amir-zeldes commented 11 years ago

Yet another suggestion: we should allow regex-based matching to trigger HTML generation, not just exact matches. For example, to display TEI rend attributes written in plain English correctly, we can make the font red if the rend description contains the word "red":

rend=/(^|.* )[Rr]ed( |$).*/    span; style="color: red"

This is triggered by an annotation like:

rend="large illuminated red capital"
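A regex condition could be told apart from an exact value by its slash delimiters. This Python sketch uses hypothetical helper names and `re.fullmatch`, i.e. it requires the pattern to match the whole annotation value. That is an assumption, suggested by the leading `(^|.* )` and trailing `.*` in the example, which pad the pattern out to the whole value (as Java's `String.matches` would require). With these semantics the example pattern fires on the word "red" only, so "large illuminated red capital" matches while "reddish brown" does not.

```python
import re
from typing import Optional, Tuple


def parse_condition(condition: str) -> Tuple[str, Optional[str], Optional[re.Pattern]]:
    """Split 'name', 'name="value"' or 'name=/regex/' into (name, value, regex)."""
    name, sep, rest = condition.partition("=")
    if not sep:
        return name, None, None
    if len(rest) >= 2 and rest[0] == rest[-1] == "/":
        return name, None, re.compile(rest[1:-1])
    return name, rest.strip('"'), None


def condition_matches(condition: str, anno_name: str, anno_value: str) -> bool:
    name, value, regex = parse_condition(condition)
    if name and name != anno_name:
        return False
    if regex is not None:
        # assumption: the pattern must match the whole value (cf. Java String.matches)
        return regex.fullmatch(anno_value) is not None
    if value is not None:
        return anno_value == value
    return True  # bare name condition


red = 'rend=/(^|.* )[Rr]ed( |$).*/'
print(condition_matches(red, "rend", "large illuminated red capital"))  # True
print(condition_matches(red, "rend", "reddish brown"))                  # False
```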

thomaskrause commented 11 years ago

An experimental version of the HTML visualization was added in revision a828238b8bec098d4b00be293364d7fc5a7031ee