medialab / reanalyse

Django platform to explore TEI verbatims, documents & speakers within structured qualitative studies

Parsing / indexing / display TEI, XSLT #52

Open pierrejdlf opened 11 years ago

pierrejdlf commented 11 years ago

Parsing

The current parsing loops over every XML tag of the TEI verbatim in order to store its content in the database.

It's really heavy to compute!

Idea: don't store parts of the text in the DB; use only the TEI XML file.
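A minimal sketch of what "use only the TEI XML file" could look like: utterances are streamed out of the file on demand instead of being copied into the database. It assumes the verbatims use the standard TEI P5 namespace and encode each speech turn as a `<u who="...">` element; `iter_utterances` is a hypothetical helper, not existing reanalyse code.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Standard TEI P5 namespace (assumption: the verbatims declare it).
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def iter_utterances(source) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, speaker, text) for each <u> element of a TEI file.

    `source` is a filename or a binary file object. Assumes speech turns
    are <u who="..."> elements; adapt the tag if the encoding differs.
    """
    position = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == TEI_NS + "u":
            speaker = elem.get("who", "")
            # itertext() also collects text around nested tags such as <pause/>;
            # split/join normalises the whitespace.
            text = " ".join("".join(elem.itertext()).split())
            yield position, speaker, text
            position += 1
            # Drop the children and text of this utterance to keep memory low.
            elem.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b'<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
        b'<u who="#A">Hello <pause/> there</u><u who="#B">Hi</u>'
        b"</body></text></TEI>"
    )
    for row in iter_utterances(sample):
        print(row)
```

Streaming with `iterparse` avoids building the whole tree, so only the requested file is touched per page view, with no parsing step at import time.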

Solr Indexing

Solr is indexing:

Idea: index the content of TEI documents directly, using Solr's XSLT support. PROBLEM! We would have to do it all manually, without any help from Haystack (updating the index, ...).
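A sketch of the "manual" side of this idea, assuming a Solr version whose `/update` handler accepts a `tr` parameter naming an XSLT stylesheet stored in the core's `conf/xslt/` directory (the XSLT update request handler; check the reference guide for the Solr version in use). The core URL, the stylesheet name `tei2solr.xsl` and both helper functions are hypothetical.

```python
import urllib.parse
import urllib.request


def build_xslt_update_request(solr_core_url: str, tei_bytes: bytes,
                              stylesheet: str = "tei2solr.xsl",
                              commit: bool = True) -> urllib.request.Request:
    """Build a POST asking Solr to transform a raw TEI document with an
    XSLT stylesheet (hypothetical name) before indexing it."""
    params = {"tr": stylesheet}
    if commit:
        params["commit"] = "true"
    url = solr_core_url.rstrip("/") + "/update?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        data=tei_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def index_tei_file(solr_core_url: str, path: str) -> int:
    """Send one TEI file to Solr and return the HTTP status code.

    Since Haystack is bypassed, something (a model save hook, a cron job, ...)
    must call this whenever a TEI file is added or changed.
    """
    with open(path, "rb") as f:
        request = build_xslt_update_request(solr_core_url, f.read())
    with urllib.request.urlopen(request) as response:
        return response.status
```

This is exactly the work Haystack normally hides: deciding when to reindex and deleting stale documents would also have to be written by hand.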

Solr Search

Simple Display

We want to display part of a document, from 'start' to 'end' (using StreamTimelineViz).

Fetch the array of successive styled sentences:

for s in texte.sentence_set.filter(i__range=[start,end]).order_by('i','speaker','n'):

then loop over them in the template.
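To make the semantics of the query above explicit, here is a plain-Python mirror of it, outside Django. `Sentence` is a hypothetical stand-in for the real model (`i` = position in the text, `n` = order within that position), and `speaker` is simplified to a string; on the real model it is probably a foreign key, which Django orders by the related model's default ordering or primary key.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Sentence:
    # Hypothetical stand-in for the Django Sentence model.
    i: int        # position in the text
    speaker: str  # simplified: the real field is likely a ForeignKey
    n: int        # order within position i
    text: str


def select_range(sentences: Iterable[Sentence], start: int, end: int) -> List[Sentence]:
    """Mimic sentence_set.filter(i__range=[start, end]).order_by('i', 'speaker', 'n')."""
    # Django's __range lookup is inclusive on both ends (SQL BETWEEN).
    kept = [s for s in sentences if start <= s.i <= end]
    return sorted(kept, key=lambda s: (s.i, s.speaker, s.n))
```

Note that `end` is included, so a viewer passing a half-open interval would fetch one position too many.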

Additional styling is done in the Django template.

Idea: use an XSLT stylesheet taking start/end parameters to produce HTML directly from the TEI XML file.
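The transformation such a stylesheet would perform can be sketched with the standard library; this swaps XSLT for `xml.etree.ElementTree` purely to show the logic. It assumes `<u who="...">` utterances in the TEI P5 namespace and treats start/end as inclusive, 0-based utterance positions (the current code uses the sentence index `i` instead). `tei_slice_to_html` and the CSS class names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace (assumption: the verbatims declare it).
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def tei_slice_to_html(tei_xml: str, start: int, end: int) -> str:
    """Render utterances start..end (inclusive, 0-based) as HTML paragraphs."""
    root = ET.fromstring(tei_xml)
    parts = []
    for position, u in enumerate(root.iter(TEI_NS + "u")):
        if position < start:
            continue
        if position > end:
            break
        # Escape everything taken from the XML before putting it in HTML.
        speaker = html.escape(u.get("who", "").lstrip("#"))
        text = html.escape(" ".join("".join(u.itertext()).split()))
        parts.append(f'<p class="u"><span class="speaker">{speaker}</span> {text}</p>')
    return "\n".join(parts)
```

With a real stylesheet, lxml's `etree.XSLT` objects accept `xsl:param` values as keyword arguments when called; the values are XPath expressions, so numbers can be passed as strings like `"5"`, while string literals should be wrapped with `etree.XSLT.strparam`.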