escowles opened this issue 7 years ago
Some options:
Option #4 above seems like a fair amount of work, but it's probably the best choice for a sustainable application, given how few of us are comfortable working with XSL. Still, any of the above seem like reasonable improvements on our current approach, which can result in broken HTML pages.
XSLT isn't terribly difficult to work with, actually. I'd be very leery of re-inventing an XML parser simply to avoid it.
If you could describe the static-page-generation problems a bit more (or point me to descriptions of them), I might be able to help.
@cwulfman I'm not saying XSLT is hard, or otherwise a bad choice here. Quite the contrary: it's an obvious choice for XML-to-HTML conversions like this. But since very few of our devs know XSL and basically all of them know Ruby, I think it's worth considering whether to enhance the existing Solr indexing code so that pages are generated from Solr instead of from static HTML files. After all, we are already doing the redundant work of parsing the TEI and indexing it in Solr, so having that code replace the XSLT completely would add very little overhead.
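A minimal sketch of the Solr-backed alternative, assuming each entry's indexed fields come back as a plain hash: the page is rendered per request with ERB instead of being read from a pre-generated file, so there is no static file on disk that can go missing. The `EntryPage` class and the field names (`id`, `title_display`, `text_display`) are hypothetical, not the app's real schema. In the Rails app this would more naturally live in a view partial.

```ruby
require "erb"

# Hypothetical page object: renders one Catalogo entry from a Solr document
# hash at request time instead of reading a pre-generated static HTML file.
class EntryPage
  include ERB::Util # provides h(), an alias for html_escape

  # Field names below are illustrative assumptions, not the real schema.
  TEMPLATE = ERB.new(<<~HTML)
    <section class="entry" id="entry-<%= h(@doc["id"]) %>">
      <h2><%= h(@doc["title_display"]) %></h2>
      <p><%= h(@doc["text_display"]) %></p>
    </section>
  HTML

  def initialize(doc)
    @doc = doc
  end

  # Every value is HTML-escaped, so stray markup in the data can't break the page.
  def to_html
    TEMPLATE.result(binding)
  end
end

doc = {
  "id" => "c1",
  "title_display" => "Trattato della pittura",
  "text_display" => "Roma, 1817 & <note>"
}
puts EntryPage.new(doc).to_html
```

Because escaping happens at render time, a malformed value produces an odd-looking paragraph rather than a broken page.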
Some of the issues with the static pages are discussed in #337. Mostly, generating them makes deploys take much longer, and the application often gets into a state where the static pages are on disk but can't be found for some reason, so virtually every page errors.
@escowles I jumped into the conversation without reading up on the background first; my apologies. Way back when, I think the idea was to create an interactive edition of the Catalogo with the TEI file(s) as a source (common practice) and then integrate the item browser/viewer with it.
XPath and XSLT/XQuery are also good at dealing with complicated data structures, of which the Catalogo is certainly an example. If you were to abandon those tools and then use Ruby to extract data fields from marked-up text, I'd worry that you'd either end up in regex hell (been there, done that) or with a one-time conversion and data fork. At that point, you've probably left the encoded Catalogo behind. Maybe that's a good thing for this application?
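To make the regex concern concrete, here is a small sketch on a hypothetical, simplified TEI fragment where entry `div`s nest inside a section `div`. A non-greedy regex stops at the first closing `</div>` and silently misses the second entry, while an XPath query follows the actual tree. It uses Ruby's stdlib REXML only to stay dependency-free; Nokogiri's `xpath` accepts the same query and namespace mapping.

```ruby
require "rexml/document"

TEI_NS = { "tei" => "http://www.tei-c.org/ns/1.0" }.freeze

# Hypothetical, simplified fragment: entries nested inside a section.
SAMPLE_TEI = <<~XML
  <TEI xmlns="http://www.tei-c.org/ns/1.0">
    <text><body>
      <div type="section" n="1">
        <div type="entry" n="1"><p>Prima voce</p></div>
        <div type="entry" n="2"><p>Seconda voce</p></div>
      </div>
    </body></text>
  </TEI>
XML

# Regex approach: the non-greedy (.*?) ends at the FIRST </div>, which closes
# entry 1 rather than the section, so entry 2 is silently dropped.
def entries_by_regex(xml, section)
  body = xml[%r{<div type="section" n="#{section}">(.*?)</div>}m, 1].to_s
  body.scan(/<div type="entry" n="(\d+)"/).flatten
end

# XPath approach: the query runs against the parsed tree, so nesting is handled.
def entries_by_xpath(xml, section)
  doc = REXML::Document.new(xml)
  path = "//tei:div[@type='section'][@n='#{section}']//tei:div[@type='entry']"
  REXML::XPath.match(doc, path, TEI_NS).map { |div| div.attributes["n"] }
end

p entries_by_regex(SAMPLE_TEI, 1) # prints ["1"]
p entries_by_xpath(SAMPLE_TEI, 1) # prints ["1", "2"]
```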
@cwulfman I think we're talking past each other here a bit. I'm not suggesting anything as radical as dropping XPath and XML parsers. I definitely think it's fine to keep using Nokogiri and XPath in https://github.com/pulibrary/cicognara-rails/blob/master/lib/cicognara/tei_indexer.rb, and if we move the XSL functionality over to that class, using more XPath there would be reasonable.
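If the XSL work moved into the indexer class, it might look roughly like this sketch: one XPath pass over the TEI produces a Solr-ready hash per entry, including the HTML fragment the XSLT used to write to a static file. `EntryIndexer`, its field names, and storing HTML in a Solr field are all hypothetical. This is not the actual `tei_indexer.rb` API, and it uses stdlib REXML in place of Nokogiri so it runs without gems.

```ruby
require "rexml/document"
require "erb"

# Hypothetical indexer: extracts fields with XPath and renders each entry's
# HTML in Ruby, taking over the job the XSLT did for static pages.
class EntryIndexer
  NS = { "tei" => "http://www.tei-c.org/ns/1.0" }.freeze

  def initialize(xml)
    @doc = REXML::Document.new(xml)
  end

  # One hash per <div type="entry">; the field names are assumptions.
  def solr_docs
    REXML::XPath.match(@doc, "//tei:div[@type='entry']", NS).map do |div|
      n = div.attributes["n"]
      paras = REXML::XPath.match(div, "tei:p", NS).map { |para| plain_text(para).strip }
      { "id" => "entry-#{n}", "entry_n" => n, "text" => paras, "html" => render_html(n, paras) }
    end
  end

  private

  # Concatenates all descendant text (unescaped), so <hi> and similar
  # inline markup inside a paragraph don't truncate the content.
  def plain_text(node)
    node.children.map do |child|
      case child
      when REXML::Text then child.value
      when REXML::Element then plain_text(child)
      else ""
      end
    end.join
  end

  # Escapes every value before building markup, so the output stays well-formed.
  def render_html(n, paras)
    body = paras.map { |t| "<p>#{ERB::Util.html_escape(t)}</p>" }.join
    %(<div class="entry" id="entry-#{ERB::Util.html_escape(n)}">#{body}</div>)
  end
end

tei = <<~XML
  <TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
    <div type="entry" n="7"><p>Roma &amp; <hi>1817</hi></p></div>
  </body></text></TEI>
XML

entry = EntryIndexer.new(tei).solr_docs.first
puts entry["html"] # prints <div class="entry" id="entry-7"><p>Roma &amp; 1817</p></div>
```

Storing pre-rendered HTML is only one option; the app could equally keep just the plain fields in Solr and render the markup at request time.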
That sounds reasonable to me!
Generating static pages from the TEI is time consuming and brittle — consider modeling them in AR or otherwise avoid static pages