pulibrary / cicognara-rails

Rails app for managing metadata for the Digital Cicognara Library https://cicognara.org

Reconsider static page generation #236

Open escowles opened 7 years ago

escowles commented 7 years ago

Generating static pages from the TEI is time-consuming and brittle. Consider modeling them in ActiveRecord (AR), or otherwise avoiding static pages.

escowles commented 5 years ago

Some options:

  1. Fix deployment issues with current system
  2. Check in TEI, MARCXML, and generated static HTML pages, so updating data is more tightly integrated into the code and taken out of deployment/indexing
  3. Keep XSLT, but store output HTML in a field in the database
  4. Replace XSLT with Ruby code to parse the values into fields, and then generate the displays with normal views

escowles commented 5 years ago

Option 4 above seems like a fair amount of work, but it is probably the best choice for a sustainable application, given how few of us are comfortable working with XSL. Any of the above would be a reasonable improvement on our current approach, which can result in broken HTML pages.
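To make the "normal views" half of option 4 concrete, here is a rough sketch of rendering one entry from already-parsed fields. It uses plain ERB from the standard library rather than a Rails view so it runs on its own. `EntryView` and the field names (`:number`, `:title`, `:author`) are hypothetical, not anything that exists in the app today.

```ruby
require "erb"

# Hypothetical view object: renders one catalogue entry from plain fields.
# In the Rails app this would more likely be an ordinary view or partial.
class EntryView
  include ERB::Util # provides h() for HTML escaping

  # trim_mode "-" lets <%- ... -%> control lines vanish from the output.
  TEMPLATE = ERB.new(<<~HTML, trim_mode: "-")
    <article id="entry-<%= h(@entry[:number]) %>">
      <h2><%= h(@entry[:title]) %></h2>
      <%- if @entry[:author] -%>
      <p class="author"><%= h(@entry[:author]) %></p>
      <%- end -%>
    </article>
  HTML

  # entry is a hash of fields (assumed shape: :number, :title, :author).
  def initialize(entry)
    @entry = entry
  end

  # Returns the HTML for this entry as a string.
  def render
    TEMPLATE.result(binding)
  end
end
```

Because every value goes through `h`, a stray `&` or `<` in the source data gets escaped instead of breaking the page, which is the failure mode we see with the current static HTML.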

cwulfman commented 5 years ago

XSLT isn't terribly difficult to work with, actually. I'd be very leery of re-inventing an XML parser simply to avoid it.

If you could describe the static-page-generation problems a bit more (or point me to descriptions of them), I might be able to help.

escowles commented 5 years ago

@cwulfman I'm not saying XSLT is hard, or otherwise a bad choice here. Quite the contrary, it's an obvious choice for XML-to-HTML conversions like this. But since very few of our devs know XSL and basically all of them know Ruby, I think it's worth considering whether it would be better to enhance the existing Solr indexing code to be able to generate the pages from Solr instead of using the static HTML pages. After all, we are already doing the redundant work of parsing the TEI and indexing it in Solr, so it would be very little extra overhead to have that completely replace the XSLT.

Some of the issues with the static pages are discussed in #337. Mostly, they make deploys take much longer, and the application often gets into a state where the static pages are on disk but can't be found for some reason, so virtually every page errors.

cwulfman commented 5 years ago

@escowles I jumped into the conversation without discovering the background first; my apologies. Way back when, I think the idea was to create an interactive edition of the Catalogo with the TEI file(s) as a source (common practice) and then integrate the item browser/viewer with it.

XPath and XSLT/XQuery are also good at dealing with complicated data structures, of which the Catalogo is certainly an example. If you were to abandon those tools and then use Ruby to extract data fields from marked-up text, I'd worry that you'd either end up in regex hell (been there, done that) or with a one-time conversion and data fork. At that point, you've probably left the encoded Catalogo behind. Maybe that's a good thing for this application?

escowles commented 5 years ago

@cwulfman I think we're talking past each other here a bit. I'm not suggesting anything so radical as not using XPath and XML parsers. I definitely think it's fine to keep using Nokogiri and XPath in https://github.com/pulibrary/cicognara-rails/blob/master/lib/cicognara/tei_indexer.rb, and if we move the XSL functionality over to that class, using more XPath would be reasonable.
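For illustration, a minimal sketch of the kind of XPath-based field extraction I mean. To keep it runnable without extra gems it uses REXML from the standard distribution instead of Nokogiri, but the XPath expressions would carry over. The TEI fragment, `extract_items`, and `text_at` are all made up for this example; the real Catalogo markup is more complicated, and this is not what `tei_indexer.rb` currently does.

```ruby
require "rexml/document"

# Prefix-to-URI mapping so XPath can address TEI's default namespace.
TEI_NS = { "tei" => "http://www.tei-c.org/ns/1.0" }

# Hypothetical, heavily simplified TEI fragment (assumption, not real data).
SAMPLE_TEI = <<~XML
  <TEI xmlns="http://www.tei-c.org/ns/1.0">
    <text><body><list>
      <item n="1"><bibl><author>Alberti, Leon Battista</author><title>De re aedificatoria</title></bibl></item>
      <item n="2"><bibl><title>Anonymous treatise</title></bibl></item>
    </list></body></text>
  </TEI>
XML

# Returns the stripped text of the first node matching path, or nil.
def text_at(node, path)
  found = REXML::XPath.first(node, path, TEI_NS)
  found&.text&.strip
end

# Returns one hash of display fields per catalogue item.
def extract_items(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//tei:item", TEI_NS).map do |item|
    {
      number: item.attributes["n"],
      author: text_at(item, "tei:bibl/tei:author"),
      title: text_at(item, "tei:bibl/tei:title")
    }
  end
end
```

Calling `extract_items(SAMPLE_TEI)` gives one hash per item, with `nil` for fields that are absent. No regexes are involved: the structure is addressed with XPath the same way the XSL does it, just from Ruby.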

cwulfman commented 5 years ago

That sounds reasonable to me!