zme1 / toscana

A repository to house research and web development for the Lega Toscana project, led by Professor Lina Insana (Spring 2018) and Professor Lorraine Denman (Fall 2018), with consultation from members of the DH Advanced Praxis group at the University of Pittsburgh at Greensburg.
http://toscana.newtfire.org

Processing collections in XSLT #13

zme1 closed this issue 6 years ago

zme1 commented 6 years ago

I just pushed my first XSLT stylesheet for drafting the tab-separated files for network processing. I spent about an hour and a half trying to process the collection of TEI files for the network, but I could not get a transformation to run successfully. For the time being, I decided to develop the transformation itself against a single input file (just so I could format my output file), but I still can't access the full collection all at once.

I tried matching an <xsl:template> on my $minutesColl variable, matching $minutesColl//teiCorpus, matching the document node and applying templates to my variable within it, and several other approaches... I suppose I'm uncertain how to call a collection variable specifically when outputting a .tsv file, since that presumably differs from outputting an HTML file. Do I match the variable and its descendant teiCorpus (or a similar descendant node) where I would normally match the document node in my XSLT file? Or do I match the document node as I normally would, then create separate templates that refer to my global collection variable? Any help is greatly appreciated.

ebeshero commented 6 years ago

@zme1 Well, you can do this with either XSLT or XQuery, but I think I'd recommend XQuery for more concise code. This past week I've been getting people started on their first TSV output files from XQuery, and since this is a lot of pull processing and the code blocks are a little simpler than template matches, I guess that's the technology I'd recommend...

I can show you, though, how to output text instead of HTML from XSLT, and you can decide for yourself which technology to use. You can write XQuery in oXygen, by the way, if loading stuff into eXist-db isn't appealing.
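In XSLT, switching from HTML to text output comes down to declaring `<xsl:output method="text"/>` and then writing the tabs and newlines yourself (for example as `&#x9;` and `&#10;`). For comparison, here is a minimal Python sketch, not the project's code, of the tab-separated text such a transformation should end up producing. The function name `rows_to_tsv`, the header row, and the sample `ref` values are hypothetical.

```python
import csv
import io


def rows_to_tsv(rows):
    """Serialize an iterable of row tuples as tab-separated plain text."""
    buffer = io.StringIO()
    # Tab delimiter for TSV; "\n" line endings instead of csv's default "\r\n".
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical header row and edge, for illustration only.
    print(rows_to_tsv([("proposer", "supporter"), ("#rossi", "#bianchi")]), end="")
```

Each tuple becomes one line of plain text, which is the node-edge-node shape you would then import into Cytoscape.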

zme1 commented 6 years ago

@ebeshero I haven't worked with XQuery since last spring, as a student. Do you think it'd be worth my time to re-learn it for this type of data export?

ebeshero commented 6 years ago

@zme1 Now that I look at the XSLT, it's not all that complicated--and I think you might just need some help with that collection variable...I'm just pulling it in to take a look!

In the end, the two technologies might be equally complicated if you need to deal with nested for-loops. I'm curious why you don't need to do this: is there one and only one supporter for every proposal? (I suppose this is the nature of the directed network... but if there's more than one, you'll want to loop through them and create a new line for each one.)
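The "new line for each supporter" idea can be sketched in Python with ElementTree. The encoding here is entirely hypothetical: it assumes each proposal is a TEI `<seg type="proposal">` whose proposer and supporters are direct `<persName>` children carrying `role` and `ref` attributes. The real project markup may differ. In XSLT the same shape would be an `xsl:for-each` over proposals with a second `xsl:for-each` over supporters nested inside it.

```python
import xml.etree.ElementTree as ET

# TEI namespace; the element and attribute choices below are hypothetical.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def proposal_edges(root):
    """Return one (proposer, supporter) pair for every supporter of every proposal."""
    edges = []
    for seg in root.iterfind(".//tei:seg[@type='proposal']", NS):
        proposers = seg.findall("tei:persName[@role='proposer']", NS)
        supporters = seg.findall("tei:persName[@role='supporter']", NS)
        # Outer loop over proposers, inner loop over supporters:
        # a proposal with two supporters yields two lines (edges).
        for proposer in proposers:
            for supporter in supporters:
                edges.append((proposer.get("ref"), supporter.get("ref")))
    return edges


# Hypothetical sample document: one proposal with two supporters.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <seg type="proposal">
    <persName role="proposer" ref="#rossi"/>
    <persName role="supporter" ref="#bianchi"/>
    <persName role="supporter" ref="#verdi"/>
  </seg>
</body></text></TEI>"""

if __name__ == "__main__":
    for proposer, supporter in proposal_edges(ET.fromstring(SAMPLE)):
        print(proposer, supporter, sep="\t")
```

Because the proposer loop is there too, the same code also covers the rarer case of more than one proposer without any special handling.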

zme1 commented 6 years ago

@ebeshero I was trying to use an xsl:for-each loop, but I had trouble working out how to nest it in my stylesheet, since it can't be a top-level child of xsl:stylesheet and I didn't know whether or how to match the document node in my template.

ebeshero commented 6 years ago

@zme1 Well! The good news is, I figured out the problem you were having with your collection() function! It's not anything wrong with your code at all; it's due to those pesky .DS_Store files our Macs like to generate. When XSLT finds one in the collection, it rightly says, "bleccch, that's not an XML document!" because the file can't be parsed as XML.

Here's how to solve that problem and get output! https://github.com/ebeshero/DHClass-Hub/wiki/Banishing-DS_Store,-Thumbs.db,-and-other-pesky-files-we-don't-need
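Besides banishing the stray files, another defensive option is to have the collection ask only for `.xml` files in the first place. If the collection is built with Saxon (as it is when transforming in oXygen), a collection URI with a `select` parameter, such as `collection('minutes/?select=*.xml')`, should have that effect. The folder name `minutes` is hypothetical. The same filtering idea as a short Python sketch, with `load_collection` as an illustrative name:

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_collection(directory):
    """Parse only the *.xml files in a directory, in name order.

    Files such as .DS_Store or Thumbs.db never match the *.xml pattern,
    so they are skipped instead of failing to parse as XML.
    """
    return [ET.parse(path).getroot() for path in sorted(Path(directory).glob("*.xml"))]


if __name__ == "__main__":
    import sys

    docs = load_collection(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(len(docs), "XML documents loaded")
```

Filtering by extension keeps the transformation working even if a new .DS_Store appears locally, since a .gitignore only stops such files from being committed; it does not delete them from your folder.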

ebeshero commented 6 years ago

@zme1 I don't think I ever sent you a proper .gitignore file, did I? (Sorry!) I'll push one to your repo. The .gitignore tells Git not to track .DS_Store files, so they won't get committed and pushed to GitHub...

ebeshero commented 6 years ago

Banished DS_Store and its friends with this commit: https://github.com/zme1/toscana/commit/f5e10537f3abf70ad900fab5e3beb9eeac47ada1

Now, on to more important things like the node - edge - node relationships!

ebeshero commented 6 years ago

So, @zme1, your match on the document node is necessary and works just fine. What you do is run the XSLT over a "dummy" XML file (any one will do), and on matching its document node, you reach over to the collection and do nothing at all with that "input" file. It's kind of a game you play with oXygen. You can do this differently at the command line without selecting a dummy input XML file at all, but oXygen is good to work with while you're figuring out the bugs and what you want to capture. I usually keep one file from the collection open while I'm writing the XSLT so I can see what I want to process.

So, the upshot is, your code is good, and you should now be getting output. Is it the output you want? My further question is: what relationship do you expect to see between proposer and supporter? Is it one-to-one, or one-to-many?

zme1 commented 6 years ago

@ebeshero Yes, the output looks perfect! (Sorry I took so long to respond; I started trying to put an index page together and ran into some odd frustrations with my style properties.) I'll go to Cytoscape and see how the output looks at first glance!

It looks like I also did not have my XPath expressions entirely correct, although they may have been closer at some point while I was debugging... So did those .DS_Store files disrupt the entire transformation just by being present in the directory?

zme1 commented 6 years ago

@ebeshero And to answer your question, my proposal interactions are almost always one-to-one. I think there may have been a few times when there were multiple supporters or proposers, so I should probably write code to account for those instances.

zme1 commented 6 years ago

@ebeshero After looking through my corpus, I found one or two instances with multiple supporters, but there is always one proposer (at least for the specific proposals going into my network). I rewrote my XSLT to do two new things: 1. find the participants at any level in the interaction, because not all of them are on the child axis of seg, and 2. account for instances of two supporters for one proposal. What do you think? I don't know if it's the most concise way to do it, but it seems to be working.
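The switch from the child axis to the descendant axis can be illustrated with a small Python sketch on hypothetical markup, in which a supporter's `<persName>` sits inside a nested `<seg type="support">` rather than directly under the proposal `<seg>`. In XPath terms this is the difference between `persName` (child axis) and `.//persName` (descendant axis). `find_participants` and its `any_depth` parameter are illustrative names, not part of the project.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def find_participants(seg, role, any_depth=True):
    """Return the ref values of persName elements with the given role in a seg.

    any_depth=True searches the descendant axis (XPath .//persName);
    any_depth=False searches only the child axis (XPath persName).
    """
    step = "tei:persName[@role='{}']".format(role)
    path = ".//" + step if any_depth else step
    return [person.get("ref") for person in seg.findall(path, NS)]


# Hypothetical proposal whose supporter is nested one level deeper.
SEG = ET.fromstring(
    '<seg xmlns="http://www.tei-c.org/ns/1.0" type="proposal">'
    '<persName role="proposer" ref="#rossi"/>'
    '<seg type="support">seconded by <persName role="supporter" ref="#bianchi"/></seg>'
    "</seg>"
)

if __name__ == "__main__":
    print(find_participants(SEG, "supporter", any_depth=False))  # prints []
    print(find_participants(SEG, "supporter"))  # prints ['#bianchi']
```

Searching at any depth is the safer default when the encoding is not uniform, at the cost of also picking up participants from any nested proposal, so it is worth checking that proposals are never nested inside one another.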