Open sydb opened 1 month ago
Hi @sydb, the test config file (e.g. configTest.xml) defines <cite> as its own context, so it is excluded from the flow of surrounding contexts (and should yield exactly what you get).
Does your configuration file also have //context[@match='cite'] (e.g. line 42 of the configTest file)?
If so, does removing that resolve the issue?
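One quick way to check is sketched below, on some assumptions: it ignores namespaces entirely and only looks for elements whose local name is context and whose match attribute mentions cite as a whole word. The function name and the sample config are made up for illustration; the sample is not a copy of configTest.xml.

```python
import re
import xml.etree.ElementTree as ET


def find_cite_contexts(config_xml: str) -> list[str]:
    """Return the @match value of every <context> that mentions 'cite' as a whole word."""
    root = ET.fromstring(config_xml)
    hits = []
    for el in root.iter():
        # Strip any "{namespace}" prefix so the check works with or without a namespace.
        local = el.tag.rsplit("}", 1)[-1]
        if local != "context":
            continue
        match = el.get("match", "")
        # \b keeps 'worksCited' or 'excited' from counting as a hit.
        if re.search(r"\bcite\b", match):
            hits.append(match)
    return hits


# Hypothetical sample config, for illustration only.
SAMPLE = """<config xmlns="http://hcmc.uvic.ca/ns/staticSearch">
  <contexts>
    <context match="cite"/>
    <context label="works cited" match="div[@id eq 'worksCited']"/>
  </contexts>
</config>"""

print(find_cite_contexts(SAMPLE))  # ['cite']
```

An empty list for your own config would suggest that no context targets cite directly, though a broader match pattern could still catch it.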
Thank you for the prompt reply, @joeytakeda.
So (of course) you are right, that line 42 of configTest.xml causes the <cite> behavior above, and commenting it out “fixes” it.
But what is mysterious (to me, at least) is that the config.xml I was using when I first encountered this does not have any <ss:context> that matches <html:cite>, at least not directly. (The only line that matches the string “cite” is <context label="works cited" match="div[ @id eq 'worksCited']"/>.) I will have to poke around a bit to see if there is any other context my cite elements might be matching. Static Search only reads the one config file, right? (It doesn’t also read configTest.xml or something, does it? Are there any built-in default contexts?)
@sydb staticSearch only uses one config file, so if it's behaving as though it's reading the test config instead of your config, then it must be doing that, for some reason.
@sydb A bit more info on this:
https://endings.uvic.ca/staticSearch/docs/howDoIUseIt.html#specifyingContexts
There are default contexts built into the indexing process, based on the most common HTML block elements, which are listed there. But that list is not actually complete, so I'm going to update it; the complete list, as it appears in xsl/tokenize.xsl right now, is this:
body | div | blockquote | p | li | section | article | nav | h1 | h2 | h3 | h4 | h5 | h6 | td | details | summary | table/caption
But it definitely doesn't include cite.
[This may be a bug. At least, I do not think it is the result of a mistake I have made, but I have been wrong about that before. :-]
The content of <html:cite> is dropped from the context created for each search term.
To reproduce:
1. Run ant.
2. Run fgrep -h 'situational' test/ssTest/stems/* (or otherwise look at the results), and you will notice that the word “citation” does not occur in the output "context": field (it should be in that space before the comma).
3. Run cat test/ssTest/stems/citat*, and notice that the word “citation” has no context around it.
Appendix: cite_me_not.html