inukshuk / citeproc-ruby

A Citation Style Language (CSL) Cite Processor
103 stars 23 forks source link

Variables used only in conditionals inside of names->substitions are suppressed #41

Closed lightman76 closed 8 years ago

lightman76 commented 8 years ago

Found this issue in the modern-language-association-8th-edition.csl

In the MLA8 style, if I don't provide an author, but have the title and container-title, then the container-title is not being put into the citation.

In the MLA8 csl the author names macro includes several substitutions, the last one calling to the "title" macro. The "title" macro in turn decides whether to use quotes around the "title" variable or to italicize it by checking if the "container-title" variable is empty. The use of "container-title" variable in this conditional in the substitution is causing it to be marked as suppressed and so it is not output later on. I believe this should not be the case. I've tried the same situation on citeproc-js and the container-title displays as I expected.

Here's a gist with a failing spec https://gist.github.com/lightman76/d61ef640e9caa269bfe0ef0fe829c6e43

I'm pretty new to the csl/citeproc world, so trying to get my head around all of it. I'll try to provide a pull request. Please let me know though if you think I've misunderstood the cause of the problem I'm seeing with MLA8.

lightman76 commented 8 years ago

Ok - Actually I've dug a little further into this, and now I believe that the problem is in citeproc at https://github.com/inukshuk/citeproc/blob/master/lib/citeproc/attributes.rb#L29

The attribute? method uses the observable read_attribute. I would suspect that attribute? is probably only used in conditional testing and as such shouldn't need trigger the observers.

Working on some specs around this but here's the modification I'm thinking of to the attribute? method in attributes.rb - https://gist.github.com/lightman76/53832347f145cfdc65394c617b40ea6e

All the existing specs are still passing after the change. Can you think of any issues this might cause?

inukshuk commented 8 years ago

Good work! I probably did not realize that there is a need to distinguish between different types of variable access. Defining attribute? as an unobserved read in this context sounds like a good call to me -- I'd be happy to merge a PR for this.

/cc @rmzelle @adamsmith @fbennet perhaps this is something where the spec could be more explicit? I.e., define exactly what constitutes as a variable being used for a substitution?

rmzelle commented 8 years ago

I'm pretty sure the idea is that variables are only suppressed if they already have been printed through a substitution. If the <choose/> fails to return any output, there is no successful substitution, and the tested variable(s) shouldn't be suppressed.

http://docs.citationstyles.org/en/stable/specification.html#substitute says "the first element to return a non-empty result is used for substitution. Substituted variables are suppressed in the rest of the output to prevent duplication."

That's already pretty clear, no?

inukshuk commented 8 years ago

No, it is not clear to me at all.

My interpretation of this was that, given the first non-empty result, all variables that led to this result being printed will be suppressed. Your comparison to a <choose/> that fails to return output misses the point: the issue is that it does produce output! If it fails to produce output, the variable would not have been suppressed. But it produces a string; in order to produce that string it accesses two variables (container-title and title) -- my interpretation was that, therefore, both variables count as 'substituted variables' and should be suppressed (both variables were used to produce that string).

In the case at hand, it is very obvious that only one of the two variables was actually 'printed', so my interpretation obviously differs from the MLA style authors and also from citeproc-js. But the point is, if not all variables which are used/accessed to produce the substitution string will be suppressed, we need criteria to decide what sort of variable use counts and what does not and these criteria should be part of the spec, because this is not answered by the language in the spec in my opinion. If the answer is 'only variables which are used in a <text/> shall be suppressed' then the spec should include that statement, no?

What if the macro in question looked like this:

<macro name="title-short">
  <choose>
    <if variable="A">
      <text term="B"/>
    </if>
  </choose>
</macro>

If there is no variable A then nothing is printed and consequently, nothing will be suppressed. If there is a variable A a substitution will be printed. Should A be suppressed now? Its presence is the determining factor for the existence of the substitution string, after all.

For another example, what about dates or numbers? If the variable in question is a date or a number it may be rendered into a string that looks very different from its original representation -- but you would still say that this variable was in fact 'printed'.

Having said that, I think I do understand, intuitively, what the spec seems to imply and why you say that it should be pretty clear -- but that's not enough for me to formulate a general rule for what variable use to count and what variable use to disregard.

rmzelle commented 8 years ago

But the point is, if not all variables which are used/accessed to produce the substitution string will be suppressed, we need criteria to decide what sort of variable use counts and what does not and these criteria should be part of the spec, because this is not answered by the language in the spec in my opinion. If the answer is 'only variables which are used in a shall be suppressed' then the spec should include that statement, no?

Yeah, I guess it could be clearer. @adam3smith, @fbennett, correct me if I'm wrong, but I assume that any string resulting from <text/>, <names/>, <number/>, or <date/> would suppress the involved variables. Variables merely tested through <choose/> wouldn't count, and probably neither those called through <label/> (since <label/> doesn't use the variable content directly), I think.

fbennett commented 8 years ago

Yeah, I've always just casually thought of "rendering from within cs:substitute," as the trigger, but there are assumptions embedded in that. @rintze's list (rendering nodes that mangle variable content to produce their output) captures the citeproc-js behavior.

inukshuk commented 8 years ago

Looking at the choose renderer in citeproc-ruby, I realized I was already using unobservable_read to access variables during evaluation of conditions -- there was just this one access that I missed which caused the bug in question. So it looks like I actually aligned my interpretation with the intuitive one while implementing it but forgot about it! (Not through <label/> though, that's a good point that I'll have to fix as well)

In any case, I still think that it would be helpful if the language in the spec were more explicit in this case.