inukshuk / jekyll-scholar

jekyll extensions for the blogging scholar
MIT License

Support for jekyll with non-utf8 encodings #290

Closed. LinuxMercedes closed this issue 4 years ago.

LinuxMercedes commented 4 years ago

I have a project that uses jekyll-scholar; however, due to circumstances beyond my control I can't use UTF-8 as the output encoding. With encoding: ISO-8859-1 in my _config.yml, I get "Liquid error: internal" wherever {{ reference }} appears in my template, for every entry except one @book entry. I can reproduce this with style: ieee.

With style: apa, several more entries render, but I haven't yet figured out why the others fail.

Do you have any advice on troubleshooting this issue?
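
A possible first troubleshooting step is to check which characters in the rendered references have no ISO-8859-1 representation at all. The Ruby helper below is a minimal sketch; unencodable_chars is a hypothetical name, not part of jekyll-scholar, and it assumes the rendered reference is available as a UTF-8 string.

```ruby
# Hypothetical troubleshooting helper (not part of jekyll-scholar): return the
# distinct characters of +text+ that have no representation in +encoding+.
def unencodable_chars(text, encoding = Encoding::ISO_8859_1)
  text.chars.uniq.reject do |ch|
    begin
      ch.encode(encoding)
      true  # encodable, so drop it from the result
    rescue Encoding::UndefinedConversionError
      false # no mapping in the target encoding, so keep it
    end
  end
end

# Example: a rendered reference containing curly quotes, an en dash and an accent.
rendered = "\u201CFormalizing Cyber\u2013Physical Model Transformation\u201D, Caf\u00E9"
offending = unencodable_chars(rendered) # => ["“", "–", "”"]
```

Accented Latin letters such as "é" exist in ISO-8859-1 and are not reported; typographic quotes and dashes are.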

inukshuk commented 4 years ago

Interesting. Can you share one of the entries that fails with the apa style?

LinuxMercedes commented 4 years ago

Works:

@inproceedings{JaS19b,
  address = {San Antonio, TX, USA},
  author = {Jarus, Natasha and Sedigh Sarvestani, Sahra and Hurson, A. R.},
  month = nov,
  year = 2019,
  title = {Towards Refinement and Generalization of Reliability Models Based on Component States},
  booktitle = {Proc. Resilience Week 2019},
  groups = {cpci},
  url = {https://arxiv.org/abs/1910.04027},
  status = {To appear.}
}

Fails:

@inproceedings{JaS19a,
  address = {Hangzhou, China},
  author = {Jarus, Natasha and Sedigh Sarvestani, Sahra and Hurson, A. R.},
  month = jan,
  year = {2019},
  title = {Formalizing Cyber--Physical Model Transformation via Abstract Interpretation},
  booktitle = {Proc. 19th IEEE Int'l. High Assurance Systems Engineering Symp. (HASE)},
  groups = {cpci},
  doi = {10.1109/HASE.2019.00025}
}

Also, I just realized that this whole website is up on GitHub here: https://github.com/sendecomp/sendecomp-website

LinuxMercedes commented 4 years ago

Here's the only entry that renders with style: ieee:

@book{HuS12,
  title={Dependable and Secure Systems Engineering},
  editor={Hurson, A. and Sedigh Sarvestani, Sahra},
  isbn={9780123965257},
  series={Academic Press},
  url={https://books.google.com/books?id=4Q\_d8DW0aWMC},
  year={2012},
  publisher={Elsevier},
  groups = {cpci}
}

inukshuk commented 4 years ago

I can't reproduce this so far. Can you tell me exactly when this error happens? Is it when Liquid evaluates {{ reference }} in your template? At that point the reference is already fully generated, so I can only imagine that Liquid for some reason has a problem inserting the UTF-8 string into the output file, which I suppose was opened using your desired encoding.

At a quick glance, I suspect the symbol causing the trouble is the dash in Cyber--Physical (citeproc-ruby probably replaces the -- with a typographic dash there).

Could you try manually converting the string here? This is where the data that will be passed to Liquid gets set. We could query the configured encoding there and try to convert the string, but I don't think we could ever guarantee that this works.
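
Converting the string before it reaches Liquid might look roughly like the minimal Ruby sketch below. The names to_site_encoding and ASCII_FALLBACKS are hypothetical, the character mapping is an assumption, and where this would hook into jekyll-scholar is not shown. It also cannot guarantee a faithful result: characters outside the mapping are simply replaced.

```ruby
# Assumed mapping (extend as needed) from typographic characters that
# ISO-8859-1 lacks to plain ASCII stand-ins.
ASCII_FALLBACKS = {
  "\u2018" => "'", "\u2019" => "'",  # single curly quotes
  "\u201C" => '"', "\u201D" => '"',  # double curly quotes
  "\u2013" => "-", "\u2014" => "--"  # en dash, em dash
}.freeze

# Hypothetical helper (not jekyll-scholar API): convert a UTF-8 reference
# string to the site encoding. Characters the target encoding cannot
# represent are looked up in ASCII_FALLBACKS; any others become "?"
# instead of raising Encoding::UndefinedConversionError.
def to_site_encoding(text, encoding = Encoding::ISO_8859_1)
  text.encode(encoding, fallback: ->(ch) { ASCII_FALLBACKS.fetch(ch, "?") })
end

latin1 = to_site_encoding("\u201CTowards Refinement\u201D, Caf\u00E9")
# latin1 is an ISO-8859-1 string: straight quotes, and "é" kept as byte 0xE9
```

A lambda is used as the fallback so that characters missing from the map degrade to "?" rather than aborting the whole conversion.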

LinuxMercedes commented 4 years ago

Ah! This is what I get for assuming it "could never" be something obvious. In my template I had {{ reference | replace: '“', '"' | replace: '”', '"' | replace: '–', '-' | replace: "’", "'" }}; replacing this with {{ reference }} fixes the rendering issues.
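
One plausible explanation for the original error, inferred here rather than confirmed in the thread: Liquid's replace filter is essentially a String#gsub call, and Ruby raises Encoding::CompatibilityError when both the UTF-8 reference and the search string contain non-ASCII characters but carry different encodings. The sketch assumes Jekyll reads the template using the configured ISO-8859-1 encoding; liquid_like_replace is a hypothetical stand-in for the filter.

```ruby
# Assumption: Jekyll reads the template with the configured encoding, so the
# "“" typed into the template (UTF-8 bytes E2 80 9C on disk) turns into a
# three-character ISO-8859-1 string.
search = "\u201C".b.force_encoding(Encoding::ISO_8859_1)

# The rendered reference arrives as UTF-8 from citeproc-ruby.
quoted_title = "\u201CTowards Refinement\u201D"               # contains non-ASCII
book_title   = "Dependable and Secure Systems Engineering"    # ASCII only

# Rough stand-in for Liquid's replace filter: a plain String#gsub.
def liquid_like_replace(input, search, replacement)
  input.to_s.gsub(search.to_s, replacement.to_s)
end

liquid_like_replace(book_title, search, '"') # no match, but no error either

begin
  liquid_like_replace(quoted_title, search, '"')
rescue Encoding::CompatibilityError => e
  puts "#{e.class}: #{e.message}"
end
```

If this is what happened, it would fit the symptoms: an entry whose rendered text is pure ASCII (as the IEEE rendering of the @book entry may well be) passes through unchanged, while any entry containing curly quotes, en dashes, or ’ triggers an exception that Liquid reports as "internal".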

HOWEVER, now all the smart quotes are garbage characters (e.g., "N. Jarus, S. Sedigh Sarvestani, and A. R. Hurson, “Towards Refinement and Generalization of Reliability Models Based on Component States,” in Proc. Resilience Week 2019, San Antonio, TX, USA, Nov. 2019."), so I need to figure out how to set the encoding for citeproc-ruby. I suspect that's what those replace filters were for.

inukshuk commented 4 years ago

Yes, citeproc-ruby replaces some symbols (hyphens in particular, if I remember correctly), although in jekyll-scholar this could also be caused by the LaTeX filter that is applied when the BibTeX file is read in.

As for the quotation symbols used, these are actually determined by your CSL style and locale. You can override them in a custom style or you can adjust the CSL locale you're using. That way you would not need to replace the symbols after processing the references.

LinuxMercedes commented 4 years ago

Fantastic, thank you so much!

I can fix most of the quote issues in the locale, although open-inner-quote/close-inner-quote can't be set to ‘/’ because citeproc-ruby wants to use them in a regex?

There are a few places where the BibTeX itself ends up containing em dashes or curly quotes, generated, as you said, by the LaTeX filter. I take it I would have to write my own filter here to turn those into non-Unicode characters?

inukshuk commented 4 years ago

Oh, yes, there are some annoying edge cases in the cite processor for quotations (because sometimes the input already contains quotes), so it's entirely possible that we use the quote terms in patterns somewhere to guard against doubled quotes (or against flipping single and double quotes around). We ought to escape the term values there!
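
Escaping term values before interpolating them into a pattern can be sketched in Ruby with Regexp.escape. quoted_span is a hypothetical helper illustrating the technique, not the code citeproc-ruby actually uses.

```ruby
# Hypothetical helper (not citeproc-ruby's code): build a pattern matching
# text wrapped in the configured quote terms. Regexp.escape makes sure term
# values are matched literally rather than parsed as regex syntax.
def quoted_span(open_term, close_term)
  /#{Regexp.escape(open_term)}(.*?)#{Regexp.escape(close_term)}/
end

quoted_span("\u2018", "\u2019").match("a \u2018b\u2019 c")[1] # => "b"
quoted_span("*", "*").match("a *b* c")[1]                     # => "b"
# Without escaping, the pattern "*(.*?)*" would not even compile:
# Regexp.new raises RegexpError because "*" has nothing to repeat.
```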

With regard to the LaTeX filter, yes, you'd have to add your own filter, or maybe just disable the latex-decode filter in your _config.yml; then the strings in your BibTeX file should be passed to citeproc unchanged.