Open jjon opened 7 years ago
So, yes. After a little further experimentation it becomes clear that the Base64 methods are giving incorrect results for the em dash and en dash. Using those methods (from http://api.simile-widgets.org/exhibit/STABLE/lib/base64.js) at the Chrome console, I get the following results:
Base64.encode('—') // em dash
"A=="
Base64.encode('–') // en dash
"w=="
Base64.encode('-') // hyphen
"LQ=="
Whereas, using the python base64 module, I get this:
>>> base64.b64encode('—') # em dash
'4oCU'
>>> base64.b64encode('–') # en dash
'4oCT'
>>> base64.b64encode('-') # hyphen
'LQ=='
Unfortunately, I don't know nearly enough about the bitwise manipulation of strings to offer a solution.
j
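A possible explanation for the odd "A==" and "w==" results, offered as a minimal sketch rather than a reading of the actual base64.js source: many byte-oriented JavaScript Base64 encoders pass each `charCodeAt` value straight into the bit shifts, assuming it fits in 8 bits. The names `KEY` and `naiveBase64Encode` below are hypothetical and only illustrate that pattern. With a 16-bit code unit such as \u2014, the first 6-bit index lands far outside the 65-character alphabet, so `charAt` returns an empty string and the output comes out one character short.

```javascript
// Hypothetical re-creation of a byte-oriented Base64 encoder.
// This is NOT the Exhibit source; it only assumes the common pattern.
const KEY = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";

function naiveBase64Encode(input) {
  let output = "";
  let i = 0;
  while (i < input.length) {
    // charCodeAt returns a 16-bit code unit, or NaN past the end of the string.
    const chr1 = input.charCodeAt(i++);
    const chr2 = input.charCodeAt(i++);
    const chr3 = input.charCodeAt(i++);

    // These shifts are only correct when every value is in the range 0..255.
    const enc1 = chr1 >> 2;
    const enc2 = ((chr1 & 3) << 4) | (chr2 >> 4);
    let enc3 = ((chr2 & 15) << 2) | (chr3 >> 6);
    let enc4 = chr3 & 63;

    // Index 64 is the "=" padding character.
    if (isNaN(chr2)) {
      enc3 = enc4 = 64;
    } else if (isNaN(chr3)) {
      enc4 = 64;
    }

    // For chr1 = 0x2014, enc1 is 2053, so KEY.charAt(enc1) is "".
    output += KEY.charAt(enc1) + KEY.charAt(enc2) + KEY.charAt(enc3) + KEY.charAt(enc4);
  }
  return output;
}
```

Under that assumption, `naiveBase64Encode("\u2014")` gives "A==", `naiveBase64Encode("\u2013")` gives "w==", and `naiveBase64Encode("-")` gives "LQ==", which matches the console results above.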
Hmm. I guess nobody wanted to embarrass me by pointing out that base64 is for encoding 8-bit characters! So, there's nothing at all wrong with the base64 methods. My problem is thus not a bug in Exhibit; however, it does seem that Exhibit.History.init should be armored against this sort of thing. If Exhibit.Bookmark.generateBookmarkHash is going to return base64, then the title property of the state object should be sanitized somewhere along the line.
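One way such armoring could look, as a sketch only: convert the string to UTF-8 bytes before Base64-encoding it, so every value handed to the encoder fits in 8 bits. The helper names `utf8ToBase64` and `base64ToUtf8` are hypothetical and not part of Exhibit. The sketch assumes an environment that provides `TextEncoder`, `TextDecoder`, `btoa` and `atob` (current browsers and recent Node.js versions).

```javascript
// Hypothetical helpers, not part of Exhibit.
// Encode text as UTF-8 first so Base64 only ever sees 8-bit values.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one character per byte, all <= 0xFF
  }
  return btoa(binary);
}

// Reverse the process: Base64 -> bytes -> UTF-8 text.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}
```

With these, `utf8ToBase64("\u2014")` returns "4oCU" and `utf8ToBase64("\u2013")` returns "4oCT", the same values the Python base64 module produced above, and `base64ToUtf8` restores the original string.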
Here's a corner case bug, but I'm not sure where, exactly, it lies.
When integrating an Exhibit into a WordPress site, I discovered that the URL generated by the "bookmark" function merely opened the page with the default set of data items, ignoring state. Tom Woodward, on the simile-widgets list, very helpfully pointed out that the Base64 payload of the generated URL was corrupted.
Exhibit.History.getState() retrieves an object with a title property. When the dataset has been filtered, the title property is a string comprised of a page title followed by a string subtitle generated by Exhibit.History.pushState():
title += " {" + subtitle + "}";
(line 235 in history.js)
In WordPress, that title string is a concatenation of the page template's "slug" and the site name. By some off-stage PHP chicanery, these are concatenated with a separator which is an en dash (\u2013). It is this character (as well as the em dash (\u2014)) that causes Bookmark.generateBookmarkHash(state) to produce a corrupted base64 string. When a browser tries to interpret the URL so generated, it simply ignores the corrupted payload, and loads the default page and dataset.
Working at the browser console, I observe the following:
Note the en dash in the title. Then if we generate the Base64 string for the URL and decode it we get gibberish:
If we then alter the title property of the state object thus:
Then do encode/decode as before:
We get the uncorrupted JSON string we need for the bookmark URL. My work-around for this is crude, but effective. I simply execute
document.title = document.title.replace(/\u2013/, "--");
in an onLoad function, and all is well. But, I found it strange that ONLY \u2013 and \u2014 will corrupt the JSON string in response to Exhibit.Bookmark.generateBookmarkHash(state). So far as I can tell, literally ANY other character will work, whether ascii or not. Is the Base64 function at fault?
Anyway, not exactly crucial, inasmuch as there's an easy fix, but puzzling nonetheless.
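A slightly more general form of the work-around, as a rough sketch: the one-line replace above only touches the first en dash, so the hypothetical helper `sanitizeTitle` below replaces every en dash and em dash. It also swaps any other character above \u00FF for "?". That last step is an assumption on my part (based on base64 being meant for 8-bit characters), not something Exhibit is known to require.

```javascript
// Hypothetical helper, not part of Exhibit or WordPress.
function sanitizeTitle(title) {
  return title
    // Replace every en dash (\u2013) and em dash (\u2014), not just the first.
    .replace(/[\u2013\u2014]/g, "--")
    // Assumption: anything else outside 0x00..0xFF is also unsafe for an
    // 8-bit Base64 encoder, so substitute a placeholder character.
    .replace(/[^\u0000-\u00FF]/g, "?");
}

// In the page, this would run in an onLoad handler before any bookmark is generated.
if (typeof document !== "undefined") {
  document.title = sanitizeTitle(document.title);
}
```

Characters at or below \u00FF, such as é, pass through unchanged, which fits the observation that most non-ASCII characters did not cause trouble.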