scottkleinman / aeme

AEME Development Repo

External references with ampersand in the url #46

Open scottkleinman opened 9 years ago

scottkleinman commented 9 years ago

I'm trying to code <ref target="http://www.hrionline.ac.uk/mwm/browse?type=ms&id=118" type="MWM">http://www.hrionline.ac.uk/mwm/browse?type=ms&amp;id=118</ref> (where "MWM" is Manuscripts of the West Midlands). However, the bare ampersand in @target does not validate. If I change it to &amp;, it validates, but when I paste that into my browser, the page is not found.

How exactly should we handle this situation?

skgoetz commented 9 years ago

Does the URL work if you change the ampersand to a semicolon? Some sites permit it.

scottkleinman commented 9 years ago

Semicolon doesn't work. It looks like the ampersand has to be escaped as an entity to validate, and the URL then has to be processed with something like PHP's htmlspecialchars_decode() on output. I'm not exactly happy with that, as it means that the URL does not work if the XML is used separately from our rendering engine. The issue should probably be documented, perhaps in <encodingDesc>, where the Manuscripts of the West Midlands tag is defined.
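A minimal Python sketch of that decoding step, using only the standard library and assuming Python 3. On a raw string copied out of the file, html.unescape() does roughly what PHP's htmlspecialchars_decode() does here. Any conforming XML parser (xml.etree.ElementTree in this sketch) already resolves &amp; in an attribute value to &, so software that reads @target through a parser gets the working URL with no extra step; only text copied from the raw source needs decoding by hand.

```python
import html
import xml.etree.ElementTree as ET

# The ref from this issue, with the ampersand escaped as &amp; so it validates.
TEI_SNIPPET = (
    '<ref target="http://www.hrionline.ac.uk/mwm/browse?type=ms&amp;id=118" '
    'type="MWM">http://www.hrionline.ac.uk/mwm/browse?type=ms&amp;id=118</ref>'
)

# 1. Reading the attribute through an XML parser: the parser itself turns
#    &amp; back into &, so no extra decoding step is needed.
ref = ET.fromstring(TEI_SNIPPET)
parsed_url = ref.get("target")

# 2. Working on raw text instead (e.g. a string copied out of the file):
#    html.unescape() is a rough Python counterpart of PHP's
#    htmlspecialchars_decode() for this case.
raw_attribute = "http://www.hrionline.ac.uk/mwm/browse?type=ms&amp;id=118"
unescaped_url = html.unescape(raw_attribute)

print(parsed_url)     # http://www.hrionline.ac.uk/mwm/browse?type=ms&id=118
print(unescaped_url)  # http://www.hrionline.ac.uk/mwm/browse?type=ms&id=118
```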

dorothyk98 commented 9 years ago

Can we send them an email and ask if they have suggestions directly from the project itself? Do we know of others who might have had this issue, like PPEA or T-PEN?

scottkleinman commented 9 years ago

Not sure what you're asking. Can you rephrase?

-S

On 17 November 2014 11:30, dorothyk98 notifications@github.com wrote:

Can we send them an email and ask if they have suggestions directly from the project itself? Do we know others who might have had this issue? like PPEA or T-Pen?


Scott Kleinman, Professor of English; Director, Center for the Digital Humanities, California State University, Northridge

scottkleinman commented 9 years ago

But no rush. The plane is about to land, so I'll be going offline for a few hours.

-S


dorothyk98 commented 9 years ago

Have any of the other like-minded projects (PPEA, MESA, T-PEN) dealt with this issue, and if so, how are they handling it? Just musing: are we the first project to have had this problem when trying to code this site?

scottkleinman commented 9 years ago

OK, @dorothyk98, I got your message via e-mail on the plane and didn't notice that it was coming from GitHub, hence my confusion. I think this is definitely a question for the TEI folk, but I suspect it is unsolvable. It should be trivial to render &amp; as & in our own platform, but the purpose of TEI is to produce markup that is platform-neutral. I think the best we can do is add a note to the effect that, for this particular web site, the XML entity &amp; must be rendered as & in any hyperlinks (or changed to & before the URL is pasted into a browser's address bar).
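A sketch of what the rendering step might look like on our side, assuming the platform reads the TEI with an XML parser and emits HTML; ref_to_link is a hypothetical helper name, not part of any existing code. One subtlety: when the link is written out as HTML, the & is escaped again as &amp; inside href, which is correct HTML. The browser decodes it before following the link, so the request still goes to the URL ending in &id=118. Only a URL pasted by hand into the address bar needs the bare &.

```python
import html
import xml.etree.ElementTree as ET


def ref_to_link(ref: ET.Element) -> str:
    """Render a TEI <ref> as an HTML anchor (hypothetical helper).

    ref.get("target") already holds the decoded URL (with a bare &);
    html.escape() re-escapes it for the HTML attribute, and the browser
    undoes that escaping when the link is followed.
    """
    url = ref.get("target", "")
    label = ref.text or url
    return '<a href="{}">{}</a>'.format(
        html.escape(url, quote=True), html.escape(label)
    )


ref = ET.fromstring(
    '<ref target="http://www.hrionline.ac.uk/mwm/browse?type=ms&amp;id=118" '
    'type="MWM">Manuscripts of the West Midlands</ref>'
)
link = ref_to_link(ref)
print(link)
```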

dorothyk98 commented 9 years ago

I wonder if the site even knows that their URL is giving people issues. Can we throw this back to them and ask if they could do something to make it more digestible for the rest of us?

skgoetz commented 9 years ago

As I understand it, this isn't a problem specific to their site. It's a common issue. https://duckduckgo.com/?q=ampersand+url+site%3Astackoverflow.com

scottkleinman commented 9 years ago

Most of the Stack Overflow questions seem to be from people who don't know that &amp; has to be transformed to & at some stage for the URL to be valid. The method for doing that is normally trivial, but it has to be done in the rendering platform. My concern is that we are supposedly producing markup that is not platform-specific, in which case we are technically recording an incorrect URL. But I just thought of a possible solution. What if we used this:

<ref type="MWM"><![CDATA[http://www.hrionline.ac.uk/mwm/browse?type=ms&id=118]]></ref>

If we are creating a link from this, we need to use the element value instead of the value of @target, but that should be pretty trivial. (NB. You can't put CDATA in an attribute.) And now we have recorded one URL, which is correct.

Thoughts?
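A minimal sketch of the CDATA proposal above, in Python with the standard library. The helper link_url is hypothetical: it uses @target when present and otherwise falls back to the element's text. An XML parser strips the CDATA wrapper, so the CDATA form and the escaped-attribute form yield the same string.

```python
import xml.etree.ElementTree as ET

URL = "http://www.hrionline.ac.uk/mwm/browse?type=ms&id=118"

# The CDATA form proposed above: a bare & is legal inside a CDATA section.
CDATA_REF = '<ref type="MWM"><![CDATA[' + URL + "]]></ref>"
# The attribute form, where the & has to be written as &amp;.
ATTR_REF = '<ref type="MWM" target="' + URL.replace("&", "&amp;") + '"/>'


def link_url(ref: ET.Element) -> str:
    """Hypothetical helper: use @target if present, else the element text."""
    target = ref.get("target")
    if target is not None:
        return target
    return (ref.text or "").strip()


for snippet in (CDATA_REF, ATTR_REF):
    # Both lines print the same URL, with a bare &.
    print(link_url(ET.fromstring(snippet)))
```

One caveat if the files are ever rewritten programmatically: ElementTree does not preserve CDATA sections when serializing, so a round trip through it writes the & back out as &amp; in ordinary element text. The data stays correct, but the CDATA wrapper is lost.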

skgoetz commented 9 years ago

Escaping the URL as CDATA makes sense to me.

Another possibility is to record the URLs elsewhere, outside TEI, and have ref invoke an ID in that other thing, whether database or otherwise. The CDATA idea has much less overhead than that, though it's conceptually weird: it's almost like making ref empty.
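A sketch of the lookup alternative, assuming the URLs live in an external table keyed by a short ID and that the ID is stored in @target. The ID format mwm-118, the table, and resolve_ref are all hypothetical; a real project might use a database, a different attribute, or a different ID scheme. The point is only that the TEI file never has to contain the ampersand.

```python
import xml.etree.ElementTree as ET

# Hypothetical external store mapping short IDs to full URLs. In practice this
# could be a database table; a dict stands in for it here.
URL_TABLE = {
    "mwm-118": "http://www.hrionline.ac.uk/mwm/browse?type=ms&id=118",
}


def resolve_ref(ref: ET.Element, table: dict) -> str:
    """Hypothetical resolver: the ref's @target holds an ID, not a URL."""
    ref_id = ref.get("target", "")
    if ref_id not in table:
        raise KeyError("no URL recorded for ref id %r" % ref_id)
    return table[ref_id]


# The TEI side only records the ID, so there is no & to escape.
ref = ET.fromstring('<ref type="MWM" target="mwm-118"/>')
print(resolve_ref(ref, URL_TABLE))
```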

scottkleinman commented 9 years ago

I take your point. <ref> seems less "empty" to me if it's easy to grab the CDATA. Your solution is better if accessing the URL that way is easier. We should go with the path of least resistance.