speedata / publisher

speedata Publisher - a professional database Publishing system
https://www.speedata.de/
GNU Affero General Public License v3.0

`#` wrongly escaped in `<A href="http://a.b#c">` #472

Closed pr-apes closed 1 year ago

pr-apes commented 1 year ago

@pgundlach,

The following layout shows that Publisher wrongly escapes the hash character (`#`) in `href`, which breaks the fragment identifier of the address:

```xml
<Layout xmlns="urn:speedata.de:2009/publisher/en"
  xmlns:sd="urn:speedata:2009/publisher/functions/en">

  <Record element="data">
    <PlaceObject>
      <Textblock>
        <Paragraph>
          <A href="https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf#G6.2152819">
            <URL>
              <Value>https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf#G6.2152819</Value>
            </URL>
          </A>
        </Paragraph>
        <Paragraph>
          <A href="https://doc.speedata.de/publisher/en/saasapi/index.html#ch-saasapi">
            <URL>
              <Value>https://doc.speedata.de/publisher/en/saasapi/index.html#ch-saasapi</Value>
            </URL>
          </A>
        </Paragraph>
      </Textblock>
    </PlaceObject>
  </Record>

</Layout>
```

Both links (https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf#G6.2152819 and https://doc.speedata.de/publisher/en/saasapi/index.html#ch-saasapi) become unreachable once the `#` is escaped.

Could you take a look at this? Many thanks for your help.
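To illustrate why the escaped address no longer works, here is a minimal Python sketch (not Publisher code). The call `quote(url, safe=":/")` is only a stand-in, chosen as an assumption, for an encoder that escapes `#`. Once the hash is percent-encoded, a URL parser no longer sees a fragment and treats `%23ch-saasapi` as part of the path:

```python
from urllib.parse import quote, urlsplit

url = "https://doc.speedata.de/publisher/en/saasapi/index.html#ch-saasapi"

# Stand-in for an over-eager encoder (assumption): keeps ':' and '/',
# but percent-encodes '#' as '%23'.
escaped = quote(url, safe=":/")

original_parts = urlsplit(url)
escaped_parts = urlsplit(escaped)

print(original_parts.fragment)  # the fragment is recognised
print(escaped_parts.fragment)   # no fragment left
print(escaped_parts.path)       # '%23ch-saasapi' is now part of the path
```

The server is then asked for a file whose name ends in `%23ch-saasapi`, which does not exist.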

pr-apes commented 1 year ago

According to https://www.rfc-editor.org/rfc/rfc3986#section-2.2, the characters `:`, `/`, `?`, `#`, `[`, `]`, `@` (gen-delims) and `!`, `$`, `&`, `'`, `(`, `)`, `*`, `+`, `,`, `;`, `=` (sub-delims) are reserved.

If I'm not mistaken, this is the function that encodes them:

https://github.com/speedata/publisher/blob/f9ef96dc1b90552e210188e273c72706972e15a2/src/lua/publisher.lua#L6993-L7001

At the very least, the characters that RFC 3986 calls gen-delims should not be escaped in `urlencode`.
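As a rough sketch of the intended behaviour (written in Python for illustration, not the actual Lua `urlencode` from `publisher.lua`), an encoder could leave all RFC 3986 reserved characters untouched and escape only what is neither reserved nor unreserved. The function name `encode_href` is hypothetical, and keeping `%` as a safe character is an assumption made so that already-escaped sequences are not encoded twice:

```python
from urllib.parse import quote

# Reserved characters as listed in RFC 3986, section 2.2.
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="


def encode_href(url: str) -> str:
    """Percent-encode a URL while leaving reserved characters as they are.

    Hypothetical helper: '%' is kept as well (assumption), so that
    sequences such as '%20' in the input are not double-encoded.
    """
    return quote(url, safe=GEN_DELIMS + SUB_DELIMS + "%")


print(encode_href("http://a.b#c"))
print(encode_href("http://a.b/x y#c"))
```

With this approach the fragment separator survives, while spaces and non-ASCII characters are still escaped.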

pr-apes commented 1 year ago

The fix was simple. I hope it might help.