mitodl / ocw-studio

Open Source Courseware authoring tool
BSD 3-Clause "New" or "Revised" License

added encoding/decoding for URL in markdown editor #2231

Open umar8hassan opened 3 days ago

umar8hassan commented 3 days ago

What are the relevant tickets?

closes https://github.com/mitodl/hq/issues/4590

Description (What does it do?)

This PR encodes and decodes URLs in the ocw-studio text editor so that saved links are valid, browser-readable URLs. Opening and closing parentheses are saved as %28 and %29 respectively, so that the links resolve correctly when the course is later compiled.

How can this be tested?

  1. Switch to branch umar/4590-links-to-urls-that-have-parentheses

  2. Create any resource and, in the text editor, link a URL that contains parentheses, e.g., https://commons.wikimedia.org/wiki/File:Saint_Joseph_charpentier_(La_Tour).jpg

  3. Save and publish changes.

  4. Visit the published site and click the link. It should open the correct URL.

Additional Testing

  1. The content data in markdown should be saved in encoded format.

pt2302 commented 3 days ago

Please add tests for the functionality in this PR to https://github.com/mitodl/ocw-studio/blob/master/static/js/lib/ckeditor/plugins/Markdown.test.ts.

pdpinch commented 1 day ago

Does this only address parentheses? Will there be other characters that hit this type of problem, or is there something special about parentheses?

umar8hassan commented 1 day ago

> Does this only address parentheses? Will there be other characters that hit this type of problem, or is there something special about parentheses?

@pdpinch this happens for () only. Markdown uses the [title](url) syntax to store hyperlinks. When a URL with parentheses is inserted, it becomes [title](url_with\(parentheses\)): the () inside the URL must be escaped so that they are not mistaken for the parentheses that delimit the link. This escaped markdown string is saved in the DB and later used for site compilation.

The URL (percent) encoding for () is %28%29. However, the escaped characters \(\) are encoded as %5c%28%5c%29, which makes the URL invalid on the compiled site.
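
To illustrate the difference, here is a minimal TypeScript sketch. `percentEncode` is a hypothetical helper written only for this comment; it is not the encoder used during site compilation. It shows how the extra backslashes end up in the encoded URL as %5C:

```typescript
// Hypothetical helper for illustration only: percent-encode each character
// of `s` that appears in `chars`, leaving everything else untouched.
function percentEncode(s: string, chars: string): string {
  return Array.from(s)
    .map(c =>
      chars.includes(c)
        ? "%" + c.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0")
        : c
    )
    .join("")
}

// The file name as the author typed it, and as it appears once the
// parentheses have been backslash-escaped for markdown.
const plain = "File:Saint_Joseph_charpentier_(La_Tour).jpg"
const escaped = "File:Saint_Joseph_charpentier_\\(La_Tour\\).jpg"

// Encode parentheses and backslashes in both versions.
const encodedPlain = percentEncode(plain, "()\\")
const encodedEscaped = percentEncode(escaped, "()\\")

console.log(encodedPlain)   // File:Saint_Joseph_charpentier_%28La_Tour%29.jpg
console.log(encodedEscaped) // File:Saint_Joseph_charpentier_%5C%28La_Tour%5C%29.jpg

// Decoding the second form gives back the backslashes, so it no longer
// names the same resource as the original URL.
console.log(decodeURIComponent(encodedEscaped) === plain) // false
```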

To handle this, we encode (fix) the URL before saving it to the DB, so the correctly encoded URL is used in compilation. On the author side, we decode it to show the original URL with parentheses.
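
The round trip can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the names `encodeLinkParens` and `decodeLinkParens` and the link regex are assumptions, and the sketch only handles inline [title](url) links whose URL contains backslash-escaped parentheses.

```typescript
// Hypothetical sketch; names and regex are assumptions, not the PR's diff.

// Matches an inline markdown link. The URL part is a run of either
// backslash-escaped characters or characters that are not parentheses,
// backslashes, or whitespace.
const LINK_RE = /\[([^\]]*)\]\(((?:\\.|[^()\\\s])*)\)/g

// Before saving: replace escaped parentheses inside link URLs with their
// percent-encoded forms, so no backslashes reach the stored markdown.
function encodeLinkParens(markdown: string): string {
  return markdown.replace(LINK_RE, (_match: string, title: string, url: string) => {
    const encoded = url.replace(/\\\(/g, "%28").replace(/\\\)/g, "%29")
    return `[${title}](${encoded})`
  })
}

// When loading into the editor: show the author the original URL again.
function decodeLinkParens(url: string): string {
  return url.replace(/%28/g, "(").replace(/%29/g, ")")
}

const saved = encodeLinkParens(
  "[photo](https://commons.wikimedia.org/wiki/File:Saint_Joseph_charpentier_\\(La_Tour\\).jpg)"
)
console.log(saved)
// [photo](https://commons.wikimedia.org/wiki/File:Saint_Joseph_charpentier_%28La_Tour%29.jpg)
```

Text outside of links is left untouched by `encodeLinkParens`, so escaped parentheses in ordinary prose are not rewritten.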