remark-embedder / core

πŸ”— Remark plugin to convert URLs to embed code in markdown.
MIT License
118 stars 6 forks source link

getUrlString should check for valid URL #135

Open anarion80 opened 1 month ago

anarion80 commented 1 month ago

Relevant code or config


const getUrlString = (url: string): string | null => {
  const urlString = url.startsWith('http') ? url : `https://${url}`

  try {
    return new URL(urlString).toString()
  } catch (error: unknown) {
    return null
  }
}

const urlString = getUrlString(value)

What you did: I run a simple markdown like below:


#### Output

-   Fruit
    -   Apple
    -   Orange
    -   Banana
-   Dairy
    -   Milk
    -   Cheese

I created my own version of oembed transformer that included a fallback, that if the extracted link is not an oembed link, then I use my own custom bookmark around it.

What happened:

Unfortunately @remark-emedder/core returns even simple strings as URLs:

πŸš€ ~ remarkEmbedder, node: {
  type: 'text',
  value: 'Banana',
  position: {
    start: { line: 171, column: 9, offset: 3707 },
    end: { line: 171, column: 15, offset: 3713 }
  }
}
πŸš€ ~ remarkEmbedder, isValidLink: false
πŸš€ ~ remarkEmbedder, value: Banana
πŸš€ ~ remarkEmbedder, urlString: Banana

πŸš€ ~ shouldTransform: ~ url: https://banana/

Reproduction repository:

Problem description: @remark-emedder/core getUrlString returns every single line string as url (with https:// appended to it. It seems to later rely on shouldTransform function to filter all such links out, but in some cases this is too late. I can't check in shouldTransform function if the link is a valid URL, because it always is, coming out of getUrlString function.

Suggested solution: Enhance the getUrlString function to check if the given text is actually a link using some robust regex, and only return a true, viable URL

Something like this works:

const getUrlString = url => {
    const urlRegex = new RegExp(
        /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=+$,\w]+@)?[A-Za-z0-9.-]+(:[0-9]+)?|(?:www.|[-;:&=+$,\w]+@)[A-Za-z0-9.-]+)((?:\/[+~%/.\w-_]*)?\??(?:[-+=&;%@.\w_]*)#?(?:[\w]*))?)/
    );
    if (!urlRegex.test(url)) {
        console.log('🚩 Not a valid URL!:', url);
        return null;
    }
    const urlString = url.startsWith('http') ? url : `https://${url}`;
    try {
        return new URL(urlString).toString();
    } catch (error) {
        return null;
    }
};