highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
https://highlightjs.org/
BSD 3-Clause "New" or "Revised" License
23.3k stars 3.52k forks source link

(sql) 'foreign key' and 'primary key' do not highlight 'key'. #4057

Open htozaki opened 1 month ago

htozaki commented 1 month ago

Describe the issue When I write foreign key or primary key is sql language, primary or foreign is highlighted, however key is not.

Which language seems to have the issue? sql.

Are you using highlight or highlightAuto?

highlightAll().

Sample Code to Reproduce

<!DOCTYPE html>
<html lang="en">
<head>
<title>test</title>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/styles/agate.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/highlight.min.js"></script>
<script>
hljs.highlightAll();
</script>
</head>
<body>
<pre>
<code class="sql">
create table Book (
    bookId      int,
    publisherId int,
    primary key(bookId),
    foreign key(PublisherId) references Publisher(id)
);
</code>
</pre>
</body>
</html>

Expected behavior 'key' is also highlighted.

Additional context Even though I add the following to my script to force the word 'key' as keyword,

const custom_keywords = ['key'];
hljs.getLanguage('sql').keywords.keyword.push(...custom_keywords);

key won't be highlighted. (However this addition will now highlights foreign key or foreign&nbsp;key primary&#8288; key etc when the separator is not a single space.)

I think the regexp using in COMBO does not recognize space as a part of keyword, that is the problem. Supposing the intention of this code is to treat foreign key as one keyword. In my script, changing the regexp so as to include space character

hljs.getLanguage('sql').contains[0].keywords.$pattern = /\b[\w\. ]+/;

solve the problem.

However I have not understand yet why create table is working as intended, so I am not sure this is correct fix.

Also it does not highlights when I write foreign key using 2 spaces etc, unless I added the key in keyword.

Though I have not understand the whole code yet, my suggestion would be the regexp at

https://github.com/highlightjs/highlight.js/blob/15d3b627fa7c99cb98d7b6760a6fbdbfd519d1a0/src/languages/sql.js#L660

should be changed to recognize space as a part of keyword. (Also it need to add key in reserved word list

https://github.com/highlightjs/highlight.js/blob/15d3b627fa7c99cb98d7b6760a6fbdbfd519d1a0/src/languages/sql.js#L99

in case there are 2 or more whitespaces between foreign and key.)

joshgoebel commented 1 month ago

It's because key isn't a keyword on its own in standard SQL I don't think? The solution is to add it to the the specific list that we're using for combo highlighting.

htozaki commented 1 month ago

However key part of foreign key is not highlighted. See the SQL part of https://highlightjs.org/examples

(Note that foreign key and primary key is already in the combo list. https://github.com/highlightjs/highlight.js/blob/15d3b627fa7c99cb98d7b6760a6fbdbfd519d1a0/src/languages/sql.js#L583-L584)

joshgoebel commented 1 month ago

I'm trying to remember what the \. is for, none of the keywords have a literal ..

I think for combos we may want somethingl ike:

 $pattern: /\w+(\s+\w+)?/,
htozaki commented 1 month ago

Aside from literal period, changing regular expression of $pattern is not enough to fix this problem.

When the text foreign key matches, parser goes to: https://github.com/highlightjs/highlight.js/blob/4bbf361241d727cbb361859525799cc5f0a651cd/src/highlight.js#L214 If the word equals to one of the keywords it returns data, however it requires exact match. So when the text is foreign key or something the returned data becomes undefined i.e. no highlights occurs there.

So we need to modify keywordData() or consider different handling with COMBOS.

Quick fix would be the following. It looks not good, furthermore I am afraid it might affect other languages.

First, change the keyword to regexp in sql.js (timezone in with timezone and without timezone need to be converted as well.)

      contains: [
        {
          begin: regex.either(...COMBOS.map((x)=>x.replaceAll(/\s+/g, "\\s+"))),
          relevance: 0,
          keywords: {
            $pattern: /\w+(?:\s+\w+)?/,
            keyword: KEYWORDS.concat(COMBOS.map((x)=>x.replaceAll(/\s+/g, "\\s+"))),
            literal: LITERALS,
            type: TYPES
          },
        },
        {
          className: "type",
          begin: regex.either(...MULTI_WORD_TYPES.map((x)=>x.replaceAll(/\s+/g, "\\s+")))
        },

Then in highligts.js

      function keywordData(mode, matchText) {
        return mode.keywords[matchText.match(/\s+/) ?
          matchText.replaceAll(/\s+/g, "\\s+") : // to the exact (not equivalent) regular expression.
          matchText];
      }