Open ilotoki0804 opened 5 months ago
I just want to note that PyCharm allows you to tell it that any string literal is of any language it supports and basically supports all the IDE features for that language.
You do have to manually tell it this though.
edit: You can also add comments like `# language=<language_ID>` before a string literal in PyCharm and it will know what to do. There are also rules for automatic injections.
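For illustration, a minimal sketch of what such an injection comment looks like in practice. The language IDs `HTML` and `CSS` are assumptions here; the exact set of accepted IDs depends on the IDE and its installed plugins.

```python
# The comment directly above each assignment asks the IDE to treat the
# string literal as the named language (IDs assumed for illustration).

# language=HTML
page = "<h1>hello, world!</h1>"

# language=CSS
style = "h1 { background: pink; }"
```

The comments have no effect at runtime; both values are ordinary `str` objects.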
This isn't a vote for or against a `Language` type, just providing additional context.
Instead of an actual type, any reason not to do `Annotated[str, Language("html")]`? That would inherently allow `LiteralString` and `bytes`, and has all the correct type semantics. It is a little more verbose, but a type alias solves that.
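A minimal sketch of this `Annotated` approach, assuming a hypothetical `Language` metadata class (nothing like it exists in `typing` today). The `HTML` and `CSS` aliases and the `declared_languages` helper are likewise made up for illustration; the helper only shows that a tool could recover the language names from the annotations.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Language:
    """Hypothetical metadata marker; not part of the typing module."""
    name: str


# Type aliases hide the verbosity of the Annotated form.
HTML = Annotated[str, Language("html")]
CSS = Annotated[str, Language("css")]


def render(body: HTML, style: CSS) -> HTML:
    # To a type checker, the parameters and return value are plain str.
    return f"<style>{style}</style>{body}"


def declared_languages(func):
    """Map each annotated parameter (and 'return') to its language name."""
    hints = get_type_hints(func, include_extras=True)
    found = {}
    for param, hint in hints.items():
        # Annotated types expose their extra arguments as __metadata__.
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, Language):
                found[param] = meta.name
    return found
```

Whether an editor could use this metadata for static syntax highlighting is a separate question; the sketch only covers runtime introspection.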
This topic has been explored in some depth within this pylance discussion.
To me, this doesn't seem like something that necessitates an extension to the type system. There are many ways this can be specified using existing type system or language constructs.
@dmwyatt Using comments to indicate language is fine. However, using `Language` has several advantages over using comments.
While comments are limited to a single string and must be manually marked, types have the advantage of being automatically applied to any code that uses functions or other typed elements.
For example, any code that uses the following `execute` function will automatically receive syntax highlighting for JavaScript.
def execute(code: Language["js"]):
    ...
Consider a function like this:
def build_html_from_markdown(article: str, style: str, script: str) -> str:
    """`article` takes Markdown, `style` takes CSS, `script` takes JavaScript."""
    ...
This function uses a bunch of different languages at once, thus there's a bit of ambiguity when using it.
# language=???
build_html_from_markdown("# hello, world!", "h1 { background: pink; }", "alert('welcome!')")
This could be worked around by splitting the call across multiple lines and marking each argument separately, but it demonstrates that a more fundamental solution is needed.
Using the `Language` type solves this problem.
def build(
    article: Language["markdown"],
    style: Language["css"],
    script: Language["js"],
) -> Language["html"]:
    ...
build("# hello, world!", "h1 { background: pink; }", "alert('welcome!')")
Specifying the language as a type reduces the need to express information about it in other ways, and allows users to infer what the parameters of a function require from the type hint before reading the documentation. And it also allows developers to write clearer code, since the type hint expresses the format, leaving the variable or parameter names to express other, more important information. I think this is in line with why type hints were introduced.
@erictraut While there are many different ways to tell what language a string contains, there hasn't been a single, universally followed method. But it's important enough that we need a formal, documented way of doing it, and I think the `Language` type is appropriate for it.
`Annotated[str, Language("html")]`
@TeamSpen210 If we can achieve static syntax highlighting with this implementation, I think it might be one of the options to consider, but I wonder if we can achieve syntax highlighting via `Annotated`.
`Language[str, "html"]`
The idea of specifying a type for the first parameter of `Language` is worth considering.
In my opinion, this alternative might be better if it is decided to implement `LanguageBytes`, but if not, it would be better to just use `Language` and `LiteralLanguage` in favor of simplicity over extensibility.
Yes, I agree that the comments implemented by PyCharm are not fool-proof. I was merely pointing towards prior art.
I think the underlying thing you're reaching for might be the lack of a standardized way of annotating languages in strings that is accepted across all IDEs and editors? Maybe types are the best way to get to that point...I don't know. I certainly like the idea.
The more generalized issue is the lack of a way to specify the structure or type of the data in a string. One can imagine there are many types of data that can be contained in a string, and programming languages are just one of them.
I'm very sympathetic to all of these ideas:
If the `typing` module had a `Language` type, it would surely be fairly quickly adopted by many IDEs and I'd use the heck out of it.
Currently, Python has no consistent way to indicate that a string contains code written in a particular programming language.
This means that code embedded in strings cannot be syntax highlighted, which hurts productivity and readability and leads to more bugs and errors when dealing with other languages as strings.
This article gives an example of the current problem.
Traditional approaches and issues
Typical case
Typically, syntax highlighting is not provided at all because there is no way for the editor to know the language of the string, which leads to several drawbacks.
Batch syntax highlighting of raw strings for regexes in VSCode
VSCode provides simple syntax highlighting for regexes when using raw strings, as shown below.
However, this approach has several drawbacks. First of all, it doesn't generalize to languages other than regexes. Also, since raw strings aren't just for regexes, it creates a visual distraction for people who want to use raw strings for non-regex reasons, such as Windows paths.
Below is an example of syntax highlighting for regex applied to Windows path, which actually reduces readability.
`Language` and `LiteralLanguage`
`Language` is a subtype of `str` that indicates that the string represents a specific language.
`LiteralLanguage` is a subtype of `LiteralString`, and is used in the same way as `Language`.
`Language` takes a single type argument, and in its place you put the name of the language, for example, `Language["html"]`.
Editors should provide basic syntax highlighting for string literals set to types `Language` or `LiteralLanguage`. Consider how code blocks are highlighted in Markdown.
The `Language` type may also be implied by the type of the parameter.
Errors
It is difficult to make a `Language` type remain a `Language` type after an operation, as this would complicate the implementation and make it difficult to provide a clear criterion for the type.
For example, does `Language["A"] + Language["A"]` always result in `Language["A"]`? Of course it often does, but it's very hard to generalize.
The case of `Language["A"] + Language["B"]` is also tricky. Should we treat the type as `Language["A"]`, or should it be `Language["B"]`? And what about `Language["A"].strip()`? It's hard to maintain consistency or a single standard for these operations. Therefore, `Language` should be considered more as a feature for annotation than for complex static type checking.
Accordingly, a type checker should accept the target of a given `Language` type as legitimate if it is a string, regardless of its contents, and an editor should not raise an error if it fails to parse.
Developers should also not expect that a value annotated with `Language` is fully valid code that will pass the language's compiler.
Conversely, `Language` can be used for code that is "reasonably close" to the appearance of the language. Developers should consider whether syntax highlighting helps or hinders users when deciding whether to use `Language` or just use `str` for languages that are not exactly the same as the target language.
Post-operation type
The result of an operation on a `Language` value should be treated as `str`, and the result of an operation on a `LiteralLanguage` value should be treated as `LiteralString`.
`BytesLanguage`?
`BytesLanguage` is the bytes version of `Language`. We should think about whether we need this type.
However, there is no type called `LiteralBytes`, so at least `LiteralBytesLanguage` can't exist.
Language names
The language identifier in `Language` must be lowercase, e.g. `Language["python"]` instead of `Language["Python"]`.
For language names, it seems like a good idea to use the identifiers used for code blocks in Markdown, which developers are already familiar with, but the exact definition is up to the editor.
Supported languages
A list of supported languages is beyond the scope of this documentation and should be up to each editor's implementation. However, editors should be able to provide basic syntax highlighting for common languages like Python, HTML, SQL, etc.
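As a rough runtime analogy for the "Post-operation type" rule above: ordinary `str` subclasses in Python already fall back to plain `str` after concatenation or method calls. `LanguageStr` below is a hypothetical stand-in used only for this illustration, not the proposed type itself.

```python
class LanguageStr(str):
    """Hypothetical runtime stand-in for a Language-typed string."""


a = LanguageStr("<p>hello</p>")
b = LanguageStr("  <p>world</p>  ")

combined = a + b       # concatenation yields a plain str
stripped = b.strip()   # so does a str method call

print(type(combined).__name__, type(stripped).__name__)  # prints "str str"
```

Treating the result as `str` therefore matches how subclassed strings already behave, which avoids having to define rules for cases such as `Language["A"] + Language["B"]`.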