nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
https://nim-lang.org

The nim compiler doesn't report the error lines properly after 65535 lines #15165

Open ghost opened 4 years ago

ghost commented 4 years ago

Example

for i in `seq 1 70000`; do echo "echo 1" >> gg.nim; done
echo "let x: int = \"a\"" >> gg.nim
nim c -r gg.nim

Current Output

.../gg.nim(65535, 14) Error: type mismatch: got <string> but expected 'int'

Expected Output

.../gg.nim(70001, 14) Error: type mismatch: got <string> but expected 'int'
$ nim -v
Nim Compiler Version 1.2.0
ghost commented 4 years ago

But is this REALLY an issue? I would not consider files with more than 65535 lines (the previous limit was 32767 with int16, but it was changed to uint16) to be useful.

alehander92 commented 4 years ago

this should be specified and an error message for that should be printed

alehander92 commented 4 years ago

this can be a problem for autogenerated libs e.g. parsers or something

ghost commented 4 years ago

This came up for us when we generated a big file, and no, we don't want to split it, because we don't have to do that in any other language, just like we don't have to work around the case-insensitivity issues either.

Clyybber commented 4 years ago

I think the best course of action here is to render the line number as ??? or >65535. Changing the size of the lineinfo number would hurt performance and probably memory usage. I suggest using include, or consuming the AST via macros or something. And I suppose you want to catch errors before the Nim layer anyway if what you are working on is a language that compiles to Nim (is it?).

ghost commented 4 years ago

I think the best course of action here is to render the line number as ??? or >65535.

That would still be useless.

Changing the size of the lineinfo number would hurt performance and probably memory usage.

We are talking about a few bytes here. This is just pico-optimization.

if what you are working on is a language that compiles to Nim (is it?).

Not really.

ghost commented 4 years ago

I still think that >65535 lines of code is a pretty okay limit for 99.99%, and otherwise you can always just use include :)

This came up for us when we generated a big file and no, we don't want to split it because we don't have to do that in any other language, just like we don't have to work around the case-insensitivity issues either.

not sure which "case-insensitivity" issues you're talking about here, but this issue isn't about that, so okay :)

EDIT: Also I hope you know that if you combine different "pico-optimizations" you get real performance impact :)

ghost commented 4 years ago

I still think that >65535 lines of code is a pretty okay limit for 99.99%, and otherwise you can always just use include :)

It's a pointless limit, because switching to uint32 would only cost an extra 16 bits per value mid-compilation.

not sure which "case-insensitivity" issues you're talking about here, but this issue isn't about that, so okay :)

Nim identifiers are case-insensitive and ignore underscores, which is really bad for C compatibility.

EDIT: Also I hope you know that if you combine different "pico-optimizations" you get real performance impact :)

I also hope you know that this is not even worth calling an optimization, because it just wastes developer time. And with the introduction of include statements and their references, the compiler's memory usage will still need to increase.

disruptek commented 4 years ago

I generate files with >100k lines of real code. The key word here is “generate”. I don’t care what the error message says in the event of an error.

Also, the design responsible for this problem of tracking line numbers absolutely is silly and embarrassing, not least because it has real bearing on other design in the compiler (e.g. IC).

alehander92 commented 4 years ago

@trialism those 16 bits are part of each Node object, so it's much more than 16 bits: it's 16 bits × the number of node objects. I think there is a miscommunication here; this isn't a single value!

Araq commented 4 years ago

How exactly are you not wasting your developer time by creating such large files?

ghost commented 4 years ago

How exactly are you not wasting your developer time by creating such large files?

Sometimes we need to generate code, and sometimes those source files are really big and are either hard to split up or would require too many splits.

Araq commented 4 years ago

Yes, I understand, but compile-times are important too, so usually it matters to generate less code for reasons having nothing to do with Nim's line cap quirk. Maybe your generated code can generate procs or templates or something, code compression techniques usually work fine. The Nim compiler itself is only about 80_000 lines of code fwiw (not counting the stdlib it uses).

timotheecour commented 4 years ago

@trialism you can as suggested above split your generated code in several includes. This should not affect CT performance.

# module large
include large_split0
include large_split1
include large_split2
...

but most likely you should be able to make better use of macros/templates to avoid generating so much code.
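As a sketch of the splitting approach suggested above (all file names and the 60000-line chunk size are hypothetical), a code generator could emit the chunks and the top-level module itself. Note that this only works when chunks break at top-level statement boundaries, since include is resolved per file at parse time:

```python
# Sketch: split a large list of generated Nim lines into include chunks so
# that no single file exceeds the uint16 line cap. File names hypothetical.
import os

CHUNK_LINES = 60000  # stay safely below the 65535-line limit

def split_into_includes(lines, module_name, out_dir="."):
    """Write lines into <module_name>_splitN.nim chunks plus a top-level
    module that merely includes them, preserving overall line order.
    Assumes chunk boundaries fall on top-level statement boundaries."""
    chunk_names = []
    for n, start in enumerate(range(0, len(lines), CHUNK_LINES)):
        name = f"{module_name}_split{n}"
        with open(os.path.join(out_dir, name + ".nim"), "w") as f:
            f.write("\n".join(lines[start:start + CHUNK_LINES]) + "\n")
        chunk_names.append(name)
    with open(os.path.join(out_dir, module_name + ".nim"), "w") as f:
        for name in chunk_names:
            f.write(f"include {name}\n")
    return chunk_names
```

For the 70000-line reproduction above, this would produce two chunk files of 60000 and 10000 lines plus a two-line top-level module.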

jibal commented 4 years ago

@disruptek

How does it affect incremental compiling? (I'm not questioning that, I seriously would like to understand the issue.)

jibal commented 4 years ago

@alehander92

trialism those 16 bits are part of each Node object: so they are much more than 16 bits , 16 * all the node objects : I think there is a miscommunication here, this isn't a single value!

There's no need for both the column number and the line number to take 16 bits; the column number could be cut down, say to 10 bits to be generous, allowing for 4194304 lines before overflow. Unfortunately, making such a change isn't quite trivial because of all the uint16 casts rather than using proper accessors.
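To make the arithmetic of this proposal concrete, here is a minimal sketch (function names are illustrative, not compiler code) of packing line and column into one 32-bit value with a 22/10 split:

```python
# Sketch of the proposed repacking: keep line + column in one 32-bit value,
# but give the column 10 bits and the line the remaining 22 bits.
COL_BITS = 10
COL_MASK = (1 << COL_BITS) - 1          # columns 0..1023
MAX_LINE = (1 << (32 - COL_BITS)) - 1   # lines 0..4194303

def pack(line, col):
    # Clamp instead of wrapping, so an overflowing position is pinned at
    # the maximum rather than reported as a bogus small number.
    return (min(line, MAX_LINE) << COL_BITS) | min(col, COL_MASK)

def unpack(info):
    return info >> COL_BITS, info & COL_MASK
```

With this split, the 70001-line reproduction above would round-trip its position exactly, at no extra memory cost per node.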

disruptek commented 4 years ago

How does it affect incremental compiling? (I'm not questioning that, I seriously would like to understand the issue.)

When caching nodes, we have a trivial encoding optimization that is, nonetheless, still relatively expensive when it comes to recording so very many values. Nim's extensive metaprogramming makes the problem worse.

ghost commented 4 years ago

@Araq @timotheecour no, macros won't work, because the generated code is for type definitions, function declarations etc. include won't help either, because there are recursive types and Nim doesn't support the forward declaration of types(?). Unfortunately, Nim's underscore quirks and other naming issues make C compatibility really bad, which is why we dropped Nim support.

juancarlospaco commented 4 years ago

How do other programming languages do it? Nim can't be the only one in the world that has line numbers. All programming languages have limits too; I remember Python has a limit of 255 function arguments.

Araq commented 4 years ago

@juancarlospaco LLVM uses a dedicated structure based on file offsets (int32) that are translated into (line, col) pairs on demand. So it uses little memory, like Nim, but without the 65535 limit. It's a lot of work to implement, though.
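The scheme described here can be sketched as follows (a simplified illustration of the idea, not LLVM's actual implementation): each node stores only a byte offset, and a per-file table of line-start offsets translates it to (line, column) on demand:

```python
# Sketch of offset-based source locations: nodes store a 32-bit byte
# offset; (line, column) are computed lazily via binary search over a
# per-file table of line-start offsets. Names are illustrative.
import bisect

def line_starts(source):
    """Byte offsets at which each line begins (built once per file)."""
    starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            starts.append(i + 1)
    return starts

def offset_to_line_col(starts, offset):
    """Translate a byte offset into a 1-based (line, column) pair."""
    line = bisect.bisect_right(starts, offset) - 1
    return line + 1, offset - starts[line] + 1
```

The per-node cost stays at 32 bits regardless of file size, and the translation is only paid when a diagnostic is actually emitted.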

Araq commented 4 years ago

There's no need for both the column number and the line number to take 16 bits; the column number could be cut down, say to 10 bits to be generous, allowing for 4194304 lines before overflow.

Oh, I like this idea! :-)

foldl commented 4 years ago

I don't think this is a good idea unless the language spec imposes a limit on line width.

There's no need for both the column number and the line number to take 16 bits; the column number could be cut down, say to 10 bits to be generous, allowing for 4194304 lines before overflow. Unfortunately, making such a change isn't quite trivial because of all the uint16 casts rather than using proper accessors.

jibal commented 4 years ago

That doesn't make sense to me. First, it's not a limit on line length; it just means that error messages report the wrong column if the error comes after column 1024. Second, there is already an undocumented "limit" (that isn't a limit) of 65536 on both the column number and the line number. This just shifts that to a more sensible balance that better fits need and practice. It's far more important to get the right line number, and 65536 is being exceeded in practice.

And there's no reason why something can't be added to the documentation if/when this change is made, so the objection doesn't stand on its own logic. But it should go into the compiler guide, not the language manual (which is still pretty far from a spec)--it's not a language limitation. The only thing that belongs in a language spec is a minimum that implementations must support, and the current compiler far exceeds the sort of minimal limit offered by the C language spec (lines longer than 4095 characters are not guaranteed to compile), for instance.

foldl commented 4 years ago

Don't get me wrong. I think this is surely a bug (or limitation) in the compiler, which must be solved (or improved). But it should not be solved in such a tricky manner.

By the way, I don't think 16-bit math is faster than 32-bit on a modern CPU. Maybe it saves several bytes? Then fewer cache misses?

Just change it to 32-bit.

jibal commented 4 years ago

It's not a "tricky manner" -- not that there's anything wrong per se with being tricky. And "change it to 32-bit" has already been addressed, if you would just read the previous comments. Since you haven't read the thread and completely failed to address any of my points, which refuted your complaint, but have now moved the goalposts with a completely different complaint, I won't respond further to your comments.