nim-lang / RFCs

RFC: move file I/O out of `system.nim` #67

Closed arnetheduck closed 2 years ago

arnetheduck commented 5 years ago

In line with Nim being a systems-level programming language, it makes sense that system.nim cover only a minimal set of language-specific features and leave interactions with the outside world to other modules, so as to separate concerns cleanly and ensure that a stable core of the language can be used and depended upon from a broad set of platforms.

It's also been commonly noted that system.nim has grown uncomfortably large as more and more features are added to it.

Thus I'd like to propose all things file be moved to a separate module that must be explicitly imported.

  1. File-system support is not present in several cases: embedded, boot processes, WASM, browsers or more exotic cases like eWASM and EVM (ethereum blockchain virtual machines).
  2. File-handling code tends to age poorly - all of Python, Java, C++ etc have gone through dramatic changes in their "blessed" way of handling files - having it in a separate module paves the way for a clean upgrade path in the future - a second module can be implemented side-by-side.
  3. Needs for file I/O differ wildly between applications - some are better served by asynchronous callbacks, others by memory mapping or streams etc. Addressing these needs in separate modules seems prudent.
  4. The way handles are represented differs between OSes, as do the capabilities associated with file handles - on Unix, they're powerful beasts that integrate with sockets, processes and a whole bunch of other things, while on Windows they're more specialized. Moving to a module will pave the way to good Nim APIs that can make good use of these specialized capabilities without interfering with the lowest-common-denominator approach that's currently taken by file I/O in system.nim.

awr1 commented 5 years ago

Adding to the complexity of file-handling in general are sandboxed platforms that define more extreme limitations on what you can do with File I/O, like macOS App Sandbox, UWP (which admittedly Nim does not support), etc.

Araq commented 5 years ago

We might as well bite the bullet then and deprecate system.nim. Here is a new idea: The implicit import system statement from Nim is only used if the file does not start with something like {.disableSystem.}.

And system.nim consists of something like:


import std / [sysio, gc, integerops, assertions]

export sysio, gc, integerops, assertions

(Ideally, in practice it will be much messier...)

zielmicha commented 5 years ago

> {.disableSystem.}

There is already from system import nil which seems to do that.

Araq commented 5 years ago

> There is already from system import nil which seems to do that.

It doesn't really do that, you can still do var x: system.int then for example.

arnetheduck commented 5 years ago

> We might as well bite the bullet then and deprecate system.nim. Here is a new idea: The implicit import system statement from Nim is only used if the file does not start with something like {.disableSystem.}.

+1 for {.disableSystem.} or something like that and having well-defined parts of system.nim that can be imported separately - that would be great for building your own custom system for the edge cases (BIOSes, WASM etc). That opens some questions though as to how modules with different system.nim's should interact - is it even meaningful?

I'd be careful however to ensure that there exists a system.nim, and in it, I'd document the bar for which stuff gets included there. That requires a clear view on where that bar should be. An example would be:

In system.nim go things that define the language - a good litmus test would perhaps be - does it require a magic or is it related to the use of a magic?

> bite the bullet

There are two ways to approach this - either burn the current system.nim and write a new one, or go the other way and remove stuff until you hit a usefulness level that's subpar. I started with file because it's the most obvious candidate and a good example to gauge the temperature of the community. Would be happy to see another few move out of there as well, but this looks like an easy, practical and small step in the right direction.

arnetheduck commented 5 years ago

Finally, I think the fact that large parts of the language are defined in system.nim is really cool - it allows me to easily create a language without floats for example (eWASM nim) and still be sure that my custom language is "valid nim" and the code written therein stays compatible with standard nim. It almost begs the question why there are so many keywords in the parser.

Araq commented 5 years ago

> I started with file because it's the most obvious candidate and a good example gauge the temperature of the community. Would be happy to see another few move out of there as well, but this looks like an easy, practical and small step in the right direction.

The problem is though that even moving IO out of system breaks most Nim projects out there and my solution starts with {.disableSystem.} but doesn't break any code.

arnetheduck commented 5 years ago

I'd think it can be done with a {.deprecated.} period. doing {.disableSystem.} effectively means never having a cleaned-up system.nim - the pain will never be smaller than today.

if done though, perhaps it's more interesting to do {.system: "xxx.nim".} with an empty string denoting "nothing at all". The downside of that would be that you get a haskell-like proliferation of preludes, instead of a single, small, extensible core that's universally supported across projects and platforms.

actually, #5698 is related which would help isolate os dependencies as well, in a different way.

Araq commented 5 years ago

> I'd think it can be done with a {.deprecated.} period.

How so? "writeLine is deprecated, import io.nim to get a nice ambiguity error instead..."

arnetheduck commented 5 years ago

the biggest issue would be a few well-known names like File itself as well as the globals (stdin and friends) - if there's a separate File in the io module, it will resolve the ambiguities for the procs. That can be a -d:useSystemFile, as has been done with strings, seqs and nil. A few hard choices would have to be made about echo and the like as well.

In fact, a lot of what this feature is about "almost" exists today, as --os:standalone. One could say that an extended version of this RFC is about taking the standalone mode and moving what's lost into modules that can then be imported in a more flexible way.

Araq commented 5 years ago

Significant progress has been made on this RFC: system.nim now delegates to io.nim for backwards compatibility but does not depend on it.

juancarlospaco commented 4 years ago

Is this fixed now?

arnetheduck commented 4 years ago

it's still indirectly in system.nim, just imported and exported, and thus taking up compile time and global naming space - I'd generally say no.

ringabout commented 2 years ago

update: see https://github.com/nim-lang/Nim/pull/19442
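
For readers who want code that works whether or not file I/O is implicitly available, here is a minimal sketch. It assumes a recent Nim (2.x), where, as far as I can tell, the file procs live in `std/syncio` and `system` keeps re-exporting them unless the `-d:nimPreviewSlimSystem` preview flag is set. `roundTrip` is a hypothetical helper invented for this illustration, not something from the RFC or the linked PR.

```nim
import std/os  # getTempDir, `/` and removeFile for the demo

when defined(nimPreviewSlimSystem):
  # With the slim-system preview flag, system no longer re-exports file I/O,
  # so the module has to ask for it explicitly.
  import std/syncio

proc roundTrip*(path, text: string): string =
  ## Hypothetical helper: writes `text` to `path`, reads it back, cleans up.
  writeFile(path, text)
  result = readFile(path)
  removeFile(path)

# Usage: the same source compiles with and without the preview flag.
let demoPath = getTempDir() / "rfc67_roundtrip_demo.txt"
discard roundTrip(demoPath, "hello")
```

The `when defined(...)` guard keeps the import out of the way on compilers that do not ship `std/syncio`, since the untaken branch is never processed.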

arnetheduck commented 2 years ago

For reference, we're using an alternative I/O module in many of our projects: https://github.com/status-im/nim-stew/blob/master/stew/io2.nim - this might be a useful reference in the future.

It covers some needs that arise when writing secure applications:

c-blake commented 2 years ago

I like the {.disableSystem.}; import/include myNewPrelude idea as both declaring the lack of internal dependency and explicitly referencing any new external dependency right as the first/first few non-comment lines of code, instead of say a standalone like compiler flag.

Personally, I think {.nosystem.} might be nicer to type. Or maybe {.raw.}, {.empty.}, etc. { Hey, bike-shedding idents is often a great way to get people engaged... :-) } nosystem might be best since it reinforces/user has to know about the system module. (They should, but newbies "should know" something are famous last words...)

Besides not breaking the world, it is much more general than just file IO. system/**.nim is currently a whopping 77 files -- roughly 25% of what is under lib/ (yes, not all are in play all the time...Even so. And yes, by lines of code it is "only" 15%). Araq's idea seems to solve all long-standing system-bloat questions in one fell swoop. It also paves the way for more system factoring/letting users pick & choose. Future questions become more what system/x depends on what other system/y or how much system/ relies upon the compiler (and how it relies).

Answering those questions might be hard. There can be gradual work to make x more autonomous or replace it by an external module. That work may be complex. Araq's idea lets such answers be discovered gradually (more|less) through real world practical experience with various separations/combinations while being "cross compatible" and explicit.

Anyway, the io2.nim work seems nice and no one here even argued against Araq's idea, but I thought I would voice more support for it. It also adds another to the quiver of argument arrows that "the stdlib is just a starting point - it cannot possibly be all things to all devs/circumstances". In almost all prog.langs, the stdlib is "the first, not the last" answer to almost everything.

That's about all I have to say. Maybe Araq's thought of something even better in the interim.

Araq commented 2 years ago

Since then I'm considering an even more drastic solution: To get system.nim, type import std / [system]. Yes it makes "hello world" longer but that's it. Everything else works better:

c-blake commented 2 years ago

I am unsure if nim c --useVersion:1.0|1.2 style backward compat affords "true" clean-up, but I also don't know who uses that (if anyone). The clean-up possibilities are maybe tricky.

Though I am certain Araq knows, it bears mentioning that import system always did work (even before from system import nil, I believe). So, this more drastic idea still allows almost no work to have code function in both old & new world.

Also, the nim compiler has had --import since its very first public release and the implicit system import could always have been just done in the default config. So, under this new, more drastic idea, backward/forward compatibility is do-able via nim.cfg/config.nims on a per-file/project/etc. basis which is not so bad..(at least back until foo.nim.cfg was introduced..at least 2014 if not earlier). It may be that a --import-nil or something might be interesting..

There could be other mechanisms I'm forgetting...

Vindaar commented 2 years ago

> • No special casing of system.nim

As long as system is filled with a bunch of magic procs, it's hard to say it's not special cased. I have some trouble imagining what Nim code without system would even look like, given that essentially all primitives are defined there. Sure writing from system import nil I can then just prefix everything by system and it still works. But it's not like I can write an alternative system? At least not without falling back to the same magics, I suppose?

edit: Aside from that though, I'm all in favor of breaking changes if it really helps to clean up stuff.

konsumlamm commented 2 years ago

While I definitely see the advantage of cleaning up system, I don't see a real advantage in having to explicitly import it. That would mean you'd have to import std/system in every single file. There isn't anything useful you can do without system, is there? As I see it, it would just be unnecessary boilerplate.

c-blake commented 2 years ago

Both these comments connect to what I meant by "how much system/ relies upon the compiler (and how it relies)". Some stuff relies heavily on magics while other stuff does not. A "BS" grep measurement suggests there are 5x more '^proc' than "magic" in lib/system/, though maybe as a "soft bound".

Re: @vindaar's, it is surely true that re-implementing system or parts therein is an.."ambitious task", but the same can be said of many Nim things. This RFC singled out "io" as an easy case. The C world has many implementations of libc..even the entry point crt0 type main-calling entry point stuff. Many (most?) nim users are unaware of the flurry of activity to compile even a 0 length empty.nim file (as just one of many ways of measuring overhead).

Writing an alternative (maybe radically trimmed down) system with the same magics available (but maybe not used) should kind of be the "imagined use case", even if the real motivation is "v2 system" vs "v1 system". { They are similar if you think about it. Also, someday there may even be v3. :-) }. Part of system/ being well factored might be that a "re-implementation" can be simply a piecing together of very small parts "a la carte"..a subset without the import except and wasted|wrong work.

Re @konsumlamm's point, the stock nim.cfg could perhaps have --import=system so that only re-implementors of system or "things system" might even notice a difference. We may want to add an --import-clear so people could locally be sure and import whatever they want. The re-implementor instructions to their users would be simply "add this one line to a config and then explicitly import". Or "delete this one line from the stock config". No real boilerplate needed -- IF simple (?) config instructions are allowable.

And, yeah, yeah, maybe this config idea does not go as far as Araq would prefer or maybe patchFile could already be used at this level (or be made to work. Trying that just now, I could not get it to work, but I am no patchFile expert.) And maybe it would not be a fertile ground for experimentation (along the lines of what @arnetheduck was doing with his io2). The last seems nearly impossible to know in advance.

I am just brainstorming compromises/directions to move the work along since @xflywind recently began work on separation. Resolving direction on this could maybe inform that work.
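
The config-based approach discussed above could look roughly like the following project-level `config.nims`. This is only a sketch: `--import:PATH` is an existing compiler switch, but `myprelude` is a hypothetical module name, and the `--import-clear` switch floated above is a proposal in this thread, so it is not shown.

```nim
# config.nims (project-level NimScript configuration)
# Assumption: `myprelude.nim` is a hypothetical module on the project's
# search path that imports and re-exports whatever should be available
# in every module of the project.
switch("import", "myprelude")
```

With something like this in place, individual source files would need no extra import line, which is the "no real boilerplate" outcome the comment argues for.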

Araq commented 2 years ago

> But it's not like I can write an alternative system? At least not without falling back to the same magics, I suppose?

It's hard but possible. You need to copy&paste the magics, yes, but sometimes you don't need all of them. Some magics are also magics for legacy reasons and could have easily been ordinary .inline procs or templates. And with Arc/Orc how the barebones runtime interacts with the compiler has been refined.

Araq commented 2 years ago

> That would mean you'd have to import std/system in every single file. There isn't anything useful you can do without system, is there?

That's true but not really meaningful as a module typically has other dependencies too like tables or strutils and these are not auto-imported either. In practice the difference is pretty much between


import std / [strformat, tables]

and


import std / [system, strformat, tables]

And if you prefer include std / prelude then nothing would change at all.

juancarlospaco commented 2 years ago

With --import: and --include: sometimes you don't even need changes in the code (???).

metagn commented 2 years ago

The thing is modules like system are mostly part of the spec but other std modules are like implementation specific library modules. The ways to improve them are different.

As it is currently, even if system was not a default import, to save on compilation time you would still need to make your own module by copying system and removing everything you don't want manually because of how its includes work. You don't get to just import specific parts of system based on what you need. If this is solved, the size of system is a less important problem.

Another module that uses magics (and should be part of the spec) but is messy is macros. Half of macros is just convenience procs, but you need to import the whole module to write macros at all. On top of this, people write their own macro utility modules all the time. So not only do you have a big import just to write macros, it's also not enough for most people.

If you had a standard library module just for macro convenience, you could add as much as you want without problems. There are a few modules in std right now that are steps toward this direction. sugar is a collection of convenience procs, and genasts is a macro utility in its own module.

But there is a problem with these as well, being that they are too small. If the standard library is meant to be substituted by better user code, then there is no harm in these modules being as big or comprehensive as possible unlike system or the core parts of macros. The fact that you have to add a new import for every single convenience macro you use while modules like tables exist with 3 separate table types is cacophonous and adds to mental load. The burden of maintaining individual modules for each proc is also different to the burden of maintaining individual procs in a single module.

My point is system will not have "custom implementations" nearly as much as other standard library modules. It being more modular, an explicit import etc could be improvements, but that doesn't mean in any way that the same rules apply for system and any other standard library module. It still needs special care compared to other modules and the attitude toward it should be different.

c-blake commented 2 years ago

I agree that system replacement (even by a la carte selection/re-composition) will be/should be more rare/special/careful. Being possible remains valuable and the right mental model, though.

Re: moving "convenience" macro stuff, std/[genast,sugar] into a new std/macutils, not sure if that would be popular. In the context of Nim modules specifically, one can usually override a whole module just by having it found earlier in --path (or with patchFile for std/). This pushes for smaller modules more easily replaced in whole. Yet, I suspect chunkier modularity improves discoverability/sells functionality in more typical usage -- until it gets popular/useful enough for power users to then want to replace bits/pick & choose, but it also depends upon how liked package managers are & etc.

It's a complex, but important question interacting with people's workflows where consensus is often hard to get and also relates to "modules vs packages" (or maybe std/ subdirectories?). I tried to start a discussion about this for some hypothetical cligen-2.0 but got only one taker. I did link to one blogger's take. std/ may be special enough that broader discussions may not apply so well, but @Araq always seems to want (correctly, I think) std/ to be less not more special.

arnetheduck commented 2 years ago

> That's true but not really meaningful as a module typically has other dependencies too like tables or strutils and these are not auto-imported either. In practice the difference is pretty much between

One thing we do very consistently is to re-export anything that a module requires to work - for example, if a module exposes a public API that returns a Table, we consistently re-export tables - this is really the only viable way to work with nim in any project that uses more than a few libraries, or the import lists become both impossible to manage and many bugs ensue because overly generic overloads get chosen randomly depending on imports, instead of the proper ones for the type.

Once you start doing this, imports and exports become a manageable problem - of course, the single global namespace is another major obstacle to productivity in library development, but this way, at least it's humanly possible to reuse a simple one.

If this is adopted by the std library itself, it would effectively mean that most std modules would re-export the version of system that they depend on, and the damage will be limited - most std modules use int for example and would have to re-export the parts that deal with int - above all, this would "contain" the damage to existing code, because most non-trivial code imports something at least.

However, it would provide many of the benefits still: modules that start with a clean slate would be explicit about their dependencies and this is really the only way out here: being explicit opens the path for many other future cleanups, as well as "specialized" std libraries such as those that are needed for WASM, controllers, etc.
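
The re-export convention described in this comment can be sketched as follows. `countWords` is a hypothetical API invented for the illustration; the point is only that a module whose public signature mentions a `std/tables` type also exports `tables`, so callers get the matching accessors without a second import.

```nim
import std/tables
export tables  # re-export: importers of this module also see CountTable's API

proc countWords*(words: openArray[string]): CountTable[string] =
  ## Hypothetical public API whose return type comes from std/tables.
  result = initCountTable[string]()
  for w in words:
    result.inc(w)

# Usage: a client that imports only this module could still write
# `counts["a"]` or `counts.len`, because `tables` is re-exported.
let counts = countWords(["a", "b", "a"])
```

Under the proposal, std modules would apply the same pattern to the parts of system they rely on, which is what would limit breakage for code that already imports something.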

mratsim commented 2 years ago

In Rust and Haskell "no-std" and "no-prelude" are quite popular or at least they are vocal.

Also this opens the path to a "sys-noalloc" library where allocations are heavily discouraged (embedded, cryptography, GPUs, ...) and "sys-convenience" where you have access to seqs/strings and stuff like that.

But then there is the danger of too much granularity.