nim-lang / RFCs

A repository for your Nim proposals.

Fusion and stdlib evolution #310

Closed Araq closed 3 years ago

Araq commented 3 years ago

Background story

We tried to move PMunch's excellent socket stream implementation from a PR against Nim's stdlib to Fusion. This PR needs the new {.cast(tags: []).} construct, and since Fusion currently aims to support Nim version 1.2, we cannot easily make it available there. This means that even though Fusion is supposed to be a staging area for the stdlib, contributing to Nim's stdlib is easier than contributing to Fusion, which clearly goes against Fusion's design. At the same time, Fusion is growing JS-specific libraries and other helpers which don't follow our idea of what Fusion should be about. Nothing against more JS-specific wrappers, but Fusion aims to support its code for the next decade and JS wrappers are a poor fit for that.

At the same time, we don't need yet another Fusion-like repository just because Fusion is used differently than we anticipated. And there are more problems with Fusion itself too: it both ships with the latest stable Nim and is installable as a Nimble package -- whenever we offer a choice we have to ensure that every choice works and keeps working. Hence offering this choice is a bad idea.

Fusion specific proposal

Make Fusion broader in its scope. Do not ship Fusion with Nim. Fusion should target the latest stable Nim and also work with Nim devel, but not every module in Fusion needs to support eg. Nim version 1.2.

If fusion is no longer shipped with stdlib, many modules should make it into the stdlib, either in std2 or in std. Obvious candidates are: smart pointers, tree based collections, the pattern matching macro.

Proposal: Introduce a std2 namespace for Nim stdlib modules

  1. Allow the stdlib to grow without the Fusion staging area, for the non-controversial, simple, basic modules where it makes sense. We are already doing this, mostly.
  2. Introduce a std2 namespace as an alternative to an experimental namespace. The problem with the "experimental" namespace is that it's not clearly defined when things are not "experimental" anymore. Also, designing an API for e.g. rational numbers is much easier than designing new macros for async, yet every new module should start out as "experimental", somehow.

And it gets worse: If we later move experimental / module to std / module when the module is not experimental anymore, we break code. So for backwards compatibility reasons experimental / module has to remain! Let's assume we introduce std / module and then make experimental / module refer to std / module, like so:


# experimental module.

import std / module as m
export m

{.deprecated: "Import std / module instead".}

But now we created a situation where it's better to import experimental / module! Because that's the import that actually works with older versions of Nim! So client code becomes:


when (NimMajor, NimMinor, NimPatch) > (1, 5, 0):  # NimVersion is a string, so compare the integer constants
  import std / module
else:
  import experimental / module

But only for the code that was updated, other old code simply uses import experimental / module without the when statement, which is ugly anyway. For these reasons it is much better if new, experimental modules start in the std namespace. Or in a new std2 namespace. Whether it's experimental or not can be mentioned in the documentation, but it is expected that everything in std2 becomes non-experimental with the release of Nim 2.0.
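A compile-time version gate of the kind discussed here can be written against the integer constants `NimMajor`, `NimMinor` and `NimPatch` from the `system` module. The sketch below is only illustrative: `std/strutils` and `strutils` stand in for the hypothetical `std / module` and `experimental / module`, and the name `hasNewLocation` is an assumption made for this example.

```nim
# Tuples compare lexicographically, so this reads as "Nim >= 1.5.0".
const hasNewLocation = (NimMajor, NimMinor, NimPatch) >= (1, 5, 0)

when hasNewLocation:
  import std/strutils   # stand-in for the hypothetical `std / module`
else:
  import strutils       # stand-in for the hypothetical `experimental / module`

# Client code below the gate is identical for both branches.
echo toUpperAscii("imported")
```

Every client that wants to support both old and new compilers has to carry this boilerplate, which is the cost the paragraph above describes.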

Alternative proposal: Introduce versioning for specific modules

There is some desire to evolve core libraries further:

However, these outlined changes cannot be applied to the existing os, strutils, io modules without severe interruptions for the people using these libraries (which is effectively everybody who uses Nim). So we can either have std2 / os or std / os2 or maybe even std2 / os2.

Considered, rejected alternative

An alternative that I considered was to come up with slightly different names like input_output instead of io2 etc. That's worse because the connection between modules named io and input_output is unclear. There is also no clear way how to name the 3rd version of io then whereas a name like io3 is obvious.

haxscramper commented 3 years ago

Fusion does indeed feel somewhat "detached" from the Nim stdlib (multi-week waiting time for review, delayed documentation rebuilds, etc.; the documentation for Fusion has been updated to include matching, but it still can't be imported on the playground), and it doesn't seem like a good idea overall in terms of maintenance.

I want to clarify several points -

From "Fusion specific proposal" - If fusion is no longer shipped with stdlib, some modules would probably need to move into the stdlib, either in std2 or in std - is that correct?


If a module is added in std2 namespace, it won't be moved anywhere else (to avoid the same situation as experimental), and won't allow for backwards incompatible changes, just as regular std/* modules.


"versioning for specific modules", specifically os and strutils - it might be possible to implement std/os2 using wrappers on top of std/os, using similar approach:

proc joinPath*(head: AbsDir, tail: RelDir): AbsDir =
  AbsDir(os.joinPath(head.string, tail.string))

I've done this for the oswrap module, along with some additional usability improvements. It was a relatively annoying one to write, but the advantage is that it requires almost no additional maintenance (for the lower-level implementation at least) and can provide an additional layer of indirection to be used on both nimscript and C targets.
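The wrapper approach above can be sketched as a self-contained file. The definitions of `AbsDir` and `RelDir` as `distinct string` are an assumption made for this example; the original snippet does not show how oswrap declares them.

```nim
import std/os

type
  AbsDir* = distinct string  # assumed definition: an absolute directory path
  RelDir* = distinct string  # assumed definition: a relative directory path

proc joinPath*(head: AbsDir, tail: RelDir): AbsDir =
  ## Forwards to `os.joinPath` and restores the stronger type on the result.
  AbsDir(os.joinPath(head.string, tail.string))

let base = AbsDir("/usr")
let sub = RelDir("lib")
# The typed overload is selected because distinct types do not
# convert implicitly to `string`.
echo joinPath(base, sub).string
```

All path logic stays in `std/os`; the wrapper only adds type information, which is why it needs little maintenance of its own.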

The same could be done in the other direction by making std/strutils a wrapper for std/strutils2 (e.g. introduce in-place implementations and make the old ones use the newer ones internally).
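A minimal sketch of that direction, with an in-place routine as the primary implementation and the copying routine reduced to a thin wrapper. Both proc names are hypothetical and are not part of any existing strutils or strutils2 module.

```nim
proc toUpperAsciiInPlace*(s: var string) =
  ## Hypothetical "strutils2"-style API: mutates `s` without allocating.
  for c in s.mitems:
    if c in {'a'..'z'}:
      c = chr(ord(c) - ord('a') + ord('A'))

proc toUpperAsciiCopy*(s: string): string =
  ## Old-style copying API, kept as a wrapper over the in-place version.
  result = s
  toUpperAsciiInPlace(result)

var buf = "abc-Z1"
toUpperAsciiInPlace(buf)
echo buf
echo toUpperAsciiCopy("nim")
```

With this layout the old API keeps its signature and behaviour while only one implementation has to be maintained.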


Q-Master commented 3 years ago

I think that versioning specific modules is slightly better than moving them to other namespaces. There will be no stub modules containing just an import/export and a deprecated pragma after moving from one namespace to another, and later writing a wrapper in the previously versioned module to support the previous API is a slightly better idea than waiting for everyone to switch to a newer API.

juancarlospaco commented 3 years ago

So the top comment makes it look like JS libs are massively costly to maintain, but in reality it is an official standard that, once wrapped, very very rarely changes.

Almost maintenance free, just someone changing ..code:: for runnableExamples.

Maybe in the past too much stuff was put into 1 module for the JS target, that's not good, but now JS modules are tiny, simple, direct wrappers of a standard that won't change. (There's no NodeJS-specific stuff in Fusion)

If someone wants to maintain less code:

Araq commented 3 years ago

If fusion is no longer shipped with stdlib, some modules would probably need to move into the stdlib, either in std2 or in std - is that correct?

Correct. I've updated the RFC to be clear about that.

Araq commented 3 years ago

So the top comment makes it look like JS libs are massively costly to maintain, but in reality it is an official standard that, once wrapped, very very rarely changes.

I usually worry if these things are supported by all major browsers but fair enough. However, Fusion wasn't created in anticipation of specific JS wrapper code. Nothing wrong with using Fusion for unexpected things, but it's time we update its official goals then.

HJarausch commented 3 years ago

Why not split Fusion into several parts which can be handled by different versions of Nim. The main Fusion would then import only those parts which can be handled by the Nim version in use.

juancarlospaco commented 3 years ago

Proposal: Layered stdlib

1 new compile option, possible naming:

N meaning

N is a single digit integer number literal.

Negative numbers are reserved for Nim core devs only, experimentals, quick prototyping, etc. Negative numbers are undocumented, managed by core devs only, no support. @Araq can have his secret bunker on --stdlib:-1, while others play demolition derby with the compiler on --stdlib:-2, Umbrella Corp on --stdlib:-3.

Normie users like me use positive numbers. Positive numbers are always documented, tested, etc. Nim without Fusion can be --stdlib:1, you can not use Fusion stuff with this layer.

Nim with Fusion can be --stdlib:2, you can use Fusion on this layer.

If some day a big company needs just the compiler, or people doing embedded AVR need just the compiler, you can use --stdlib:0, that's the compiler and as minimal a stdlib as possible.

If needed someday a --stdlib:3 can also be created.

Is flexible to allow even more customization, you can have a layer of "Pure libs only" etc, you get the idea.

Total Layers

Implementation

template stdlib*(level: static[int]; body: untyped) {.dirty.} =
  ## Enable `body` if the `currentLevel` is greater than or equal to `level`.
  when getoptions() >= level:   # getoptions() Reads "--stdlib:N" somehow.
    body

func newCrazyUnstableProc*() {.stdlib: 2.} =
  echo "Breaky weird code"  # No need to change imports, etc.
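The `getoptions()` placeholder above does not exist in Nim. One way the idea could be approximated with existing features is a compile-time integer define, sketched below. The constant name `stdlibLevel` and the switch `-d:stdlibLevel=2` are assumptions for this example, not an existing compiler option, and the template is called as a block rather than as a pragma.

```nim
# Hypothetical level, overridable on the command line with -d:stdlibLevel=2.
const stdlibLevel {.intdefine.} = 1

template stdlib(level: static[int]; body: untyped) =
  ## Compiles `body` only when the selected level is at least `level`.
  when stdlibLevel >= level:
    body

stdlib(1):
  proc stableProc(): string = "available at level 1"

stdlib(2):
  proc newCrazyUnstableProc(): string = "available at level 2 and up"

echo stableProc()
# Reports whether the level-2 proc was compiled in for this build.
echo declared(newCrazyUnstableProc)
```

Code that calls `newCrazyUnstableProc` only compiles when the level is 2 or higher, so a library depending on it effectively targets a different dialect of the stdlib.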

Araq commented 3 years ago

This RFC is about solving real, existing problems. It's not about joking around. In other words your proposal is so bad that I think you mean it as a joke. (Good luck trying to write libraries against these various language dialects. Which solve nothing.)

juancarlospaco commented 3 years ago

Good luck trying to write libraries against these various {.since.} dialects. Which solve nothing.

c-blake commented 3 years ago

I think there are many related problems mixed together here. If it is just about "distributing mutually cross-tested files", I think fusion is the simplest idea (though nimble may need updates to make it easier; not sure). That can distribute far, far more stuff (with the necessarily more lax review standards).

If it is about staging for new stdlib modules, then I say just stage them in the stdlib directly. Keeping them experimental initially should be just a per-module-name build-time switch or in-client-code pragma (like all other experimental nim language features). Then migration from experimental to more supported just means no longer needing that switch, and eventually the switch itself can probably be dropped (if anyone cares). --experimental:smartptrs and so on. People who don't want to type all of those can do a custom define that bundles them together not so unlike the "joke" --stdlib:2. Unbundling mixed together stuff is harder than bundling (or even impossible), though.

If it is about versioning modules in the stdlib with backward incompat APIs, I think a judicious once-in-a-decade move from os -> os2 and strutils -> strutils2 is not so bad to get better APIs. Extreme care should be used in doing the new APIs. For example, maybe openArray[char] for operations that truly need random access, but maybe some even more general iteration framework for things that only need that (EDIT: so we do not need toSeq conversion which can be costly, as still on the front page in the forum). To facilitate that, these new APIs should maybe start off as experimental as per above for a release or two so that interface iteration can be done and converge.

So, TLDR - A) first figure out experimental stuff, B) then use that for stdlib back incompat upgrades seems best to me. And distribute like 50-100 "important packages", not just a tiny set. Whatever the CI intends to keep working. Just my two cents, as they say.

c-blake commented 3 years ago

{ BTW, another reason I like std/os2 instead of std2/os is that there may be a good fraction of modules that can be kept back-compat literally almost forever. So, maybe nothing needs to change for them. OTOH, I guess people could be confused by std/md5 or std/base64.. }

disruptek commented 3 years ago

Here are just a few problems that I don't see resolved with this std2 idea:

This is my approach:

https://github.com/disruptek/dist

It's not very well described in the README, but this is a fine place to answer questions, I think. It solves all the problems above and in particular, it lets the compiler depend upon any code in the distribution, should it so choose.

As stuff moves out of std and into dist, it grows versioning, maintainers, collaborators, momentum, efficiency, separation of concerns, etc. We can develop new versions of stdlib modules that might be more experimental without affecting the compiler at all.

We don't need to burn cycles testing the same code every time we want to test a compiler or stdlib change, and those test costs are more broadly spread across the contributors.

Dropping code is as simple as not including it in the next release branch of the distribution -- users can still pick it up via a requires directive to their package manager. Indeed, they can import module versions from the future (or the past) if they so choose, which appropriately relegates the distribution to convenience and not necessity.

The important_packages suite becomes easier to reason about and maintainers of those packages no longer need to track the devel branch. At the same time, people can reasonably live at head (or any compiler release, for that matter) with a broad collection of packages that work together and no garbage code that no longer works.

Users can realistically use Nim as a scripting language with unqualified imports of unversioned modules and, regardless of which Nim version they use, they can expect the code to work. No package management necessary.

You can reasonably use an older distribution with a newer Nim and you can roll this stdlib backwards and forwards as you see fit -- even individual modules can be adjusted in-situ. You can maintain your own fork of the stdlib trivially, if you so choose. You can even trivially include a specific version of the distribution as a requirement in your project.

It's trivial to run a "sparse" distribution in which you only download the portions of the ecosystem that you actually need. Similarly, an import foo can perform a just-in-time population of foo and its requirements, if desired, only once. Coupled with project-local distributions, no package manager is required at all.

Of course, sparse distributions still have all the tests and documentation for their modules and a package manager like Nimph can tell you which versions of the distribution pass the tests in your project and it can tell you whether a future version of a distribution module will break your code.

A distribution manager like gitnim can coordinate switching distribution versions without any duplication of data, synchronizing your distribution version to that of the compiler or providing concise logs on what has changed and where.

That should be enough to prompt discussion... :wink:

haxscramper commented 3 years ago

@disruptek, to reiterate some points that are not entirely clear to me (maybe I just misunderstood something, but I decided to clarify everything anyway): how should the following scenarios be handled (assume an "is that correct?" ending for each point)?

  1. New user installs nim - they just download (via choosenim/gitnim/cloning repo directly etc.) whatever is in current 1.4.2 branch of the 'official distribution' and are set. All of stdlib comes as git submodule.

  2. Core dev contributes to stdlib - stdlib is maintained as a separate project (multiple projects?), not included in the compiler source code and operates as any regular package. Just fork, write code, run test suite etc.

  3. Stdlib module is deprecated - it is not included in next release for main nim dist. If anyone needs for backwards compatibility, they just adjust requires.

  4. Someone wants to add thing to stdlib - RFC-fork-implement-PR in stdlib repo.

disruptek commented 3 years ago

Yes, but there's no reason to deprecate a module in the standard library unless we want to move it into a package where it can evolve. Shadowing the stdlib with an external package is probably the lesser of two evils.

This concept doesn't demand that we break anything and there are no compiler release stipulations or timeline expectations; remember that all the packages in the distribution currently assume the standard library exists as it always has.

If you really want to completely divorce the standard library, including all existing modules, we might be able to simply replace the stdlib with the distribution directory, but it doesn't seem to buy us very much.

haxscramper commented 3 years ago

Given that whatever I wrote above represents my current understanding I second @disruptek 's proposal for stdlib evolution.

The already mentioned os2 and strutils2 are just another form of shadowing (notably a more confusing one), so I guess no changes here actually.

disruptek commented 3 years ago

I keep forgetting to mention this, but because dist doesn't technically distribute the contents, it doesn't add any licensing risk. There's no need to limit the packages to MIT code with rights held by @Araq or the Nim organization, whatever that may mean.

al6x commented 3 years ago

Fusion contains Nim modules that are to be bundled with the Nim installation in order to give us something like the "Nim distribution".

Hmm, maybe I don't understand why it's needed, I don't like the idea. I don't need the distribution with arbitrary choices like html parser included/not-included, choices that are totally irrelevant to my use cases and needs.

And after a while those "chosen one" libraries in the distribution will probably be outdated - and better, fresher alternatives will be available as separate third-party modules.

I would prefer 1) small and clean Nim core 2) minimal std libraries included 3) simple ways to install separate libraries.

As for the evolution of std - I'm willing to accept reasonable amount of breaking changes once say every 6 months if they make std and Nim better.

Araq commented 3 years ago

I would prefer 1) small and clean Nim core 2) minimal std libraries included 3) simple ways to install separate libraries.

Sure, but I don't. And plenty of others don't either, dependencies from multiple different sources have their own downsides. It's also not the topic of the RFC.

juancarlospaco commented 3 years ago

So we can see there are people who need a batteries-included stdlib, and people who just need the compiler, like two different POVs from different people doing different things, almost like 2 layers ...you already know where this is going ;P

Araq commented 3 years ago

If a stdlib module doesn't suit you, don't import it. Simple as that. No need for layers. There are also plenty of tests in Nim's repository that don't affect you, should we remove them too just so that we can indulge in some mad quest for minimalism?

c-blake commented 3 years ago

Another possibility for life cycle management from non-existent to experimental to supported to deprecated to non-existent again would be a deprecated-like warning that you are using new, unstable APIs. (Of course, not everything would go through a full life-cycle.) preliminary or beta or nascent or any number of other words could work besides experimental. It could be on a per symbol basis, and like the deprecated string that gives an explanation, there could be a string potentially explaining what might change or maybe what it intends to eventually replace/supplant. Not sure this is best or better than it might be annoying, but seemed worth mentioning. { I don't think @disruptek's proposal really covered this aspect of the concerns. }

saem commented 3 years ago

Until I got to @disruptek's proposal I was thinking fusion would split into latest and stable packages. Test the former and latter against the current devel and 1.2 or whatever with stable.

I was also going to suggest a concept of collections to help manage the subdivisions such as those seen in JS. A collection would be a set of modules that follow rules above and beyond the core fusion rules. Any module or library claiming to be part of a collection must meet those criteria. To borrow the JS example, one can have a core js collection; a dependent on core would be DOM, followed by browser. One could also have a node module depending upon core. Finally one could create a collection with electron/nwjs/etc. modules. The standard library already covers some of the JS stuff that I mentioned, I'm just using it for illustrative purposes. The hope is that this reduces the number of rules fusion needs to have as a baseline, while allowing tighter guarantees on subsets. People also get some gentle guardrails to ensure they aren't too quick in pulling in one more dependency and reducing applicability. The implementation can start off as informal until a few collections and their rules become clearer, then those policies can be automated/tested.

With all that said, dist would be a big step forward and doesn't preclude anything.

haxscramper commented 3 years ago

Actually dist addresses the main problem that caused fusion in the first place - people want to add their own functionality to the core stdlib/fusion, to improve "batteries included" situation, but it is not realistically possible to include everything in the "official" distribution.

At the same time std/std2 just creates another layer of "batteries", without solving the actual problem at hand (at least not its core) - at least that's how I see it.

arnetheduck commented 3 years ago

Package managers exist to solve this problem: You get a staging area, distro and all the other features. Critically, expectations are also managed for when packages can be deprecated: when nobody that imports them wants to pay the price for maintaining them. Core maintainers feel like maintaining batteries that go with a particular Nim version that are upgraded before releasing nim? Just create a batteries package that imports specific versions of stuff and reexports them transitively, then do the same work as one would if the package was in the standard library. The community that wants batteries gets batteries, the rest of us get a standard library that's feasible to both upgrade and rely upon - we're now also free to disconnect ourselves from the Nim standard library release cadence where it matters for us - if we need a bugfix to a specific batteries package, it's easy.
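The umbrella package described above can be sketched as a single module that imports other modules and re-exports them. The module name `batteries` is hypothetical, and standard library modules are used as stand-ins for third-party packages, whose versions a real umbrella package would pin in its Nimble file.

```nim
# batteries.nim (hypothetical umbrella module)
# Stand-ins for pinned third-party packages.
import std/[strutils, sequtils, tables]

# Re-export, so that `import batteries` brings these modules into scope
# for the importer as well.
export strutils, sequtils, tables

# Sanity check that the imported modules are usable.
echo "a,b".split(',').len
echo toSeq(1..3)
```

Clients then depend on one package and one import, while each re-exported package keeps its own release cadence.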

@disruptek 's dist is a fine idea for smaller projects but doesn't scale because with growth, they soon transitively end up needing different versions of the same dependency (which is fine). Package managers usually solve this problem as well.

For example, we have two protocols that use nim-libp2p (a p2p networking library) - if we create a third project that wants to use both protocols, we (indirectly) end up running two different versions of libp2p - this is entirely normal - having a hard requirement that both protocols use the same libp2p version creates unnecessary churn and maintenance overhead - in particular, the two protocols might be developed by different people / teams / paces etc. This is true of the standard library as well: as soon as there's a 2 variant, some code might use it and some might not.

This is generally solved by treating versions as separate packages - once you start doing this, life becomes much easier - upgrades can be isolated to where they are needed, instead of upgrading a giant monolith and risking the introduction of new bugs just because you need a small feature. This in particular is what keeps status from upgrading nim for example: we get all the new bugs for things we don't want to upgrade together with the things that we actually can use.

Nim in particular is hard to keep compatible due to the multitude of unfinished features which means that even "bugfixes" change core semantics of the feature, but also due to how the module system pollutes the global namespace: generally, adding any public symbol is a breaking change.

Ruling out package managers and keeping the above in mind, a poor-man's way for the standard library to evolve is to add versions of the same module - bitops2 replaces bitops etc - this doesn't solve anything a working package manager wouldn't solve better, but at least it provides an upgrade path for dumping more cruft into the standard library with less disruption for existing code - in particular, the older versions should never ever have to be touched and code that wants to use the new features can gradually migrate to the new version - notably, this is exactly the same that would happen in a package manager centric world (the developer performs a package upgrade and does whatever maintenance is needed to make their code work with the new version of the dependency). Whether that new version is called std2/bitops or std/bitops2 is largely cosmetic. We have several such packages in nim-stew for example that were written because the versions in the standard library were of poor quality and couldn't be fixed in a backwards-compatible way. It's also clear which version to use for new code and there's no giant std -> std2 migration looming.

Another set of conflicting goals is to lower the barrier of entry for random people to add packages and at the same time maintain stability, compatibility and room for growth. Every time a new package is added, a new blocker for upgrading everything else is also added - people create packages at different cadences, upgrade them and maintain them based on their own needs, not based on the needs of the batteries or core maintainer.

Once there's enough people owning a batteries package that they're unable to share a pizza, they'll also be unable to efficiently produce a compatible version unless it's acceptable to remove or break things.

Just because something is maintained outside of the standard library doesn't mean it cannot have stability guarantees - https://tokio.rs/blog/2020-12-tokio-1-0 is a good example of how a critical library can provide its own set of guarantees and evolution path, even for "important" features - good code survives regardless of where it lives - the standard library and related stability guarantees are unique in that they promise to keep the bad code and bugs around as well.

juancarlospaco commented 3 years ago

In the end, the reality is that moving files from one location to another won't fix bugs; it does not matter if the new location is a new folder, module, layer, system, whatever. If someone wants a better Nim, fork it and send fixes, at least documentation fixes. :)

holgerschurig commented 3 years ago

My proposal is: simply get rid of fusion and std altogether. And don't introduce std2.

Compare the situation with other programming languages that have modules. Like for example Python. There is no "std.re" module, it's simply "re". There is simply no need for "std.re" or "std." in the first place. "re" alone is all you need.

The "re" module can be shipped together with the python executable. This is what makes it a standard module, not the module path or the name. If someone uses pip or virtualenv to install a newer version of "re", then the locally installed one takes precedence. In the case of virtualenv it's even local to the one application running in the virtual environment.

So of course the "re" shipped will use the features of the shipped binary. And of course a side-installation of "re" could either be an older one (e.g. to have compatibility with code), or a newer one (where the programmer makes sure by himself that it works with the older binary).

Having to rename "from experimental import re" into "import re" or "from std import re" is then simply not necessary. Also, needing to change "from std2 import re" to "from std import re" should also never be necessary.

saem commented 3 years ago

@disruptek 's dist is a fine idea for smaller projects but doesn't scale because with growth, they soon transitively end up needing different versions of the same dependency (which is fine). Package managers usually solve this problem as well.

Nim mostly does unity builds; having difficult-to-resolve conflicting versions inside a single Nim project (Nim project source file + config) seems like something at a scale that most are not going to approach any time soon, though maybe I lack imagination. I'm wondering if the criticism needs adjustment, as it might fall into the category of "perfect is the enemy of good enough".

Nim has many tools that make the situation seem far more tractable than in other languages:

At that point the mostly syntactic but ultimately surface-level incompatibilities that turn into big show stoppers in other languages are potentially not show stoppers in Nim. The rest are harder-to-resolve issues:

disruptek commented 3 years ago

The criticism is fair and the problem is real, but it's only made easier by dist, not harder. A package manager is at the whim of the compiler when it comes to stdlib but obviously shines when you give it packages to manage.

When I made Nimph last year it was originally going to handle diamond dependencies automatically, but I was eventually convinced that pushing users not to create these scenarios in the first place was the wiser move. The point is, Nim was capable of handling the situation then and according to a test @Clyybber did the other day, it still is.

Ruling out package managers

This is a pretty major predicate with which to defend the bitops2 model. Although there's a recent bug that demonstrates that Nimble doesn't have a working dependency resolver, I have hope that this is a temporary condition and not an assumption we should use as a design constraint, for a poor man's solution or otherwise.

This in particular is what keeps status from upgrading nim for example: we get all the new bugs for things we don't want to upgrade together with the things that we actually can use.

You don't have to upgrade Nim; that's a business decision that no one here is proposing to make for you. I honestly don't know why you would bring it up.

Nim in particular is hard to keep compatible due to the multitude of unfinished features which means that even "bugfixes" change core semantics of the feature, but also due to how the module system pollutes the global namespace: generally, adding any public symbol is a breaking change.

I agree, but this is far from a novel problem. Happily, there is again a solution in versioning. There are some plans to vet API changes automatically to help promote versioning as well, but it's largely a social problem as I'm sure you know.

Another set of conflicting goals is to lower the barrier of entry for random people to add packages and at the same time maintain stability, compatibility and room for growth.

I really don't think random people adding random packages reflect the reality of a standard library, be it one that ships with the compiler or one that is distributed via git. We already have the means to share code "randomly" via package managers, git, and floppy disks that you find in the IKEA parking lot.

Every time a new package is added, a new blocker for upgrading everything else is also added - people create packages at different cadences, upgrade them and maintain them based on their own needs, not based on the needs of the batteries or core maintainer.

I think the point you're making is that a new package may be added that depends upon a version in dist, thus precluding dist from upgrading that dependency without breaking the new package.

The solution is obvious: fix the new package or drop it in a subsequent release. I hope that Nim is released more often in the future, but in any event, this is a social problem and one that dist helps to address by curating the included packages and fostering community engagement in their stewardship.

If I die of COVID tomorrow and @Araq wants bump to work with cligen-3.0, he can fork it, fix its requirements, put the fork in dist, and continue on his merry way.

treeform commented 3 years ago

I have never used or understood what fusion was. Maybe heard about it once or twice. I think fusion really has failed to market itself...

I fully support @holgerschurig's proposal. Keep it simple. Don't do fusion, don't do std2. Just decide if a module is good enough to bring into the standard lib or not.

arnetheduck commented 3 years ago

pushing users not to create these scenarios

for small personal / toy projects, sure - but then you run into scenarios where you don't fully control transitive dependencies or managing them is costly and above all, unnecessary - if project a works with d1 and b works with d2, I don't really care if there are two different versions of d in there - I want a and b - from a technical point of view, it's simple, d1 and d2 are different packages and that's it - they just happen to share some code - it's not a diamond, it's a tree with similar leaves.

Nim was capable of handling the situation then and according to a test @Clyybber did the other day, it still is.

Which situation?

You don't have to upgrade Nim

No, we don't - but it flows both ways unfortunately - we can't contribute to Nim either because if we don't upgrade, we don't benefit from the fixes - the larger and more unwieldy Nim is, the harder this becomes.

largely a social problem

versioning is a social solution indeed to inform your downstreams what level of compatibility they should expect - but for managing transitive dependencies there exist technical solutions as well (namespacing) that allow two "versions" to be used concurrently without issues. bitops2 is a manual implementation of that solution, PMs automate it.
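The difference between a flat resolver and a namespaced one can be sketched in a few lines. This is an illustration only, written in Python for brevity: the `REQUIREMENTS` table and both `resolve_*` functions are hypothetical and do not describe how Nimble or any other Nim package manager actually works. It assumes project a pins d at version 1 and project b pins d at version 2, as in the example above.

```python
from typing import Dict, List, Tuple

# Hypothetical requirement table: each project pins one version of package "d".
REQUIREMENTS: Dict[str, Tuple[str, int]] = {
    "a": ("d", 1),
    "b": ("d", 2),
}


def resolve_flat(reqs: Dict[str, Tuple[str, int]]) -> Dict[str, int]:
    """One global slot per package name: conflicting pins are an error."""
    chosen: Dict[str, int] = {}
    for project, (pkg, version) in sorted(reqs.items()):
        if chosen.setdefault(pkg, version) != version:
            raise ValueError(
                f"{project} needs {pkg} v{version}, "
                f"but v{chosen[pkg]} is already selected"
            )
    return chosen


def resolve_namespaced(reqs: Dict[str, Tuple[str, int]]) -> Dict[str, List[str]]:
    """Key each dependency by name and version, so d1 and d2 are distinct packages."""
    tree: Dict[str, List[str]] = {}
    for project, (pkg, version) in sorted(reqs.items()):
        tree[project] = [f"{pkg}{version}"]
    return tree


if __name__ == "__main__":
    try:
        resolve_flat(REQUIREMENTS)
    except ValueError as err:
        print("flat:", err)
    print("namespaced:", resolve_namespaced(REQUIREMENTS))
```

In the flat model the second pin collides with the first; in the namespaced model each project simply gets its own leaf, which is the "tree with similar leaves" picture.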

This is a pretty major predicate

well, it's part of the RFC, else I wouldn't bring it up, really.

fork it, fix its requirements

well, that's the point a bit: if you promise to keep things compatible, you assume responsibility for putting in the work, or your promise is empty - this tends to be the rub - it's hard to curate such sprawling collections of work which is why it often breaks down after a while - you're rarely an expert in each of the fields so either you have to coax upstreams to upgrade with you, do it yourself or indeed drop the package - if someone's willing to do that curation work, sure, it's a fine option that might suit some use cases, but there are several problems that we're having (in practice) which dist-like solutions don't solve (see https://github.com/status-im/nimbus-eth2/tree/stable/vendor - we're using a similar setup and hitting these walls) - a decent PM would provide a more targeted solution to all these issues (because then I get a personalized dist based on the needs of my project, not a hodge-podge of packages I might or might not want to use)

disruptek commented 3 years ago

for small personal / toy projects, sure - but then you run into scenarios where you don't fully control transitive dependencies or managing them is costly and above all, unnecessary - if project a works with d1 and b works with d2, I don't really care if there are two different versions of d in there - I want a and b - from a technical point of view, it's simple, d1 and d2 are different packages and that's it - they just happen to share some code - it's not a diamond, it's a tree with similar leaves.

Call it whatever you want; the point is that there's no technical obstacle to supporting it.

Please stop referring to projects as "small", "personal", or "toy". It's insulting. I promise you that the smallest personal project is no less important to the author than the largest codebases, and it's certainly much more important to Nim to win such customers than it is to satisfy a smaller quantity of entrenched users.

No, we don't - but it flows both ways unfortunately - we can't contribute to Nim either because if we don't upgrade, we don't benefit from the fixes - the larger and more unwieldy Nim is, the harder this becomes.

What contribution are you referring to? This RFC is about the stdlib evolution. As far as I can tell, recent contributions from Status to the stdlib have been minimal at best, and in fact I would call most of your work destructive in that it has forked attention and support in the ecosystem.

Cue @Araq coming in to tell me to shut up.

well, that's the point a bit: if you promise to keep things compatible, you assume responsibility for putting in the work, or your promise is empty - this tends to be the rub - it's hard to curate such sprawling collections of work which is why it often breaks down after a while - you're rarely an expert in each of the fields so either you have to coax upstreams to upgrade with you, do it yourself or indeed drop the package - if someone's willing to do that curation work, sure, it's a fine option that might suit some use cases,

Someone has to do the work to curate the stdlib now, in case you hadn't noticed. Those who have signed up to help on dist are folks who already have a fair number of packages and experience making everything work together. The situation can only improve from here.

a decent PM would provide a more targeted solution to all these issues (because then I get a personalized dist based on the needs of my project, not a hodge-podge of packages I might or might not want to use)

I went looking for the 5,000 line Nimble PR to add lockfiles the other day and couldn't find it. Do you have any contributions to announce on that front or is your message merely that dist does not serve your particular and effectively unique needs?

Araq commented 3 years ago

Ok, thanks for all the input so far. The RFC is rejected and we will accept better stdlib modules under new names so that nobody notices the stdlib is living software where some kind of versioning/evolution is going on.

arnetheduck commented 3 years ago

It's insulting.

None intended, sorry if it came across that way. Some issues become apparent only once a project hits a certain size though - or rather, once enough projects / packages exist that you want to start mixing and matching them more freely - whatever solution is chosen for the standard library would do well to take this dynamic into consideration.

minimal at best

indeed, this is the situation I'm trying to highlight: we cannot practically contribute to the standard library because a) we cannot wait 6 months for the fix to be released and b) we find it more and more difficult to upgrade due to the number of unrelated changes inadvertently affecting our codebase - a smaller standard library and a more dynamic approach to packaging would help, as would the separate namespacing strategy overall so that we can upgrade in a more granular way - the question the RFC poses is whether this should be done as a grand std2 rewrite or a more nuanced bitops2 approach - the latter is preferable for all kinds of reasons, but it would be nice to go a bit deeper and look at the underlying causes and what could help foster a more active and dynamic community.

haxscramper commented 3 years ago

@Araq

"Fusion specific proposal" - in the main RFC message you mentioned that fusion will no longer be shipped with stdlib, effectively becoming an "official package for peoples' miscellaneous helpers" - will this be implemented, or will fusion stay as it currently is?

Araq commented 3 years ago

I don't know. Either we leave things as they are now or we move some essential Fusion libs back into the stdlib. Even Rust and C++ have atomic refcounting in their stdlibs.

haxscramper commented 3 years ago

@Araq

I think fusion is a comparatively adequate idea in general. I think it mainly feels failed because:

  1. For an extended period of time it had no particular contributions.
  2. It wasn't advertised enough, nor does it have good documentation regarding development process & expectations.

More on #2 - I only now found that nim nightly is packaged via ./koch fusion, which uses a hardcoded commit to a particular version, specifically a commit dating back to July 2020. I'm personally fine with just making a PR to fix this whenever fusion has a new package merged, but this should at least be documented.

The fact that fusion is now bundled with the default nim installation is not documented anywhere, and I originally found it out when someone else linked a PR to me.


In the end I don't think it is necessary to deprecate it, or come up with different solutions, but having a little more attention directed to it would not hurt.


PS: if my understanding of how fusion is packaged is correct, I can make a PR with a fix and update the documentation. What I wrote above is just a specific example that in my opinion illustrates quite well why it feels failed in particular.

mratsim commented 3 years ago

I feel like a broken record but I am convinced that to limit friction the standard library should:

This decouples the interface from the implementation.

Then I'd like some way to evolve the std implementations, because most of them were written when nim was very young and they should be reevaluated. But the first step is to agree on the interface; then the implementation can be rediscussed.

Case in point, bitops has a problematic implementation of countLeadingZeros that requires me to reimplement it for every single project I build; in particular, adding a BigInt library to Nim will require forking bitops (https://github.com/nim-lang/Nim/issues/14696). endians, on the other hand, has a problematic interface.

Araq commented 3 years ago

@mratsim I am listening but these things take time and Fusion doesn't touch any of these things. Also, these "interfaces" can be harder to design and program than the actual implementation. Also, the assumption that you can switch between these different implementations is naive. For example, even though our tree-based Table implementation has the same API as the standard Table, we cannot change the standard table implementation easily -- the tree based impl cannot be put into a const section.

Araq commented 3 years ago

Case in point, bitops has a problematic implementation of countLeadingZeros that requires me to reimplement it for every single project I build; in particular, adding a BigInt library to Nim will require forking bitops (nim-lang/Nim#14696). endians, on the other hand, has a problematic interface.

We can also fix these without having a grandiose design for async.

mratsim commented 3 years ago

Case in point, bitops has a problematic implementation of countLeadingZeros that requires me to reimplement it for every single project I build; in particular, adding a BigInt library to Nim will require forking bitops (nim-lang/Nim#14696). endians, on the other hand, has a problematic interface.

We can also fix these without having a grandiose design for async.

We can fix them, but we need to agree on ways to evolve the standard library.

Also, these "interfaces" can be harder to design and program than the actual implementation. Also, the assumption that you can switch between these different implementations is naive. For example, even though our tree-based Table implementation has the same API as the standard Table, we cannot change the standard table implementation easily -- the tree based impl cannot be put into a const section.

Yes, that's true, in particular for complex things like streams or serialization, which is why we need guidelines on how to evolve the standard library. But for tables, or even more so for, say, channels or iterables, there is a restricted number of ways to use them. Whether an implementation can work at compile-time or not is not a problem as long as we can choose one for our use-case.

For example, one channel might not be suitable for use-case A because it requires multiple producers, well fine, there is another implementation that provides this. Use-case B wants single producer single consumer and highest perf possible because it's audio, perfect there is one channel for that. Regarding async, use-case A is for a kernel-mode driver for wifi or bluetooth, and requires no GC, perfect there is a scheduler for that. Use-case B requires OS timers and doesn't care about heap alloc, good there is a scheduler for that as well.
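To make "agree on the interface, then pick an implementation per use-case" concrete, here is a minimal sketch. It is written in Python purely for illustration and is not a proposal for Nim's channels: the names `Channel`, `SpscRing`, `MpscQueue` and `drain` are all hypothetical, and the ring buffer only shows the shape of a single-producer single-consumer design without the atomics or memory ordering a real one would need.

```python
import threading
from collections import deque
from typing import Deque, List, Optional, Protocol


class Channel(Protocol):
    """The agreed interface: callers depend on this, not on an implementation."""

    def send(self, item: int) -> bool: ...

    def recv(self) -> Optional[int]: ...


class SpscRing:
    """Fixed-capacity ring buffer without a lock, meant for one producer and one consumer."""

    def __init__(self, capacity: int) -> None:
        self._buf = [0] * capacity
        self._capacity = capacity
        self._head = 0  # index of the next slot to read
        self._tail = 0  # index of the next slot to write

    def send(self, item: int) -> bool:
        if self._tail - self._head == self._capacity:
            return False  # full: the caller decides whether to retry or drop
        self._buf[self._tail % self._capacity] = item
        self._tail += 1
        return True

    def recv(self) -> Optional[int]:
        if self._head == self._tail:
            return None  # empty
        item = self._buf[self._head % self._capacity]
        self._head += 1
        return item


class MpscQueue:
    """Unbounded queue guarded by a lock, usable from several producers."""

    def __init__(self) -> None:
        self._items: Deque[int] = deque()
        self._lock = threading.Lock()

    def send(self, item: int) -> bool:
        with self._lock:
            self._items.append(item)
        return True

    def recv(self) -> Optional[int]:
        with self._lock:
            return self._items.popleft() if self._items else None


def drain(chan: Channel) -> List[int]:
    """Written once against the interface; works with either implementation."""
    out: List[int] = []
    while (item := chan.recv()) is not None:
        out.append(item)
    return out
```

Code such as `drain` never names a concrete channel, so a project can swap `SpscRing` for `MpscQueue` (or the reverse) based on its own constraints without touching the callers.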

disruptek commented 3 years ago

It's insulting.

None intended, sorry if it came across that way. Some issues become apparent only once a project hits a certain size though - or rather, once enough projects / packages exist that you want to start mixing and matching them more freely - whatever solution is chosen for the standard library would do well to take this dynamic into consideration.

Fair enough; I shouldn't have let it raise my hackles. Damned hackles.

The problem of code reuse is, in my opinion, mostly due to the complexity exposed via configuration and package management. A smaller project with many dependencies effectively has the same problems as a larger project with many dependencies. But this is the problem that dist is designed to solve. If there's a flaw in the design, let's talk about fixing it.

indeed, this is the situation I'm trying to highlight: we cannot practically contribute to the standard library because a) we cannot wait 6 months for the fix to be released and b) we find it more and more difficult to upgrade due to the number of unrelated changes inadvertently affecting our codebase - a smaller standard library and a more dynamic approach to packaging would help, as would the separate namespacing strategy overall so that we can upgrade in a more granular way - the question the RFC poses is whether this should be done as a grand std2 rewrite or a more nuanced bitops2 approach - the latter is preferable for all kinds of reasons, but it would be nice to go a bit deeper and look at the underlying causes and what could help foster a more active and dynamic community.

Good point. Are you willing to point at a stable git reference for Status packages in dist from time to time? This would help foster cohesion and it solves the problem of deprecating stale stdlib packages in favor of superior Status replacements.

juancarlospaco commented 3 years ago

I do not understand the concrete roadmap for this: is it going to be archived, moved to the stdlib, deleted, or something else?

arnetheduck commented 3 years ago

If there's a flaw in the design, let's talk about fixing it.

I don't know that it's a flaw so much as a difference in requirements as there are obviously projects that have different priorities and issues than ours - the principal difficulty we're facing is that we want to upgrade things independently and in smaller units, and not always to the same version across transitive dependencies - coming back to the "minimal supported version" strategy, it is more or less what we pursue at certain points of our roadmap: we don't necessarily always want the latest version of everything, but rather upgrade specific components on a case-by-case basis.

For example, in our applications, reading JSON files right now is a fringe operation - it happens sometimes, but unless there's a security bug, it doesn't really matter if an optimization makes it 5% faster - the risk and churn of upgrading the JSON component doesn't motivate an upgrade, but does represent a cost: we must now audit the new code for security issues (as we've done with the rest of the codebase, every dependency included).

Conversely, we might upgrade something very quickly when a bug is found that affects our users, especially if it involves remote exploits and this is, above all, the time that we don't want to be receiving unrelated changes.

Ergo, by and large, having a JSON parser in the standard library is mostly a cost, but almost no benefit for us - a standalone library serves our use case a lot better. This difference in pace and priorities repeats across many of the libraries we work with, though it's not always tilted in the conservative direction, of course.

A dist solution conflicts with this way of operating: when we bump it, we get a large set of changes that we're not interested in, and we cannot bump individual parts of it (easily). The second problem is that two parts of our projects might rely on different versions of dist - this is a problem that we're facing with our own dist-style approach already even though we have full control over the downstreams - if we were using a common, shared community dist, we'd also be responsible for considering those projects when proposing an upgrade, and their needs might conflict with ours - they might need the 5% json parsing upgrade desperately, and we wouldn't want to stand in the way for that.

As much as dist might be appreciated by some projects, it's not a viable long-term strategy for us, most likely, unless it follows our release schedule (which is set by our stakeholders, the larger ethereum community) for major changes (we do after all, at the right time, want to play with the new toys also) and has the option for individual overrides so that we quickly can roll out security fixes. At that point though, we've pretty much arrived at the package manager feature set.

disruptek commented 3 years ago

If there's a flaw in the design, let's talk about fixing it.

I don't know that it's a flaw so much as a difference in requirements as there are obviously projects that have different priorities and issues than ours - the principal difficulty we're facing is that we want to upgrade things independently and in smaller units, and not always to the same version across transitive dependencies - coming back to the "minimal supported version" strategy, it is more or less what we pursue at certain points of our roadmap: we don't necessarily always want the latest version of everything, but rather upgrade specific components on a case-by-case basis.

Again, and I can't stress this enough, I don't care about you.

We've already established numerous times that Status has a use-case that is dissimilar from the rest of the community. You guys don't even use our package managers, as far as I know. You are going to set your requirements on your own terms, and that is appropriate.

What you do is your problem, and I've long given up expecting much contribution from Status to the community at large.

That said, the idea here is that by putting modules into a monorepo, we benefit from more users, more eyes, more development. The community can provide more value to Status, and vice-versa, without pissing in your precious walled garden. If you still don't see the value proposition, I urge you not to participate. I'm sure someone else will take the role on your behalf, which would be ideal.

For example, in our applications, reading JSON files right now is a fringe operation - it happens sometimes, but unless there's a security bug, it doesn't really matter if an optimization makes it 5% faster - the risk and churn of upgrading the JSON component doesn't motivate an upgrade, but does represent a cost: we must now audit the new code for security issues (as we've done with the rest of the codebase, every dependency included).

Sounds like you'd be happiest if the standard library didn't exist and everyone just used your code so security bugs could be found and fixed at a faster pace. That is exactly the goal of dist.

Conversely, we might upgrade something very quickly when a bug is found that affects our users, especially if it involves remote exploits and this is, above all, the time that we don't want to be receiving unrelated changes.

Tell me, do you find more bugs with fewer users or fewer bugs with more users?

Ergo, by and large, having a JSON parser in the standard library is mostly a cost, but almost no benefit for us - a standalone library serves our use case a lot better. This difference in pace and priorities repeats across many of the libraries we work with, though it's not always tilted in the conservative direction, of course.

So you support moving packages from the standard library to, literally anywhere else, as that reduces your costs. Got it. Sounds like dist to me.

A dist solution conflicts with this way of operating: when we bump it, we get a large set of changes that we're not interested in, and we cannot bump individual parts of it (easily). The second problem is that two parts of our projects might rely on different versions of dist - this is a problem that we're facing with our own dist-style approach already even though we have full control over the downstreams - if we were using a common, shared community dist, we'd also be responsible for considering those projects when proposing an upgrade, and their needs might conflict with ours - they might need the 5% json parsing upgrade desperately, and we wouldn't want to stand in the way for that.

You. :clap: Don't. :clap: Use. :clap: Dist. :clap:

Your participation would be one of marketing, usage growth, exposure, et cetera -- and ensuring that the rest of the community doesn't do something stupid that causes unrelated software to develop dependencies that later come back to prevent co-use with your golden goose json serializer.

Unacceptable parts censored.