nim-lang / RFCs

A repository for your Nim proposals.

Developing Nim's stdlib and a Nim distribution #173

Closed Araq closed 4 years ago

Araq commented 4 years ago

Design guidelines for Nim's stdlib

We plan to create a "Nim distribution" which consists of the Nim compiler, Nimble and a selected/curated set of Nimble packages. The idea is to have the best of both worlds:

A Nim with a stdlib that can be maintained effectively by the Nim core developers and yet also something that has "batteries included". Nevertheless there still is a stdlib and it remains part of Nim's core.

Can the Nim compiler itself depend on a curated Nimble package? In the future yes, in the beginning, it shouldn't. We have to be conservative with Nim's core. This brings us to our first requirement:

(1) What the compiler itself needs must be part of the stdlib.

This is probably a temporary requirement until the "Nim distribution" has been implemented and tested successfully for a couple of months.

The second requirement should be uncontroversial:

(2) Vocabulary types must be part of the stdlib.

These are types most packages need to agree on for better interoperability, for example Option[T]. This rule also covers the existing collections like Table, CountTable etc. "Sorted" containers based on a tree-like data structure are still missing and should be added.

Time handling, especially the Time type, is also covered by this rule.

(3) Existing, battle-tested modules stay

Reason: There is no benefit in moving them around just to fulfill some design fashion as in "Nim's core MUST BE SMALL". If you don't like an existing module, don't import it. If a compilation target (e.g. JS) cannot support a module, document this limitation.

This covers modules like os, osproc, strscans, strutils, strformat, etc.

And finally:

(4) New stdlib modules do not start as stdlib modules

Nim distribution

I imagine the "Nim distribution" to work like this: We have the usual Nim tarball with a dist/ directory that contains the set of selected packages we agreed on.

Adding a package

Every package in there must be voted into the distribution. The majority decides about whether to include the package or not.

A package must be at version 1 or later in order to be considered for inclusion. Ideally we can use the master branch of the github repository for inclusion.

After the decision to add it was made, a review process should start. The review should be done by the distribution maintainers.

The review process should focus on:

It should not focus on:

Removing a package

Ideally packages are not removed. It's a package others depend on. We should fork the package to ensure it stays online. It's acceptable if the development on a package has stopped. If the community decides that the package A has been superseded by a different package B the distribution can start to include B and deprecate A.

Keeping the packages up to date

CIs ensure the tests are green all the time. The distribution itself will be version controlled and the packages are tied to a specific git commit that has been reviewed. There is a tension here between "use what is known to be stable" and "use the latest" and probably we should support both, default is "stable", and "latest" not only means "latest" but also "unsupported and not reviewed".
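
The "stable" versus "latest" distinction described above can be sketched as a small data structure. This is only an illustration in Python, not part of the proposal: the `PinnedPackage` type, the `resolve_ref` function, the channel names as strings, and the example entry are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedPackage:
    """One distribution entry, tied to the commit that was reviewed."""
    name: str
    url: str
    reviewed_commit: str


def resolve_ref(pkg: PinnedPackage, channel: str = "stable") -> str:
    """Return the git ref to check out for the requested channel."""
    if channel == "stable":
        # Default: the exact commit the distribution maintainers reviewed.
        return pkg.reviewed_commit
    if channel == "latest":
        # Unsupported and not reviewed: whatever the default branch points at.
        return "HEAD"
    raise ValueError(f"unknown channel: {channel!r}")


# Hypothetical entry; the name, URL, and commit hash are placeholders.
example = PinnedPackage("examplepkg", "https://example.org/examplepkg.git", "0123abc")
```

Making "stable" the default argument mirrors the suggestion that users get the reviewed commit unless they explicitly opt into "latest".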

Packages can be updated individually via some command like koch update xyz.

Benefits

Disadvantages

rayman22201 commented 4 years ago

I want to elaborate on why I think point 4 is very important

(4) New stdlib modules do not start as stdlib modules

The process for contributing to the standard library as an outside contributor is currently too difficult for some areas.

A high quality stdlib is important. The stdlib is relied on by a lot of people. A tough code review process is not a bad thing in principle, but in practice, the current situation has scared off some valuable contributors, and hindered Nim's development.

There are issues when there is not a clear concept of an "owner" of a std lib module that can properly moderate and make executive decisions.

Such an "owner" needs to have a reasonable domain knowledge, familiarity with the code base, and time to stay up to date with all code review comment threads.

Without a clear leader, making a PR can turn into a yak shaving exercise: trying to convince several strong personalities who may never agree on fundamental issues. The PR is left to languish, and potential contributors are turned off.

The other possibility is that, due to lack of manpower, a PR sits with no progress (not even a comment) for a long period of time. Even a "close, won't accept" would be welcome here. Any feedback at all.

These situations are not the same as Issues or Bug reports. These are people that have spent their valuable time writing code for the project. They deserve timely feedback.

This isn't theoretical. These scenarios have actually happened several times here in Nim land. Nobody wants their PR to sit in limbo for weeks.

If each module is its own repo, the repo owner can make final decisions quickly, making the process of contributing much easier.

The Nim core team is small. They don't have the manpower or domain expertise to do this for a huge standard library with many modules.

This proposal allows the core team to focus on what they are good at, and spread some responsibility to the community. The core team can focus on being trusted curators, and leave the domain specific expertise to the domain experts.

This is exactly how Linux distros work. It has its own challenges, but it's a fairly successful model IMO.

The community has already done a good job of filling in gaps in the standard library, creating better alternatives to some stdlib modules. This Proposal leverages what is already happening in the ecosystem, and allows everyone to benefit from it in a more formal way.

dom96 commented 4 years ago

Packages can be updated individually via some command like koch update xyz.

My god, please no.

A package must be at version 1 or later in order to be considered for inclusion.

How many existing packages fit this definition? What you'll end up doing with this proposal is encouraging people to artificially call their packages v1.0.0 to get them included in "the Nim distribution".

I propose alternative solutions to the problems that you say this proposal solves:

Newcomers do not have to spend time figuring out which packages are production ready.

This is what tags in Nimble are for, designate a tag and mark packages which are "production ready" with it. Then just give them a link: https://nimble.directory/search?query=nim-production-approved

Users behind a firewall that cannot easily use Nimble packages can use the packages we ship with the distribution.

Maybe there is more to this, but if users are behind a firewall then they are more often than not offered a proxy which Nimble supports by the way. I don't understand this problem at all. Maybe the problem is actually different? Are there companies which do not allow installation of packages and/or need pre-approval for each new component that is installed?

Some protection against the current problem that a dependency might disappear under your feet.

This is something that should be resolved with a proper, official and centralised, package website where users can publish their packages (i.e. upload them). You should pour resources into that instead of creating this distribution.

rayman22201 commented 4 years ago

Are there companies which do not allow installation of packages and/or need pre-approval for each new component that is installed?

Yes there are. Many companies in the financial sector work this way.

rayman22201 commented 4 years ago

This is what tags in Nimble are for, designate a tag and mark packages which are "production ready" with it. Then just give them a link: https://nimble.directory/search?query=nim-production-approved

That is not mutually exclusive to this proposal. This proposal has a huge extra benefit:

In theory, that set of packages and their dependencies have all been tested together, so that you know they won't interfere with each other.

It's the same way Linux distros work. You trust that if you apt-get some package that is part of Ubuntu stable it is going to be compatible with other software you have installed on your machine.

Correct me if I'm wrong about my assumptions here.

dom96 commented 4 years ago

In theory, that set of packages and their dependencies have all been tested together, so that you know they won't interfere with each other.

In what way could they possibly interfere with each other? What is the proposal for testing that they are compatible?

The only advantage this will have is perhaps compatibility with the Nim version that the packages are bundled with. Honestly though, we don't even have good test coverage for the stdlib, it should really be a priority to improve that instead of creating a distribution and supposedly testing it.

dom96 commented 4 years ago

Yes there are. Many companies in the financial sector work this way.

In that case I doubt a distribution where packages are voted on will solve this problem for them. There will always be packages that they wish to use for which they will need to gain approval.

Also, I assume that all of the packages in the distribution will need to be audited. Doing so for all packages that the community deems appropriate to include in the distribution will be a much larger burden than doing so for a select few packages that the financial institution requires for their software.

Araq commented 4 years ago

Here is what the distribution would avoid: You have modules A and B, A depends on C version 1, B depends on C version 2 (incompatible with version 1).

A list of Nimble packages doesn't achieve the same.
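
The conflict described here (A needs C version 1, B needs C version 2) is something a distribution could check for before admitting a package. A minimal Python sketch, assuming each package declares a single required major version per dependency; the `find_conflicts` function and the input layout are hypothetical and not taken from Nimble:

```python
from collections import defaultdict


def find_conflicts(requirements):
    """Find dependencies that are required at more than one version.

    requirements maps a package name to {dependency name: required version}.
    Returns {dependency: {version: [packages requiring that version]}}.
    """
    wanted = defaultdict(lambda: defaultdict(list))
    for pkg, deps in requirements.items():
        for dep, version in deps.items():
            wanted[dep][version].append(pkg)
    return {
        dep: {version: sorted(pkgs) for version, pkgs in versions.items()}
        for dep, versions in wanted.items()
        if len(versions) > 1
    }


# The situation from the comment above, plus a third package D.
requirements = {"A": {"C": 1}, "B": {"C": 2}, "D": {"C": 2}}
conflicts = find_conflicts(requirements)
```

An empty result means every dependency is required at exactly one version, which is the property the distribution is meant to guarantee.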

dom96 commented 4 years ago

So you'll remove any packages that have this incompatibility? That could make what's included in the distribution quite volatile.

The real solution to this problem is to implement support for it in the compiler. C v1 should be considered a separate package by the compiler to C v2 somehow.

Araq commented 4 years ago

So you'll remove any packages that have this incompatibility?

No, they won't be added in the first place, only one version of C can make it into the distribution.

The real solution to this problem is to implement support for it in the compiler. C v1 should be considered a separate package by the compiler to C v2 somehow.

I don't agree, it's not the compiler's job to enable a solution that is at best a momentary unfortunate situation and at worst caused by incompetence.

rayman22201 commented 4 years ago

In that case I doubt a distribution where packages are voted on will solve this problem for them. There will always be packages that they wish to use for which they will need to gain approval.

Also, I assume that all of the packages in the distribution will need to be audited. Doing so for all packages that the community deems appropriate to include in the distribution will be a much larger burden than doing so for a select few packages that the financial institution requires for their software.

All this is true. The company obviously must choose and audit the packages they require either way, but if the mechanisms to support such a "sealed" distribution are already in place, it would be a very attractive feature... Then they just have to pick the packages they want, instead of building the entire infrastructure. NPM has a whole business model based on this idea, with self hosted registries.

dom96 commented 4 years ago

So you'll remove any packages that have this incompatibility?

No, they won't be added in the first place, only one version of C can make it into the distribution.

Yes, but the problem is that dependencies change as packages are updated. You will find that either removing a package altogether or keeping them frozen in an old state is the only way to keep compatibility. That is what I meant by "volatility".

dom96 commented 4 years ago

The company obviously must choose and audit the packages they require either way, but if the mechanisms to support such a "sealed" distribution are already in place, it would be a very attractive feature... Then they just have to pick the packages they want, instead of building the entire infrastructure. NPM has a whole business model based on this idea, with self hosted registries.

Of course, but Araq isn't proposing an NPM-style model. He's proposing an official distribution which the community chooses. My point is that no financial institution will be happy with this because each will have different requirements.

We should work towards an NPM-style solution, with an ability to allow these institutions to create self-hosted registries. This is the sustainable way forward and may actually help Nim be financially sustainable (of course, NPM is on a whole different scale... but on the other hand they are hugely profitable AFAIK)

rayman22201 commented 4 years ago

Of course, but Araq isn't proposing an NPM-style model. He's proposing an official distribution which the community chooses.

Again, these are not mutually exclusive.

The community distribution is simply an "example" so to speak, that has certain high standards for compatibility. Again, this is exactly what Linux distros do. It is not a radical idea.

Araq commented 4 years ago

My point is that no financial institution will be happy with this because each will have different requirements.

How so? Previously they accepted nim-1.0.0.tar.xz as a trusted thing programmers are allowed to use, afterwards they do the same for nim-distribution.tar.xz which fulfills the same standards that nim-1.0.0.tar.xz did. The point is that it's easier to get permission for 1 software package as opposed to 10 different dependencies.

dom96 commented 4 years ago

Again, these are not mutually exclusive.

Yes, but they each take time, and Nim as a whole has limited resources. We should be spending those resources wisely, on a solution that works long-term and for more use cases.

dom96 commented 4 years ago

How so?

Each financial institution will have a different set of packages that they want.

Previously they accepted nim-1.0.0.tar.xz as a trusted thing programmers are allowed to use, afterwards they do the same for nim-distribution.tar.xz which fulfills the same standards that nim-1.0.0.tar.xz did. The point is that it's easier to get permission for 1 software package as opposed to 10 different dependencies.

So as long as we put some packages together and call it a distribution the financial institution will just happily trust us? I find that hard to believe.

rayman22201 commented 4 years ago

Yes, but they each take time, and Nim as a whole has limited resources. We should be spending those resources wisely, on a solution that works long-term and for more use cases.

I agree, but we disagree on where those resources should be spent. I think this will provide the most good long term.

See my initial post about contributing to the stdlib today.

Araq commented 4 years ago

We should be spending those resources wisely, on a solution that works long-term and for more use cases.

Everything you suggested so far is more expensive than my proposal.

dom96 commented 4 years ago

Everything you suggested so far is more expensive than my proposal.

My suggestions also solve more problems and are effectively inevitable. You might as well put resources into them because you will have to do so eventually anyway.

I think this will provide the most good long term.

This is where we disagree. I strongly don't think this will provide much benefit to our users, even in the short-term, much less in the long-term.

Araq commented 4 years ago

No, they solve different problems and fail to see that there are valid reasons behind the quite common stance, "I won't use X, it's a dependency".

andreaferretti commented 4 years ago

We should work towards an NPM-style solution, with an ability to allow these institutions to create self-hosted registries.

It is already super easy to create a self-hosted registry. Steps:

  1. Mirror the git repositories for packages you trust on a local git server (most big institutions already have a git server anyway)
  2. Copy packages.json, edit it to only keep trusted repositories and change the URLs to point at the local git server. Host this file somewhere, possibly on the same git server as above
  3. Use Nimble configuration to point to your local packages.json

    [PackageList]
    name = "CustomPackages"
    url = "http://mydomain.org/packages.json"
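
Step 2 above (copy packages.json, keep only trusted repositories, point the URLs at the local git server) can be automated. The following Python sketch assumes each entry is a JSON object with at least "name" and "url" keys, and that mirrors live at `<mirror_base>/<name>.git`. The function name, the mirror layout, and the sample entries are hypothetical, and alias entries without a URL are not handled.

```python
import json


def build_private_package_list(entries, trusted, mirror_base):
    """Keep only trusted packages and rewrite their URLs to the local mirror."""
    result = []
    for entry in entries:
        name = entry.get("name")
        if name not in trusted:
            continue
        rewritten = dict(entry)  # copy so the upstream list stays untouched
        rewritten["url"] = f"{mirror_base.rstrip('/')}/{name}.git"
        result.append(rewritten)
    return result


# Placeholder upstream data; a real run would load the copied packages.json.
upstream = json.loads("""[
  {"name": "alpha", "url": "https://github.com/example/alpha", "method": "git"},
  {"name": "beta", "url": "https://github.com/example/beta", "method": "git"}
]""")

private = build_private_package_list(
    upstream, {"alpha"}, "http://git.mydomain.org/mirror/"
)
print(json.dumps(private, indent=2))
```

The printed JSON is what would be hosted at the URL named in the `[PackageList]` section.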

dom96 commented 4 years ago

I agree completely with your assessment @andreaferretti. But the big factor here is "big institutions", these will have enough scale to maintain something like this. But if I'm a small startup I don't want to mess with this, I just want a hosted solution that works automagically for me.

Indeed, we should reuse this functionality for our custom NPM-like solution, or at least evaluate it to see its limitations.

dom96 commented 4 years ago

No, they solve different problems and fail to see that there are valid reasons behind the quite common stance, "I won't use X, it's a dependency".

Just because something is part of the "Nim distribution" doesn't change the fact that the package is still a dependency...

Araq commented 4 years ago

But it does change that! Just like today's (rather silly IMO) stdlib's htmlparser is not an extra dependency the same would be true for everything in the Nim distribution. You don't need to review it for security problems, because somebody else did, it won't be removed all of a sudden, it's available out of the box with Nim...

kidandcat commented 4 years ago

I'm with @dom96 in the sense that the resources we have are limited, and we should focus on smaller steps.

For example, I would propose keeping important packages under the nim-lang org on GitHub and giving maintainers collaboration access. This way, if a maintainer disappears, another person can take over maintaining package X without needing to fork it and leave the old one behind.

If you don't want to make those packages look "officially supported", just create another org, nim-community or similar. But you should have control over the packages, because one thing I see a lot is two packages for the same functionality: one has 100 stars but its last commit is 3 years old, and the other was updated yesterday but has just 15 stars.

But I would solve those problems @Araq has listed one by one instead of trying a big solution like this distribution.

mratsim commented 4 years ago

Many people don't want to do "nimble install foo" even if that foo is https://github.com/nim-lang/zip/. Putting popular packages under "nim-lang" wouldn't solve that.

They either have restrictions on their machines or firewalls (i.e. financial institutions and believe me, there might be tug of war between devs and either sysadmins or IT security), or they want the "genuine" default experience.

The solution is that once one package becomes popular, people propose to be co-maintainers. Possibly we could have say "nim-webdev", "nim-science", "nim-games" organizations if a domain becomes quite big but that should be happening organically. What the Nim community might do is maybe allow subforums in the Nim forum so that people have a privileged place to discuss that.

Araq commented 4 years ago

But I would solve those problems @Araq has listed one by one instead of trying a big solution like this distribution.

I do not consider it a "big solution", I consider it the "smallest solution that could possibly work". The only thing that bothers me is that it adds maintenance costs. However, every other solution also adds maintenance costs of some sort, it's inevitable. And hopefully people will chime in to mitigate this cost.

c-blake commented 4 years ago

A little wrinkle that hasn't been included in this discussion is package-level documentation. One of my packages of interest, https://github.com/c-blake/cligen in its simplest usage style requires "definitely some" because it's a rare approach with a couple rare features, but "not very much" documentation because usage "scales down" so very far.

For a user already committed to learning some things this is not an issue. For "fly by" potential users, I've hoped that I could keep their attention for at least 4..6 paragraphs and 1 code snippet at the top of the README.md. This probably seems like I am arguing against the proposal which I am not, really. It's just a "property" of the proposal. The usual Nim distro has an integrated documentation system. Maybe some thought about how this interacts with that is warranted.

I am fine with cligen being included in some dist/ sub-directory, and am generally in favor of this idea. In general, I personally avoid dependencies enough/have enough sympathy with others doing the same that I just dumped a bunch of general utility code with a Unix/Linux bias into cligen/ that may make sense to lift out. Another issue that may impact other packages.

Another reason unmentioned so far (but perhaps vaguely related to @mratsim's genuine default experience) for avoiding dependencies is that "System" package managers like rpm, apt, portage, etc. and "Language" package managers like nimble, pip, etc. usually do not know about each other or each other's files. So, you get these incoherent set-ups/installs where everything is in someone's home dir or it's in some system dir, but the files are "orphaned" relative to the system package manager, etc. Sometimes things like site-packages (pioneered by Emacs) let you have a hybrid/half-and-half incoherency. E.g., pip and system package managers both use site-packages. So, pip list will see all the packages a system package manager put into some Python site-packages, yet pip install will not register its files with the system package manager. So then, tools that tell you what package owns some file break. Let's just say there is some rationality to "avoiding this whole mess" or sympathizing with those who do. Fewer dependencies is one way to make the mess at least smaller while "let a thousand packages bloom" makes the problem worse.

{ If someone has some other ideas to make that mess smaller, it may be a good topic for a related but distinct RFC and/or nim/nimble issue. My best idea is some kind of "generator" for the usually <12 system package managers from language package descriptions and engaging with the people who maintain system package repos. }

dom96 commented 4 years ago

Many people don't want to do "nimble install foo" even if that foo is https://github.com/nim-lang/zip/. Putting popular packages under "nim-lang" wouldn't solve that.

You make it sound like 80%+ of us don't want to install packages via Nimble, which I seriously doubt is the case. I don't doubt that there are people working for financial institutions who have these restrictions, but I want to hear from them, and I would ask you to stop exaggerating how many of these people exist.

They either have restrictions on their machines or firewalls (i.e. financial institutions and believe me, there might be tug of war between devs and either sysadmins or IT security), or they want the "genuine" default experience.

As I mentioned previously, whatever restrictions the users have with regards to firewalls can be worked around by proxies. If IT needs to sign off on every single package then I don't see a reason why they wouldn't need to sign off on every single package inside a Nim distribution.

A "genuine" default experience is nice, it works well for anaconda for example. But I seriously doubt we've got anywhere near enough mature packages that could be bundled up into a useful distribution.

c-blake commented 4 years ago

Just a vote, not a survey, but I hate having to install packages via nimble (or pip or any language package manager). If someone wrote a nimble2x where x included ebuild then I would use that to manage a private package repo, install out of that to system directories and hope someday the ebuild(s) could get into a portage tree maintained by others.

This would allow, among other things, packages not written in Nim to depend upon, say, a command line utility written in Nim, or programs written in Nim to depend upon things not written in Nim, such as certain versions of C libraries that could be auto-installed as dependencies via the system package manager request to install some Nim program.

andreaferretti commented 4 years ago

For a vote in the opposite direction, I am the kind of user who installs miniconda instead of anaconda, just to make sure not to rely on some big agglomerate of packages, but choose the exact dependencies I need instead. Now, if only Nimble supported lock files... ;-)

Araq commented 4 years ago

I don't see a reason why they wouldn't need to sign off on every single package inside a Nim distribution.

So ... do they have to sign off to use htmlparser? I don't think so.

Also this isn't a vote about "nimble is bad: yes/no". Assuming a perfect, flawless Nimble, every point I brought up remains the same.

kidandcat commented 4 years ago

I understand your point @Araq, but it is focused on one kind of Nim user (the one in the financial company). For me, for example, if I use Nim mainly for Arduino development, how does the Nim distribution help me? (I need to carefully choose which packages I want to use.)

I don't know the Nim user base, but at least for me the user with restricted access who needs authorization to download separate packages is very uncommon.

Araq commented 4 years ago

@kidandcat don't hesitate to give your downvote then. ;-) In the worst case we can extract the guidelines about the stdlib and bury the distribution idea.

c-blake commented 4 years ago

@andreaferretti - what about your non-Nim dependencies (up or down)? Presumably you would also like those to be precise? Hopefully you're not a vote against nimble2x or maybe nimble convert if @dom96 wants to include it in some more perfect nimble. :-)

There are surely complexities relating to pure source/mixed source-binary kinds of package managers, though. Also, in Nim, (and Python, Perl, even old sh, ..) a module can be both a library and a command. This means there is also more than one "way to depend" on something..as a program to run or as a library to import (or both) and even 2 ways to install (command path or import path). Arcs in the dependency graph have "more than one color", if you will. Support for such bitonal (or multitonal/parameterized) dependencies is something missing from most system (and language?) package managers -- not a problem we can solve here. (E.g., Gentoo has a way to depend upon a package with only certain feature flags enabled and such flags could include "install xyz as a program/library/both", but most package systems principally care about version constraints.)
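
The idea of dependency arcs with "more than one color" can be made concrete with a small data structure. This is an illustrative Python sketch only: `DepKind`, `Dependency`, and `install_targets` are names invented here and do not correspond to any existing package manager.

```python
from dataclasses import dataclass
from enum import Flag, auto


class DepKind(Flag):
    """How one package depends on another: the 'color' of the arc."""
    PROGRAM = auto()  # needs the command on the executable path
    LIBRARY = auto()  # needs the module on the import path


@dataclass(frozen=True)
class Dependency:
    source: str
    target: str
    kind: DepKind


def install_targets(deps, package):
    """Combine every way in which `package` is depended upon."""
    needed = DepKind(0)
    for dep in deps:
        if dep.target == package:
            needed |= dep.kind
    return needed


# Hypothetical graph: "tool" is used both as a command and as a library.
deps = [
    Dependency("app", "tool", DepKind.PROGRAM),
    Dependency("lib", "tool", DepKind.LIBRARY),
    Dependency("app", "other", DepKind.LIBRARY),
]
```

Using a flag type lets a single package be installed as a program, as a library, or as both, depending on the union of the arcs that point at it.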

I agree with @Araq that it's not about nimble pro/cons exactly. I was just elaborating on my {mess management idea} (and do so again above) and responding to Dom who seemed to be taking the topic to be about nimble.

"Batteries included attracts users via better installation ergonomics" should be an uncontentious claim in this day and age. It becomes "about nimble" if you think "nimble is the only way to manage the whole ecosystem", but that seems a little close-minded. It is mostly a question of scale and only becomes contentious if there are "too many batteries to keep working right" for the manpower or the human aspects of the organization seem too hard to work. I think those should be the focus..@mratsim had some human activity organization ideas. We could maybe improve important packages test integration. Might make sense to look at how other projects of similar human-scale manage these things (or how they wish they did).

disruptek commented 4 years ago

:-1: from me:

I like @Araq's numbered points, which are all about the standard library, its role, and future growth of same.

The rest of his post says nearly nothing about what the perceived problem is or how the solution is going to solve it. Honestly, there is so little "there" there, that I don't even know how to begin to debate it. If you're going to specify a problem and its solution so poorly, you may as well just implement it. At least then we'd have something to read which might be less vague.

The technical challenges of the proposal (if I even understand it) don't seem insignificant to me, but worse, the concept seems to add to the very same problems that it cites as its origin -- fragmentation of the ecosystem and inaccessibility or interoperability of its components. This isn't going to speed up the development of Nim software; if anything, it's going to slow it down.

What's my incentive to engage my package in the distribution? So that people behind firewalls might have an easier time of using my software? Just give them a dynamic URL that points to a tarball of packages versioned to their specification. Sign it, CDN it, whatever. Problem solved.

Anything else? I can think of many disincentives.

I would rather more work were put into the ecosystem tools we already have. If they aren't going to be deprecated, who would disagree that improvements there benefit everyone? From this perspective, I share @dom96's view.

If we cannot make the current distribution methods work as intended or agree on what that intention is, that's a separate discussion. If we can, we should. I don't think I'm alone in the view that when it comes to packaging, neither Nim nor Nimble are implemented (let alone specified) completely, nor are they unified in intention.

From the perspective of a distribution user, I share @c-blake's viewpoint. I'd prefer that there are fewer assumptions made about how packaging and distribution work best for the user, and I'd prefer that those mechanisms that do exist are made unassailably bulletproof and useful to everyone.

I do have some opinions on Nim package problems and ideas for solutions to same, but it seems like a waste to share them here.

andreaferretti commented 4 years ago

"Batteries included attracts users via better installation ergonomics" should be an uncontentious claim in this day and age

It is not. Node is extremely popular and has almost no batteries included. But it has good tools to manage dependencies. The JVM has very limited batteries included, yet it is also very, very popular.

Not that I am against having batteries included - I just don't care much, as long as I have a dependable package manager. Unfortunately, Nimble is not there yet. Two big things missing are

what about your non-Nim dependencies (up or down)? Presumably you would also like those to be precise?

Well, ideally I would like to not have them, and only rely on Nim native components :-) If I have them, I think the only sane option is to build everything into a container (docker, vagga...). I would like to have a C/C++ dependency manager, but that ecosystem just does not work this way, it is deeply rooted in the 70s.

This means there is also more than one "way to depend" on something..as a program to run or as a library to import

Well, yes. This is a good reason to allow Nimble to depend on external plugins to perform particular tasks. For instance, sbt works this way: you can have dependencies for your application (library) or for your build file (program).

c-blake commented 4 years ago

Some fair points @andreaferretti. Container build rules often involve a system package manager, though. So, that seems to me not a great anti-nimble-rpm/deb/ebuild/etc. dependency integration argument. I don't have much personal experience with NPM/JVM other than that they seem mostly "self-contained pure-JS/pure-Java" kinds of environments. So, they probably don't inform nimble-system manager integration much.

mratsim commented 4 years ago

@dom96 One example: https://irclogs.nim-lang.org/30-03-2018.html#08:08:48

I mean, I don't like ripping code from other people or setting other people's code as a dependancy

Also, I worked in financial institutions (4 years) on the ops side, and basically the discussion with developers was "don't do that" or "disclaimer: all damages to the company due to this unapproved code will be borne by your department budget". And this was for an unapproved unzip library on AS400.

What a financial institution looks for before choosing a package is insurance in the form of a support package. They want to know that, should they use Nim for a core part, they can rely on a fixed subset of Nim + packages + locked versions that work together, with the possibility to ring the provider at any time for emergency fixes to problems that may cost hundreds of thousands or a huge loss of reputation. They will gladly pay an annual fee for such insurance.

Obviously, given Nim's size, they will probably start with an internal team, but as the reliance on Nim grows, they will be moved to the core business / value addition of the company (the financial institution's proprietary algorithms) and the financial institution will look into offloading dependency support to an external provider for various reasons:

Now I agree that a tool to "generate your own Nim distro" would be great, but I think it's best to start with a proof-of-concept "Nim important packages" distro, get it out, and see how people like it. This would make it very easy for people to try Nim in ix.io or in their own Docker with as little friction as possible, and to write non-trivial useful programs that need more than the standard library (say npegs or SDL2 or bigint or crypto or Arraymancer).

Then we can write the "generate your own distro" layer, so that someone who wants a "Nim distro for finance" or a "Nim distro for science" can contract a company to maintain it.

Alternatively, instead of being by domain, those distributions could be by security properties, say code that has been audited, code that only uses safe features of Nim, no-warranty code, similar to Ada Spark.


Now, that said, assuming we only had one person with the choice of working on either nimble or the distribution, the priority should be making nimble better over the distribution aspect, because I agree that the vast majority of users will happily use a package manager.

For scalable usage, nimble needs in my opinion:


However, from a time and resource perspective, it's not an either-or choice between nimble and the distribution.

@Araq works on Nim full-time, and probably can provide a distribution in less time than you or anyone who wants to tackle lock files as a project on the side of a full-time job. Furthermore while we could also ask @Araq to work on Nimble instead, I would argue that he should be the last one working on it as he doesn't use any packages, everything is in the standard library and so he doesn't have to deal daily with nimble limitations.

pmetras commented 4 years ago

The reason I support Araq's proposal is that I prefer to have a small subset of high quality libraries instead of thousands of low quality packages, and to concentrate support efforts on the libraries in the distribution. When Nimble improves with features to identify valuable packages, and when the community is larger, the initial goal of distributions disappears. Look at Debian: there are hundreds of thousands of DEB packages on the Web, but only 59,000 are included in the latest Buster release. When you stick with packages from the distribution, you have the insurance that they will work together without trouble. This type of insurance is important for companies and Nim beginners.

alehander92 commented 4 years ago

@pmetras but that's not how third party ecosystems work: popular ecosystems lead to people working on their own packages or even having a choice between many quality packages for the same thing. it's a bit like free market vs a big state imo (i know this metaphor is overused): it's hard to expect a very minimal team that already has a huge amount of work on the language to somehow maintain a huge library suite as well.

I think the idea of a distro is good in principle, but look at Go, C++, Python, Ruby, Java etc: how often do you see distributions except for niche cases like python science? my point is that having an active ecosystem is much more critical than having distributions, and that it seems it's not a problem for many of those much more popular languages/ecosystems (correct me if i am wrong)

alehander92 commented 4 years ago

@pmetras sorry, i now realized you argue about something similar, and distributions mostly in the beginning, i agree with that in a way, but i still want to point out that making the ecosystem bigger is more important: not sure how a distribution applies tho, maybe it helps

dom96 commented 4 years ago

Honestly, our package ecosystem is so immature that you won't be able to create a stable distribution that's useful to anyone.

Never mind creating a distribution that's useful for a financial institution!

Furthermore while we could also ask @Araq to work on Nimble instead, I would argue that he should be the last one working on it as he doesn't use any packages, everything is in the standard library and so he doesn't have to deal daily with nimble limitations.

@mratsim I disagree, @Araq has worked on Nimble and should work more on it. The creator of Nim avoiding such a major aspect of the language doesn't do our users any favours.

disruptek commented 4 years ago

It's a long read, so I decided to write it only once; this is how I'm planning on doing a distribution generator: https://github.com/disruptek/nimph

It should fix my personal nimble pain points with more of an embrace and extend attitude so that a rising tide there will lift all boats. I know it won't meet a lot of the concerns here, but I did try to meet some. Worst-case scenario, I'll offer another failed research experiment. :wink:

Araq commented 4 years ago

Honestly, our package ecosystem is so immature that you won't be able to create a stable distribution that's useful to anyone.

What?! We have nim-regex that's better than our stdlib packages, better packages to do Pegs, better packages to do serialization, a couple of useful UI libraries, ORMs, ...

The creator of Nim avoiding such a major aspect of the language doesn't do our users any favours.

IMHO a better Nimble cannot solve the inherent fragility of a distributed system. But I've said it before, the points I raised are not solved by a perfect package manager.

kidandcat commented 4 years ago

You cannot argue that a decentralized solution is bad when there does not exist a language that can be used without third party dependencies. It is simple: if you develop your language + package dependencies yourselves, you are, how many, 2 developers? 5-10 if you get a lot of help?

If you count the people that actually contribute to any Nim package, you have more than 100 people. I would bet every resource we had on making the Nim community stronger. I just see this distribution as a specific feature for financial companies, not for the future of the language (and that scares me).

Araq commented 4 years ago

You cannot argue that a decentralized solution is bad when there does not exist a language that can be used without third party dependencies.

Arguable. Python with its batteries included surely is/was useful without third party deps. Plenty of people use C++ or C without external dependencies; of course, it depends on the application domain.

If you count the people that actually contribute to any Nim package, you have more than 100 people. I would bet every resource we had on making the Nim community stronger. I just see this distribution as a specific feature for financial companies, not for the future of the language (and that scares me).

Fair enough I guess.

pmetras commented 4 years ago

I think there are multiple understandings of what "distribution" means. I'll try to summarize what I understand by it:

For instance, if we have a distribution about data science and machine learning, I expect to find libraries for dataframes, machine learning and statistical algorithms, and graphing. For a data structures and algorithms distribution, I expect to have classical container data structures (trees, hashmaps, etc.) and algorithms (sorts, hashes, etc.). Another one about languages could have parser and lexer libraries. One can imagine medical, educational or scientific distributions.

I don't care whether it's included under the Nim umbrella or not, whether it is centralized or distributed, in a container or not. I don't need to create a personal distribution, and I don't care whether it is based on Nimble or not. What I want, to ease development for beginners and attract some types of companies or governments, is that when Nim compiler v1.3 is published, I can easily get the data science v1.3 and stdlib v1.3 distributions, for instance. There is then no barrier to becoming efficient in my domain of interest immediately. I don't need to spend time finding packages with Nimble and debugging them or writing the documentation...
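To make that versioning expectation concrete, here is a minimal sketch in Python. It assumes a hypothetical index that maps a distribution name and a compiler version to a fixed package set. The index layout, the distribution names, the package names and all version numbers are illustrative assumptions, not an existing Nim or Nimble facility.

```python
# Hypothetical index: (distribution name, Nim compiler version) -> pinned packages.
# Every package name and version below is made up for illustration.
DISTRIBUTIONS = {
    ("stdlib", "1.3"): {"examplecollections": "1.3.0"},
    ("datascience", "1.3"): {"exampleframes": "0.4.1", "examplestats": "0.2.0"},
}


def packages_for(distribution: str, nim_version: str) -> dict[str, str]:
    """Return the pinned package set published alongside a given compiler version."""
    try:
        # Copy so callers cannot mutate the shared index.
        return dict(DISTRIBUTIONS[(distribution, nim_version)])
    except KeyError:
        raise LookupError(
            f"no {distribution!r} distribution was published for Nim {nim_version}"
        ) from None


if __name__ == "__main__":
    print(packages_for("datascience", "1.3"))
```

The point of the sketch is only that the user picks a domain and a compiler version and gets one answer back, rather than resolving each package individually.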

FedericoCeratto commented 4 years ago

Are there companies which do not allow installation of packages and/or need pre-approval for each new component that is installed?

Yes, many companies including "FAANGs" prohibit pip/nimble/npm/tarball installs but provide blanket approval for linux/bsd distributions, for many reasons. Security: many installers do not check for vetting or signatures, provide no backports of security fixes and do not perform system-wide security updates. OS installers do all of that. Reliable and reproducible deployments: those tools do not guarantee that all deployed systems use the same versions of libraries and applications company-wide. OSes do. Legal: some Linux distributions have a legal vetting process. Legal 2: various companies (including Canonical) provide legal indemnification against copyright breaches, and others provide insurance against intrusion and data loss, but only for popular distributions. Userbase: popular distributions are also reviewed and vetted (and rebuilt) by various large companies both for internal and external use.

Edit: A summary around supply chain attacks and how to mitigate them: https://drewdevault.com/2022/05/12/Supply-chain-when-will-we-learn.html and previously https://arxiv.org/pdf/2005.09535.pdf
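The reproducible-deployment point can be sketched as a check of installed package versions against one pinned list. This is a minimal Python illustration under stated assumptions: the snapshot layout (`compiler` and `packages` keys) and the function names are hypothetical, and the SHA-256 digest only detects accidental differences between copies of the list. It is not a signature; real vetting would need a key-based signing scheme on top.

```python
import hashlib
import json


def snapshot_digest(snapshot: dict) -> str:
    """Hash a canonical JSON encoding so every machine computes the same digest."""
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drift(snapshot: dict, installed: dict[str, str]) -> dict[str, tuple]:
    """Map each mismatching package to (pinned version, installed version).

    A missing entry on either side is reported as None.
    """
    pinned = snapshot["packages"]
    names = set(pinned) | set(installed)
    return {
        name: (pinned.get(name), installed.get(name))
        for name in sorted(names)
        if pinned.get(name) != installed.get(name)
    }


if __name__ == "__main__":
    # Hypothetical snapshot and host state, for illustration only.
    snapshot = {"compiler": "1.3", "packages": {"pkg_a": "1.0", "pkg_b": "2.0"}}
    host = {"pkg_a": "1.0", "pkg_b": "2.1"}
    print(snapshot_digest(snapshot))
    print(drift(snapshot, host))
```

An empty result from `drift` means the host matches the pinned list exactly; comparing digests across machines confirms they are all checking against the same list.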

FedericoCeratto commented 4 years ago

Here is my suggestion: create periodic "snapshot" lists of compiler version + library names + library versions that are trusted, tested and known to work together, and sign them. The snapshot itself is just a list of package names and versions. Such a list is then used: