Open H4ad opened 10 months ago
What would we display on npmjs.org for a package?
The NPM can go to https://github.com/chalk/chalk#readme and fetch the readme directly when the repository URL is a host that NPM trust, like Github for example.
But this is just an initial idea, I want to raise this discussion so we could see if there is a good solution for this problem, or see if this is not a problem at all.
In my perspective, the developers don't need to have the README.md
on the bundle, for other developers it could be a feature.
If the only reason of README.md
is bundled is just to show a nice page for the package, so maybe we could introduce another API (or behavior) that provides the same feature without increasing the bundle size.
Without some sort of metadata that allows a package to be traced exactly back to what generated it, I don't know how the website would be able to fetch the right version. Other websites that use the registry (like jsdelivr, yarnpkg, etc) all use the README info within the package to work, and would too need to fetch this information on the fly and cache it themselves. (Of course, that level of tracing wound be nice, and some tooling adds a gitHead prop, but that's not universal and may not be accurate when produced automatically depending on the situation.)
I personally would like to see some sort of info that says that the bandwidth used by README files is significant enough to warrant the change; it would seem to me that READMEs are small fish compared to the traffic generated by large packages or packages with large dependency graphs.
Honestly long run I would prefer we move to the readme being updatable separate from the tarball upload (basically what @H4ad said in the follow up comment). It would be a larger long term change for the registry, so not sure if it is a topic for an RFC or just something to ask from npm folks. I believe we had talked about that idea before as well but I don't immediately see an issue/discussion to that point.
The readme in the tarball is not a significant source of extra bandwidth to be honest.
The readme in the packument is. That's a separate issue and we have internal discussions about how to stop needing to send that / store it.
I did not mean that it being in the tarball was a bandwidth issue. I meant more that we should decouple the two things. Readme in the tarball or not, we should enable the readme on the website to be updated without publishing a new version. That would then enable projects to decide if bundling the readme made sense for their project without npm doing any change.
@jakebailey If you know a way to get the top downloaded packages, I can try to get this statistic.
But by looking just for two packages that we know are heavily downloaded (chalk and commander), it can save ~7,7TB/month, so I think the number is considerable, but the magnitude of significance can be different from person to person, for me TB is something to worry, for NPM could not.
The readme in the tarball is not a significant source of extra bandwidth, to be honest.
I disagree in parts, for commander.js
the README
alone is 12,7kB of 45.8kB (gzip), but I can be biased so I think we should get more data on this.
If the readme is 12.7k compressed that suggests a way-too-large readme that could be split into separate files, rather than “readmes shouldn’t be in the tarball”.
Honestly I like having docs in readme's for small enough scoped libraries. So I am not a huge fan of that proposed solution as a reason not to try and make things better. Although I do agree that is the right approach for now.
If we can get movement on the registry api then I think it is reasonable to ask for an endpoint just for posting a new readme as a metata bit attached to the package not from the tarball itself. This solves more than one problem, it allows for readme updates without a publish (a long time ask) and also enables reducing the tarball size if packages do have large docs written in the readme.
The solution to solve most of the module size issues is to have two published bundles:
With a command, npm could “augment” the given module with the missing files (in case somebody needs them locally). Users that needs offline access could also turn it on by default.
(A popular argument is that github releases provides that source bundle. This is unfortunately not correct, because a published module cannot be deleted, while a github repo can).
Yeah, we want distributions! This would help so much with other issues as well. That would be a great solution to this problem.
Maybe we can integrate with https://github.com/npm/rfcs/pull/519 and add a new type of distribution that is only used to address the source bundle
, and this kind of distribution can be used to render the content on NPM page.
That sounds like a great idea!
That’s basically what I’ve been asking npm for for the better part of a decade :-) i would happily remove tests and a bunch of other dev-time files from the “runtime” package if i could set an npm config value and always have the “full’ package locally installed (for 450+ packages).
same here @ljharb
Motivation ("The Why")
I found two packages that ship Readme.md,
chalk
andcommander.js
.If we stop shipping Readme.md, we can save
631GB
and1.3TB
per week of bandwidth for these two packages.If we consider all the packages in the NPM ecosystem, I think the bandwidth that can be saved is huge.
Example
I created two PRs to propose a lighter version instead of shipping the heaviest Readme.md:
How
But I think instead of trying to fix each package, maybe we can change the behavior of NPM itself.
Current Behaviour
Today, we always include README.md.
Desired Behaviour
Not have to the
README.md
on the bundle.Possible solutions
We can remove
Readme.md
when the repository URL is defined and ifReadme.md
is not included in the files. To present the content of theNPM
page, when it can be fetched from the repository URL.Only when the user explicitly defines
Readme.md
infiles
or when the repository URL is not defined then we will keepReadme.md
.All this behavior must happen in the
prepack
step because we should not modify the package after it is generated, as there are many packages that send not only the.tgz
but also the hash to validate the integrity, so there is no way forNPM
modify package contents after publish.