nice-registry / nice-package

📦 Clean up messy package metadata from the npm registry

Use `normalize-package-data` instead of `normalize-registry-metadata` #18

Open billiegoose opened 7 years ago

billiegoose commented 7 years ago

I don't know what's going on, but

> nice-package uses normalize-package-data as a starter, then does even more package cleanup:

Is straight up not true. normalize-package-data isn't used anywhere. It's not a dependency of normalize-registry-metadata either.

(and I'm a little miffed atm because nice-package threw the author field into other, when normalize-package-data says it neatly parses the author field into an object with name, email, and url, and I had assumed based on the readme that nice-package was a superset of normalize-package-data)
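For readers unfamiliar with the author parsing mentioned here, a rough sketch follows. It assumes the common `Name <email> (url)` string form; `parsePerson` is a hypothetical helper written for illustration, not the function normalize-package-data actually uses.

```javascript
// Hypothetical helper (not part of any library named in this thread).
// Turns "Name <email> (url)" into { name, email, url }.
function parsePerson(person) {
  // Already an object: leave it alone.
  if (person && typeof person === 'object') return person;
  if (typeof person !== 'string') return undefined;

  const name = person.match(/^([^(<]+)/); // text before the first "<" or "("
  const email = person.match(/<([^>]+)>/); // text inside <...>
  const url = person.match(/\(([^)]+)\)/); // text inside (...)

  const result = {};
  if (name && name[1].trim()) result.name = name[1].trim();
  if (email) result.email = email[1];
  if (url) result.url = url[1];
  return result;
}
```

Each part is optional, so a bare name with no email or url yields an object holding only `name`.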

zeke commented 7 years ago

> normalize-package-data isn't used anywhere. It's not a dependency of normalize-registry-metadata

There's a backstory for this but I can't remember what it is. cc @soldair

> nice-package threw the author field into other

That's because author is just what is found in the source package.json and sometimes doesn't reflect reality as packages change hands. owners is a better top-level property to look at, as it is the actual list of npm users who have publish rights to the package.

zeke commented 7 years ago

Found the backstory: https://github.com/nice-registry/nice-package/issues/6

> I would stay with normalize-registry-metadata like you are now because it's exactly the use case it was designed for.

Sorry about the mismention in the readme.

zeke commented 7 years ago

Looking back into it, I wonder if maybe we should be using both.

billiegoose commented 7 years ago

> I wrote in the readme that this package uses https://github.com/npm/normalize-package-data, but it actually uses https://github.com/npm/normalize-registry-metadata. @soldair what is the difference between these two packages?

I am literally laughing out loud, because I know exactly what that feels like. Conversation with my coworker, "Did I write that? Did you write that? I can't remember. Did we actually write it or just talk about writing it? Wait... are we even looking in the right repo? Well, when did we write it? That might give us a clue which repo..."

Gimme a moment to re-read soldair's response a few times to process it.

billiegoose commented 7 years ago

OK. So normalize-registry-metadata cleans the skimdb data to be (nearly) identical to what is served from registry.npmjs.com. And normalize-package-data is what npm uses when it reads package.json files.

nice-package is in a tough spot, because it doesn't know where its input came from. Using normalize-registry-metadata in fetch-nice-package, for instance, is superfluous because that data comes from registry.npmjs.com (not skimdb).

However, in package-stream, normalize-registry-metadata is the right thing to use, because it is getting its data from skimdb (not registry.npmjs.com).

I think all-the-packages is the same way because it gets its data from skimdb... although it uses a different endpoint rather than using a follower. I guess I really have no idea whether it needs normalize-registry-metadata or not.
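To make the source-dependent choice concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `source` hint, the `cleanDoc` name, and the injected `normalizeRegistryDoc` callback (a stand-in for whatever registry normalizer is used, not a real library call). It also assumes that running the pass on already-clean data is harmless, which the thread does not confirm.

```javascript
// Sources discussed above. all-the-packages is deliberately left out,
// since whether it needs the registry pass is still an open question.
const NEEDS_REGISTRY_PASS = {
  skimdb: true,    // raw skimdb documents, e.g. via package-stream
  registry: false, // already in registry form, e.g. via fetch-nice-package
};

// `normalizeRegistryDoc` is an injected, hypothetical callback.
function cleanDoc(doc, source, normalizeRegistryDoc) {
  const known = Object.prototype.hasOwnProperty.call(NEEDS_REGISTRY_PASS, source);
  // Unknown source: run the pass anyway (assumed harmless on clean data).
  const needsPass = known ? NEEDS_REGISTRY_PASS[source] : true;
  return needsPass ? normalizeRegistryDoc(doc) : doc;
}
```

The catch, as noted above, is that nice-package itself has no such hint today, so the caller would have to supply it.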

I'm thinking you're correct: we might want to use both if the goal is ultimate niceness, simply because normalize-package-data includes a laundry list of other fun normalizations. The downside is that adding normalize-package-data to the mix might break existing code.

I'm almost tempted to say it might make more sense to start from scratch, make a skeleton of fields for package.json, and then fill it with data. I'm weird at analogies sometimes, but all these "normalizing" functions feel a whole lot like eslint applying a series of patches, and I'm wondering if we'd just get more consistent results if we did something like prettier and wrote a new package.json object directly from the raw data rather than trying to clean the raw data.

billiegoose commented 7 years ago

The main distinction would be how the code is organized. Currently the approach used by nice-package and normalize-package-data is

> here's a list of things that could be wrong with the package data and here's how to fix them

The alternative approach would be organized around individual fields, like

> the canonical field goes *here*, and here's a list of all the places to check for that data
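A minimal sketch of what the field-oriented layout could look like, assuming a registry-style document with `dist-tags` and `versions`. The names `FIELD_SOURCES`, `latestVersion`, and `buildPackage` are hypothetical, and the field list is illustrative only; mapping `maintainers` to `owners` is an assumption based on the description of `owners` earlier in this thread.

```javascript
// Look up the version object that the "latest" dist-tag points at.
function latestVersion(raw) {
  const tag = raw['dist-tags'] && raw['dist-tags'].latest;
  return (tag && raw.versions && raw.versions[tag]) || {};
}

// One entry per canonical field: an ordered list of places to check.
const FIELD_SOURCES = {
  name: [(raw) => raw.name, (raw) => latestVersion(raw).name],
  description: [(raw) => raw.description, (raw) => latestVersion(raw).description],
  license: [(raw) => raw.license, (raw) => latestVersion(raw).license],
  owners: [(raw) => raw.maintainers], // assumption: owners come from maintainers
};

// Build a fresh object from the raw data instead of patching it in place.
function buildPackage(raw) {
  const pkg = {};
  for (const [field, sources] of Object.entries(FIELD_SOURCES)) {
    for (const read of sources) {
      const value = read(raw);
      if (value !== undefined && value !== null) {
        pkg[field] = value; // first place that has the data wins
        break;
      }
    }
  }
  return pkg;
}
```

With this layout, supporting a new input quirk means adding one more lookup to the relevant field's list rather than writing another patch that runs over the whole object.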