microlinkhq / metascraper

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
https://metascraper.js.org
MIT License

Add `metascraper/core` package #472

Closed mikkmartin closed 3 years ago

mikkmartin commented 3 years ago

Great package, some constructive critique.

Subject of the issue

The current package/import logic has some DX downsides. It is really labor-intensive to import all 9 packages (e.g. when prototyping or testing out metascraper), especially in online editors like CodeSandbox, where it's common to add dependencies one by one from a UI field.

Expected behaviour

Have you considered importing all core packages

const metascraper = require('metascraper/all')

or importing all with

const metascraper = require('metascraper')

like with the massive d3 packages

//all
import * as d3 from "d3";

//individual
import {scaleLinear} from "d3-scale";

Actual behaviour

const metascraper = require('metascraper')([
  require('metascraper-author')(),
  require('metascraper-date')(),
  require('metascraper-description')(),
  require('metascraper-image')(),
  require('metascraper-logo')(),
  require('metascraper-clearbit')(),
  require('metascraper-publisher')(),
  require('metascraper-title')(),
  require('metascraper-url')()
])

bouraine commented 3 years ago

It would also help when metascraper is used inside a cloud function: having only one package to download instead of 10 should reduce the startup time. Updating would be easier as well.

Kikobeats commented 3 years ago

Thanks for the feedback. It's true the API could be friendlier for the user, but it's explicit for a reason.

If people use metascraper/all, they will certainly not know what is under the hood. An explicit interface reveals more about the project and also suggests that there are other packages you can explore for your use case.

Also, your suggestion rests on an assumption: what is the core? That could differ depending on the use case (e.g., metascraper-clearbit is not considered core since it relies on an external service).

Also, note that each package has its own semver version, so you only update what changed, not the whole set.

bouraine commented 3 years ago

The two approaches can coexist: we can imagine a package with the essentials (url, title, description, image, publisher) if we estimate that 90% of use cases include those.

But if someone has a different setup, they can continue to use the individual packages.
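
The coexistence described here could be sketched as a thin preset wrapper. This is only an illustration under stated assumptions, not an existing metascraper API: `createPreset`, `fakeCore`, and `stubFactory` are hypothetical names. In a real project `fakeCore` would be `require('metascraper')` and the factories would be `require('metascraper-url')` and so on.

```javascript
// Hypothetical helper, not part of metascraper: wraps a fixed list of
// rule-bundle factories so callers get the essentials from one import.
function createPreset (core, essentialFactories) {
  // `extraRules` lets callers append already-initialised bundles
  // (for example a vendor-specific one) without giving up the preset.
  return function preset (extraRules = []) {
    const essentials = essentialFactories.map(factory => factory())
    return core([...essentials, ...extraRules])
  }
}

// Stand-ins so the sketch runs without installing anything.
// Assumption: the real core accepts an array of rule bundles.
const fakeCore = rules => ({ ruleNames: rules.map(rule => rule.name) })
const stubFactory = name => () => ({ name })

const essentials = createPreset(fakeCore, [
  stubFactory('url'),
  stubFactory('title'),
  stubFactory('description'),
  stubFactory('image'),
  stubFactory('publisher')
])

// One call covers the common case...
const basic = essentials()
// ...and extra bundles can still be added explicitly.
const extended = essentials([stubFactory('clearbit')()])

console.log(basic.ruleNames)
console.log(extended.ruleNames)
```

Because the preset only composes factories, the individual packages would remain usable on their own, and each could keep its own version.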

mikkmartin commented 3 years ago

Agreed. Including vendor specifics like metascraper-clearbit in the main export would be overkill for most people. It makes sense to me that the "all" bundle contains only the "core" tags, while vendor tags are manually included extensions / individual imports.

Ideally, in my opinion, this should follow Make It Work, Make It Right, Make It Fast. Making it work should be a straightforward one-package install; making it right/fast should mean checking what's under the hood in readme.md, IF you even need to include/exclude something, and then using the explicit syntax.
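
One way to picture the include/exclude idea is a small selector over a registry of rule-bundle factories. This is a rough sketch under stated assumptions, not something metascraper provides: `selectRules`, the `include`/`exclude` options, and the stub registry are all hypothetical. A real registry would map names to `require('metascraper-title')` and similar packages.

```javascript
// Hypothetical helper, not part of metascraper: picks rule-bundle factories
// by name from a registry, honouring optional include/exclude lists.
function selectRules (registry, { include, exclude = [] } = {}) {
  // With no `include` list, start from every bundle in the registry.
  const names = include || Object.keys(registry)
  return names
    .filter(name => !exclude.includes(name))
    .map(name => {
      if (!Object.prototype.hasOwnProperty.call(registry, name)) {
        throw new Error(`Unknown rule bundle: ${name}`)
      }
      return registry[name]()
    })
}

// Stand-in registry so the sketch runs without installing anything.
const stubRegistry = {
  author: () => ({ name: 'author' }),
  date: () => ({ name: 'date' }),
  title: () => ({ name: 'title' }),
  url: () => ({ name: 'url' })
}

// Default: everything in the registry.
const defaults = selectRules(stubRegistry)
// Explicit opt-out for one bundle.
const withoutDate = selectRules(stubRegistry, { exclude: ['date'] })
// Explicit opt-in for a minimal set.
const minimal = selectRules(stubRegistry, { include: ['title', 'url'] })

console.log(defaults.map(rule => rule.name))
console.log(withoutDate.map(rule => rule.name))
console.log(minimal.map(rule => rule.name))
```

The result of `selectRules` is a plain array of bundles, so it could be handed to the core in the same way as the explicit list of `require` calls in the original post.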

Kikobeats commented 3 years ago

Closing since it's not a priority right now. It will be revisited in the future.