Open Kikobeats opened 6 years ago
Would this replace the existing custom rules implementation or would it extend the implementation with a client side capability as well?
It will use Custom Rules under the hood.
I'm assuming that if we want multiple rules for the same rule name, we just specify an array (e.g. title: [ $('h1'), $('h2') ]).
How do we specify the attr and type for a rule?
I moved the future html parameter out of scope for now, to stay focused on what we can build today.
I updated the example with some features. Any suggestions?
For specifying multiple rules (and extra settings), would we do the following?
// custom rules (see https://microlink.io/blog/custom-rules)
const client = microlink.extend({
  rules: {
    title: [
      {
        selector: 'h1 > .title',
        attr: '<what do we put here for DOM Element content?>',
        type: 'title'
      },
      {
        selector: 'h1'
      }
    ]
  }
})
Add documentation section
Another example of client code, used here for merging the results of more than one request:
'use strict'

const { set, reduce, map } = require('lodash')
const { URL } = require('url')
const qsm = require('qsm')
const got = require('got')

const getMeta = async (url, { apiKey, ...opts } = {}) => {
  const { origin: originUrl } = new URL(url)

  // use the pro endpoint only when an API key is provided
  const endpoint = apiKey
    ? 'https://pro.microlink.io'
    : 'https://api.microlink.io'

  const gotOpts = { json: true }
  if (apiKey) set(gotOpts, 'headers.x-api-key', apiKey)

  // query the site origin and the specific URL in parallel
  const res = await Promise.all([
    got(qsm.add(endpoint, { url: originUrl, ...opts }), gotOpts),
    got(qsm.add(endpoint, { url, ...opts }), gotOpts)
  ])

  // merge both payloads; fields from the specific URL override the origin ones
  const data = map(res, 'body.data')
  const meta = reduce(data, (acc, item) => ({ ...acc, ...item }), {})
  return meta
}

getMeta(process.argv[2])
  .then(meta => {
    console.log('meta', meta)
    process.exit()
  })
  .catch(err => {
    console.error(err)
    process.exit(1)
  })
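One detail of the merge step is worth illustrating: with a plain object spread, a later response that carries null for a field overwrites a value the earlier response did provide. The sketch below is a possible variation, not part of the snippet above; mergeData is a hypothetical helper name, and the sample objects are made up for illustration.

```javascript
// Hypothetical helper: merges payloads left to right, but only lets
// non-null, non-undefined values override what was already collected.
const mergeData = (...sources) =>
  sources.reduce((acc, data) => {
    for (const [key, value] of Object.entries(data || {})) {
      if (value !== null && value !== undefined) acc[key] = value
    }
    return acc
  }, {})

// Made-up payloads standing in for the origin and the specific URL responses.
const fromOrigin = { title: 'Home', logo: 'logo.png' }
const fromUrl = { title: 'A post', logo: null }

console.log(mergeData(fromOrigin, fromUrl))
// { title: 'A post', logo: 'logo.png' }
```

Whether skipping null values is desirable depends on the use case; the plain spread is simpler when the later response should always win.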
@charsleysa About how to specify multiple rules, what do you think about this proposal:
const client = microlink.extend({
  rules: {
    title: {
      selectors: [{ selector: 'h1', attr: 'text' }, { selector: 'h1 > .title', attr: 'text' }],
      type: 'title',
      default: 'Hello World'
    }
  }
})
@kikobeats I think that would work great!
It's similar to how we have structured a table in our DB storing custom rules (though in our system we transparently change the name if it clashes with an existing property as existing properties don't play nice with defaults).
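To make the proposed rule shape concrete, here is a minimal sketch of how a rule with a selectors list and a default might be resolved. It assumes the selectors are tried in order and the first non-empty value wins, which the thread does not confirm. resolveRule, query and fakePage are hypothetical names; query stands in for whatever DOM lookup the real implementation uses.

```javascript
// Hypothetical helper: resolves one rule of the proposed shape.
// `query(selector, attr)` is an assumed lookup returning a string or undefined.
const resolveRule = (rule, query) => {
  for (const { selector, attr = 'text' } of rule.selectors) {
    const value = query(selector, attr)
    // assumption: the first selector that yields a non-empty value wins
    if (value !== undefined && value !== null && value !== '') return value
  }
  // no selector matched: fall back to the declared default, if any
  return rule.default
}

// A stub standing in for a real page lookup, for illustration only.
const fakePage = { 'h1 > .title': { text: 'Nested title' } }
const query = (selector, attr) => (fakePage[selector] || {})[attr]

const titleRule = {
  selectors: [{ selector: 'h1', attr: 'text' }, { selector: 'h1 > .title', attr: 'text' }],
  type: 'title',
  default: 'Hello World'
}

console.log(resolveRule(titleRule, query)) // 'Nested title'
console.log(resolveRule(titleRule, () => undefined)) // 'Hello World'
```

Keeping default at the rule level, rather than per selector, means the fallback applies only after every selector has been tried.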
Updated to reflect the types already implemented
var isIP = require('net').isIP;
Interesting! builtin types 😄
Specification
Feature Name (to be determined)
Feature Headline
Turns any website into your API.
Features
Developer Experience
Batching support via dataloader.
Caching support via got#cache.
Selector Declaration
Data Types
Metascraper types
audio
author
date
description
video
image
lang
logo
publisher
title
url
Native JS types
Others
ip (is-ip)
email (email-regex / extract-email-address)
html (is-html)
price (parse-price, extract-price, parse-num, format-num)
time (extract-time)
isbn ? (isbn10 and isbn13)
json (crack-json)
postcode (postwoman)
phone
color
enum
range
Also, add the ability to easily add new types (on the client side):
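As a rough illustration of what adding types on the client side could look like, the sketch below keeps validators in a small registry. addType and checkType are hypothetical names, not an existing microlink API; the ip validator uses Node's built-in net.isIP, as in the snippet earlier in the thread, and hexcolor is a made-up custom type.

```javascript
const { isIP } = require('net')

// Hypothetical registry: maps a type name to a validator function.
const types = new Map()

// Registers (or replaces) a validator under the given type name.
const addType = (name, validate) => types.set(name, validate)

// Runs the validator for `name`; unknown types are a programming error.
const checkType = (name, value) => {
  const validate = types.get(name)
  if (!validate) throw new TypeError(`Unknown type: ${name}`)
  return validate(value)
}

// net.isIP returns 4 or 6 for valid addresses and 0 otherwise.
addType('ip', value => isIP(String(value)) !== 0)

// A user-defined type added on the client side (illustrative only).
addType('hexcolor', value => /^#[0-9a-f]{6}$/i.test(String(value)))

console.log(checkType('ip', '127.0.0.1')) // true
console.log(checkType('ip', 'not-an-ip')) // false
console.log(checkType('hexcolor', '#ff00aa')) // true
```

A registry like this would let the built-in types and user-defined ones share the same lookup path.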
Consider whether it would be possible to load external dependencies, similar to how Deno imports from URLs.
Launch Day
Landing Page
Write a special section on the website to show the functionality.
Either alongside that section or inside it, write a little documentation on how to use it.
Recipes (https://microlink.io/recipes)
Write a series of recipes to show how to connect the functionality with a set of popular services for extracting specific content (followers, followings, stars, etc.):
tweets number, following, followers, likes, lists, moments. [1]
repositories, stars, followers, following, gists, contributions.
price, stars, opinions.
posts, following, followers, likes.
3rd Party Apps
Bubble
Review how we can leverage the functionality with third party tools, like Bubble.
Inspiration