yields / ant

A web crawler for Go

Errors in the example for Built-in Scrapers #25

Closed: peterhellberg closed this issue 3 years ago

peterhellberg commented 3 years ago

The example under https://github.com/yields/ant#built-in-scrapers has a few minor errors in it.

A corrected version would be:

package main

import (
    "context"
    "os"

    "github.com/yields/ant"
)

func main() {
    // Describe how a quote should be extracted.
    type Quote struct {
        Text string   `css:".text"`
        By   string   `css:".author"`
        Tags []string `css:".tag"`
    }

    // A page may have many quotes.
    type Page struct {
        Quotes []Quote `css:".quote"`
    }

    // Where we want to fetch quotes from.
    const host = "quotes.toscrape.com"

    // Initialize the engine with a built-in scraper that receives
    // a type and writes the extracted data to an io.Writer.
    eng, err := ant.NewEngine(ant.EngineConfig{
        Scraper: ant.JSON(os.Stdout, Page{}),
        Matcher: ant.MatchHostname(host),
    })
    if err != nil {
        panic(err)
    }

    // Block until there are no more URLs to scrape.
    if err := eng.Run(context.Background(), "http://"+host); err != nil {
        panic(err)
    }
}

One suggestion: embed the README examples from real source files using something like https://github.com/campoy/embedmd, which would make it easier to spot when an example has gone out of date.

yields commented 3 years ago

I like it! Thanks for the suggestion. Fixed in https://github.com/yields/ant/commit/619106f7cc62d1985bb1df4493dd96d5256aee52