lorey / mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples
https://pypi.org/project/mlscraper/

Example from docs does not work #30

Closed: creatorrr closed this issue 2 years ago

creatorrr commented 2 years ago

Unfortunately, this example from the README does not work for me. Perhaps I'm doing something wrong.

Example:

```python
import requests
from mlscraper.html import Page
from mlscraper.samples import Sample, TrainingSet
from mlscraper.training import train_scraper

# fetch the page to train
einstein_url = 'http://quotes.toscrape.com/author/Albert-Einstein/'
resp = requests.get(einstein_url)
assert resp.status_code == 200

# create a sample for Albert Einstein
training_set = TrainingSet()
page = Page(resp.content)
sample = Sample(page, {'name': 'Albert Einstein', 'born': 'March 14, 1879'})
training_set.add_sample(sample)

# train the scraper with the created training set
scraper = train_scraper(training_set)

# scrape another page
resp = requests.get('http://quotes.toscrape.com/author/J-K-Rowling')
result = scraper.get(Page(resp.content))
print(result)
```

Error:

```
File ~/miniconda3/envs/colbert/lib/python3.8/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:133, in _make_cell_set_template_code()
    116     return types.CodeType(
    117         co.co_argcount,
    118         co.co_nlocals,
   (...)
    130         (),
    131     )
    132 else:
--> 133     return types.CodeType(
    134         co.co_argcount,
    135         co.co_kwonlyargcount,
    136         co.co_nlocals,
    137         co.co_stacksize,
    138         co.co_flags,
    139         co.co_code,
    140         co.co_consts,
    141         co.co_names,
    142         co.co_varnames,
    143         co.co_filename,
    144         co.co_name,
    145         co.co_firstlineno,
    146         co.co_lnotab,
    147         co.co_cellvars,  # this is the trickery
    148         (),
    149     )

TypeError: an integer is required (got type bytes)
```
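Why this particular TypeError? This is our reading of the traceback; the thread itself only points at the version mismatch. The failing frame is in the copy of cloudpickle vendored under `sklearn/externals/joblib`, which passes code-object fields to `types.CodeType` in the pre-Python-3.8 positional order. Python 3.8 added a `posonlyargcount` parameter in second position, so each later argument shifts one slot to the right and `co_code` (a `bytes` object) lands in the integer `flags` slot. The sketch below only models that shift with name lists; it never calls `types.CodeType`, and `PY38_PARAMS` is the 3.8 to 3.10 order (later Python versions changed the signature again). The helper name `first_type_mismatch` is hypothetical.

```python
# Positional order used by the legacy cloudpickle call in the traceback
# (the pre-Python-3.8 types.CodeType signature), first 13 arguments.
LEGACY_ARGS = [
    "co_argcount", "co_kwonlyargcount", "co_nlocals", "co_stacksize",
    "co_flags", "co_code", "co_consts", "co_names", "co_varnames",
    "co_filename", "co_name", "co_firstlineno", "co_lnotab",
]

# Parameter order of types.CodeType on Python 3.8-3.10:
# posonlyargcount was inserted in second position.
PY38_PARAMS = [
    "argcount", "posonlyargcount", "kwonlyargcount", "nlocals", "stacksize",
    "flags", "codestring", "constants", "names", "varnames",
    "filename", "name", "firstlineno", "lnotab",
]

# Parameters that types.CodeType parses as integers.
INT_PARAMS = {"argcount", "posonlyargcount", "kwonlyargcount", "nlocals",
              "stacksize", "flags", "firstlineno"}


def first_type_mismatch():
    """Return (param, field, type name) for the first integer parameter that
    would receive a non-integer field under the legacy argument order."""
    sample = (lambda x: x).__code__  # any real code object works
    for param, field in zip(PY38_PARAMS, LEGACY_ARGS):
        value = getattr(sample, field)
        if param in INT_PARAMS and not isinstance(value, int):
            return param, field, type(value).__name__
    return None


print(first_type_mismatch())  # → ('flags', 'co_code', 'bytes')
```

That matches the message "an integer is required (got type bytes)". Version-robust code on 3.8+ typically sidesteps the positional signature with `CodeType.replace()`, which is why newer scikit-learn/joblib releases do not hit this.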
lorey commented 2 years ago

Hi @creatorrr. Please check your installed version: this looks like a 0.* release, not 1.*. Note that 1.* is a pre-release, as described here: https://github.com/lorey/mlscraper#getting-started
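To see which version is actually installed, a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+, which matches the reporter's environment) could look like this. The helper names `major_of` and `mlscraper_major_version` are illustrative, not part of mlscraper. Because pip skips pre-release versions unless `--pre` is passed (or a pre-release version is pinned explicitly), a plain `pip install mlscraper` can leave you on 0.* while 1.* is still a pre-release.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_of(version_string):
    """Return the leading integer of a version string, e.g. 1 for '1.0.0rc3'."""
    match = re.match(r"\d+", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group())


def mlscraper_major_version():
    """Return the installed mlscraper major version, or None if not installed."""
    try:
        return major_of(version("mlscraper"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    major = mlscraper_major_version()
    if major is None:
        print("mlscraper is not installed in this environment")
    elif major < 1:
        # Per the maintainer's reply, the README example needs 1.*.
        print("mlscraper 0.* found; try: pip install --pre --upgrade mlscraper")
    else:
        print(f"mlscraper {major}.* found")
```

Run it inside the same environment (here the `colbert` conda env from the traceback) that executes the README example, otherwise it reports a different installation.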