Unfortunately, this example from the README does not work. Perhaps I'm doing something wrong.
Example:
import requests
from mlscraper.html import Page
from mlscraper.samples import Sample, TrainingSet
from mlscraper.training import train_scraper
# fetch the page to train
einstein_url = 'http://quotes.toscrape.com/author/Albert-Einstein/'
resp = requests.get(einstein_url)
assert resp.status_code == 200
# create a sample for Albert Einstein
training_set = TrainingSet()
page = Page(resp.content)
sample = Sample(page, {'name': 'Albert Einstein', 'born': 'March 14, 1879'})
training_set.add_sample(sample)
# train the scraper with the created training set
scraper = train_scraper(training_set)
# scrape another page
resp = requests.get('http://quotes.toscrape.com/author/J-K-Rowling')
result = scraper.get(Page(resp.content))
print(result)
Error:
File ~/miniconda3/envs/colbert/lib/python3.8/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:133, in _make_cell_set_template_code()
116 return types.CodeType(
117 co.co_argcount,
118 co.co_nlocals,
(...)
130 (),
131 )
132 else:
--> 133 return types.CodeType(
134 co.co_argcount,
135 co.co_kwonlyargcount,
136 co.co_nlocals,
137 co.co_stacksize,
138 co.co_flags,
139 co.co_code,
140 co.co_consts,
141 co.co_names,
142 co.co_varnames,
143 co.co_filename,
144 co.co_name,
145 co.co_firstlineno,
146 co.co_lnotab,
147 co.co_cellvars, # this is the trickery
148 (),
149 )
TypeError: an integer is required (got type bytes)
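In case it helps with triage: the traceback ends in the cloudpickle copy bundled under sklearn/externals/joblib, and the environment path shows Python 3.8. As far as I know, Python 3.8 added a positional-only argument count to `types.CodeType`, so code that builds a `CodeType` with the older positional argument list can fail with this kind of TypeError. That is my assumption about the cause, not something confirmed for mlscraper. A minimal sketch for printing the relevant versions (`describe_environment` and `codetype_has_posonlyargcount` are hypothetical helper names, not part of any library):

```python
import sys
import types


def codetype_has_posonlyargcount() -> bool:
    """Return True if code objects expose co_posonlyargcount (added in Python 3.8)."""
    return hasattr(types.CodeType, "co_posonlyargcount")


def describe_environment() -> dict:
    """Collect the versions that seem relevant to the traceback above."""
    info = {
        "python": tuple(sys.version_info[:3]),
        "codetype_has_posonlyargcount": codetype_has_posonlyargcount(),
    }
    try:
        # scikit-learn may not be installed; report None in that case.
        import sklearn
        info["sklearn"] = sklearn.__version__
    except ImportError:
        info["sklearn"] = None
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the output shows Python 3.8 or newer together with an old scikit-learn that still ships `sklearn.externals.joblib`, upgrading scikit-learn (or using the Python version the README was written for) might be worth trying, though I have not verified that this resolves the error.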