lorey / mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples
https://pypi.org/project/mlscraper/
1.31k stars 89 forks

Module not found error? #43

Closed Immortalus13 closed 9 months ago

Immortalus13 commented 9 months ago

Does mlscraper still work? I cannot get it to run (not even the sample code). I always get a ModuleNotFoundError:

ModuleNotFoundError: No module named 'mlscraper.html'

import requests
from mlscraper.html import Page
from mlscraper.samples import Sample, TrainingSet
from mlscraper.training import train_scraper

# fetch the page to train
einstein_url = 'http://quotes.toscrape.com/author/Albert-Einstein/'
resp = requests.get(einstein_url)
assert resp.status_code == 200

# create a sample for Albert Einstein
# please add at least two samples in practice to get meaningful rules!
training_set = TrainingSet()
page = Page(resp.content)
sample = Sample(page, {'name': 'Albert Einstein', 'born': 'March 14, 1879'})
training_set.add_sample(sample)

# train the scraper with the created training set
scraper = train_scraper(training_set)

# scrape another page
resp = requests.get('http://quotes.toscrape.com/author/J-K-Rowling')
result = scraper.get(Page(resp.content))
print(result)
# returns {'name': 'J.K. Rowling', 'born': 'July 31, 1965'}

The package is definitely installed, though:

image.png (https://github.com/lorey/mlscraper/assets/157795088/f97ec2bd-0d7f-4960-9025-9e4423f9198f)

Or what am I missing?

lorey commented 9 months ago

Please check the installed version and see https://github.com/lorey/mlscraper#getting-started. You’ve most likely not installed the required version (1.0+).
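
For anyone landing here with the same error, a minimal sketch of the version check suggested above, using only the standard library. The helper names `parse_major` and `diagnose` are made up for illustration and are not part of mlscraper. The sketch assumes that the `mlscraper.html` module only ships with 1.0+, which is what the reply implies.

```python
from importlib import metadata


def parse_major(version: str) -> int:
    """Return the leading integer of a version string such as '1.0.0rc3'."""
    digits = ""
    for ch in version.split(".")[0]:
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else 0


def diagnose(package: str = "mlscraper", required_major: int = 1) -> str:
    """Describe whether the installed distribution is new enough (hypothetical helper)."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The interpreter running this script may differ from the one pip installed into.
        return f"{package} is not installed in this environment"
    if parse_major(installed) < required_major:
        return f"{package} {installed} is installed, but {required_major}.0+ is needed"
    return f"{package} {installed} looks new enough"


if __name__ == "__main__":
    print(diagnose())
```

If this reports a version below 1.0, the install instructions in the linked getting-started section are the place to look. A package manager listing the package as installed does not say which version the running interpreter actually sees, so run the script with the same interpreter that raises the error.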


Immortalus13 commented 9 months ago

Thanks, that actually did the trick! Appreciate the quick help!!