Find and fix issue with GitHub profile pages #23

lorey commented 2 years ago

Follower counts have no unique selector (we need nth-child or something else to tell them apart).
Image width and height attributes get matched when searching for a follower count of "20" (the icons have manually set dimensions of 20).
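
To illustrate the second problem, here is a minimal sketch using only the standard library. The markup is hypothetical and merely modelled on a profile sidebar (it is not GitHub's real HTML), and `ValueFinder` is an illustrative helper, not part of mlscraper. It shows that a verbatim search for "20" hits the icon's `width` and `height` attributes as well as the follower count.

```python
from html.parser import HTMLParser

# Hypothetical markup, loosely modelled on a profile sidebar; not GitHub's real HTML.
HTML = """
<div class="stats">
  <a href="/followers"><img src="people.svg" width="20" height="20"><span class="count">20</span> followers</a>
  <a href="/following"><span class="count">7</span> following</a>
</div>
"""


class ValueFinder(HTMLParser):
    """Collect every place where a target string appears verbatim."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []    # tags that are currently open
        self.matches = []  # (kind, location) tuples

    def handle_starttag(self, tag, attrs):
        # Attribute values that equal the target are candidate matches too.
        for name, value in attrs:
            if value == self.target:
                self.matches.append(("attribute", f"{tag}[{name}]"))
        if tag != "img":  # img is a void element and is never closed
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Text nodes that equal the target, reported by their enclosing tag.
        if data.strip() == self.target:
            enclosing = self.stack[-1] if self.stack else None
            self.matches.append(("text", enclosing))


finder = ValueFinder("20")
finder.feed(HTML)
print(finder.matches)
```

Only the `("text", "span")` match is the follower count; the two attribute matches are noise that a trainer has to rule out.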

lorey commented 2 years ago

Follower count is now matched via an nth-child pseudo-selector in 42e19d25d5c21e20250d3f327f66a90b7846d57a.
Matching of image width and height is fixed by aa1ac21a0ede6f4f6a4282fcb07f87d706186817.
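
As a rough illustration of why nth-child helps, the sketch below builds a CSS path with `:nth-child` at each step. It is not the code from the commits above; the markup is hypothetical and `nth_child_selector` is an illustrative helper. Both counts share the class `count`, so only their position among siblings distinguishes them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup; not GitHub's real HTML.
SNIPPET = """
<div class="stats">
  <a><span class="count">20</span> followers</a>
  <a><span class="count">7</span> following</a>
</div>
""".strip()


def nth_child_selector(root, target):
    """Build a CSS path from root to target using :nth-child at each step."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        index = list(parent).index(node) + 1  # :nth-child is 1-based
        steps.append(f"{node.tag}:nth-child({index})")
        node = parent
    steps.append(root.tag)
    return " > ".join(reversed(steps))


root = ET.fromstring(SNIPPET)
spans = root.findall(".//span[@class='count']")
selectors = [nth_child_selector(root, span) for span in spans]
print(selectors)
```

The two resulting selectors differ only in `a:nth-child(1)` versus `a:nth-child(2)`, which is exactly the information a class-based selector lacks.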

lorey commented 2 years ago

Test case added in c3427b79a09d4ea4595ab775f8c267364975b60c.

jonashaag commented 2 years ago

This is still broken for me. What am I doing differently from your test case?

import requests
from mlscraper.html import Page
from mlscraper.samples import Sample, TrainingSet
from mlscraper.training import train_scraper

jonas_url = "https://github.com/jonashaag"
resp = requests.get(jonas_url)
resp.raise_for_status()

page = Page(resp.content)
sample = Sample(
    page,
    {
        "name": "Jonas Haag",
        "followers": "329",
        "company": "@Quantco",
        "twitter": "@_jonashaag",
        "username": "jonashaag",
        "nrepos": "282",
    },
)

training_set = TrainingSet()
training_set.add_sample(sample)

scraper = train_scraper(training_set)

# Apply the trained scraper to a second profile page
resp = requests.get("https://github.com/lorey")
resp.raise_for_status()
result = scraper.get(Page(resp.content))
print(result)

jonashaag commented 2 years ago

Are you maybe testing with an HTML dump from a logged-in session?
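
One way to check this would be to verify that every sample value actually occurs in the HTML being trained on. The sketch below is only an assumption about how to do that: `missing_values` is a hypothetical helper, not part of mlscraper, and the two HTML strings are made-up stand-ins, not real GitHub responses.

```python
def missing_values(html, expected):
    """Return the keys of `expected` whose values do not occur verbatim in `html`."""
    return [key for key, value in expected.items() if value not in html]


# Hypothetical stand-ins for a logged-in dump and an anonymous response.
logged_in_html = '<span class="name">Jonas Haag</span><span>329</span> followers <a>@_jonashaag</a>'
anonymous_html = '<span class="name">Jonas Haag</span><span>329</span> followers'

expected = {"name": "Jonas Haag", "followers": "329", "twitter": "@_jonashaag"}

print(missing_values(logged_in_html, expected))
print(missing_values(anonymous_html, expected))
```

With a real response, `resp.text` would take the place of the stand-in strings. Any key reported as missing points to a value that the fetched page does not contain verbatim.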
