spinlud / py-linkedin-jobs-scraper

MIT License

How to output data as JSON #8

Closed ankurGhosh1 closed 3 years ago

ankurGhosh1 commented 3 years ago

I am trying to scrape the jobs and serve the result from a JSON endpoint. My code is mostly the same as in the example, but the data is not returned in JSON format. Here is the code I have written for the scraper:

    joblist = {
        "Query": "",
    }

    def on_data(data: EventData):
        print('[ON_DATA]', data.title, data.company, data.date, data.link)
        for data in data:
            joblist.update({"Query": data})

    def on_error(error):
        print('[ON_ERROR]', error)

    def on_end():
        print('[ON_END]')

    scraper = LinkedinScraper(
        chrome_executable_path='C:/Users/indianLeo/Desktop/chromedriver.exe', # Custom Chrome executable path (e.g. /foo/bar/bin/chromedriver) 
        chrome_options=None,  # Custom Chrome options here
        headless=True,  # Overrides headless mode only if chrome_options is None
        max_workers=1,  # How many threads will be spawned to run queries concurrently (one Chrome driver for each thread)
        slow_mo=5,  # Slow down the scraper to avoid 'Too many requests (429)' errors
    )

    scraper.on(Events.DATA, on_data)
    scraper.on(Events.ERROR, on_error)
    scraper.on(Events.END, on_end)

    queries = [Query(
        options=QueryOptions(
            optimize=True,
            limit=2,
            filters=QueryFilters(
                # Paste link below
                company_jobs_url='https://www.linkedin.com/jobs/search/?f_C=1441&location=Worldwide', # https://www.linkedin.com/jobs/search/?f_C=30204141&location=Worldwide
            )
        )
    )]
    scraper.run(queries)
    print(joblist)

Here is the response I am getting:

[ON_DATA] Information Services, Internet
[ON_DATA] Information Services, Internet
[ON_END]
{'Query': 'Information Services, Internet'}

What I am expecting is { "message": { "data": data, "data2": data2, ... } }

I also cannot use data.title anywhere else; it raises AttributeError: 'int' object has no attribute 'title'.
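
A likely explanation for the single-value output, sketched under the assumption that EventData behaves like a NamedTuple: the loop `for data in data` walks the record's field values one by one, and each `joblist.update({"Query": ...})` call writes to the same key, so only the last field survives. The loop also rebinds the name `data` to a plain field value, which would explain why `data.title` stops working afterwards. `FakeEventData` below is a hypothetical stand-in, not the library's class, and its three fields are chosen only for illustration.

```python
from typing import NamedTuple


class FakeEventData(NamedTuple):
    """Hypothetical stand-in for the scraper's EventData (assumed to be a NamedTuple)."""
    title: str
    company: str
    industries: str


joblist = {"Query": ""}
data = FakeEventData("Engineer", "Acme", "Information Services, Internet")

# Iterating a NamedTuple yields its field values in declaration order.
for value in data:
    # Same key on every pass, so each update overwrites the previous value.
    joblist.update({"Query": value})

print(joblist)  # {'Query': 'Information Services, Internet'}
```

Using a different loop variable name (here `value`) at least keeps `data` pointing at the whole record.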

ankurGhosh1 commented 3 years ago

job = { "job_id": data.id, "link": data.link, ... }
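
Building one dict per job, as in the line above, can be extended into a JSON payload along these lines. This is a minimal sketch, not the library's own API: `JobRecord` is a hypothetical stand-in for EventData that carries only the fields printed in the original `on_data` handler, and the `"message"` wrapper and list layout are assumptions about the desired response shape.

```python
import json
from typing import Any, Dict, List, NamedTuple


class JobRecord(NamedTuple):
    """Hypothetical stand-in for the scraper's EventData."""
    title: str
    company: str
    date: str
    link: str


jobs: List[Dict[str, Any]] = []


def on_data(data: JobRecord) -> None:
    # Append one dict per job instead of overwriting a single key.
    jobs.append({
        "title": data.title,
        "company": data.company,
        "date": data.date,
        "link": data.link,
    })


def to_json() -> str:
    # Wrap the collected jobs in the "message" envelope (assumed shape).
    return json.dumps({"message": jobs}, indent=2)


# Simulate two DATA events; with the real scraper the handler would be
# registered via scraper.on(Events.DATA, on_data) and fired by scraper.run(...).
on_data(JobRecord("Engineer", "Acme", "2021-01-01", "https://example.com/jobs/1"))
on_data(JobRecord("Analyst", "Acme", "2021-01-02", "https://example.com/jobs/2"))

payload = to_json()
print(payload)
```

In the original code `print(joblist)` runs after `[ON_END]` is printed, so calling `to_json()` right after `scraper.run(queries)` should see every collected job.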