spinlud / linkedin-jobs-scraper


Company name and date not being extracted #40

Closed calvinomiguel closed 1 year ago

calvinomiguel commented 1 year ago

It's not exporting the companies' names anymore.


Company name and date are being shown as an empty string.

spinlud commented 1 year ago

It seems to work here:

import {
    LinkedinScraper,
    events,
} from 'linkedin-jobs-scraper';

(async () => {
    const scraper = new LinkedinScraper({
        headless: true,
        slowMo: 250,
        args: [
            "--lang=en-GB",
        ],
    });

    scraper.on(events.scraper.data, (data) => {
        console.log(
            `Title='${data.title}'`,
            `Company='${data.company ? data.company : "N/A"}'`,
            `link='${data.link}'`,
            `applyLink='${data.applyLink ? data.applyLink : "N/A"}'`,
        );
    });

    await scraper.run([
        {
            query: "Deep Learning",
            options: {
                locations: ["United States"],
                limit: 10,
                applyLink: true,
            }
        },
    ]);

    await scraper.close();
})();


Node: v16.17.0
linkedin-jobs-scraper: 14.0.6

ldelagarde commented 1 year ago

Versions

Node: v16.19.0
linkedin-jobs-scraper: 14.0.6

Problem

I encountered the same problem for company names.

Here is an example of a job as it may appear in my job search results (in case it helps):

```HTML
• Logo de La medtech innovante
  La medtech innovante
```

    Explanation

    To get the company name, the scraper seems to use the text of the link (the HTML <a> tag):
    https://github.com/spinlud/linkedin-jobs-scraper/blob/cf63e370794ae82892e27d31d59c8edfacaf25e5/src/scraper/strategies/AuthenticatedStrategy.ts#L19
    https://github.com/spinlud/linkedin-jobs-scraper/blob/cf63e370794ae82892e27d31d59c8edfacaf25e5/src/scraper/strategies/AuthenticatedStrategy.ts#L490

    However, in my LinkedIn job search results, the company name is never a link; it is plain text inside a div:

    <div id="ember1436" class="artdeco-entity-lockup__subtitle ember-view">
      <div class="job-card-container__company-name">
        Company name
      </div>
    </div>

    Temporary fix

    My temporary fix was to replace the value of the companyLink selector with div.job-card-container__company-name. It's not the cleanest solution, but it gets the job done!
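A more tolerant variant of this workaround could try several selectors in order instead of swapping one for another. The following is only a sketch, not the library's code: `QueryRoot`, `extractCompany` and the selector `a.hypothetical-company-link` are made-up names for illustration, and only `div.job-card-container__company-name` comes from this thread.

```typescript
// Minimal structural types so the sketch runs without a browser.
// A real DOM Element also satisfies QueryRoot.
interface TextNodeLike {
    textContent: string | null;
}

interface QueryRoot {
    querySelector(selector: string): TextNodeLike | null;
}

// Candidate selectors, tried in order. The first entry is a hypothetical
// placeholder for a link-based selector; the second is the class
// reported in this thread.
const companySelectors: string[] = [
    "a.hypothetical-company-link",
    "div.job-card-container__company-name",
];

// Returns the trimmed text of the first selector that matches a node
// with non-empty text, or "" when none of them do.
function extractCompany(
    card: QueryRoot,
    selectors: string[] = companySelectors,
): string {
    for (const selector of selectors) {
        const text = card.querySelector(selector)?.textContent?.trim();
        if (text) {
            return text;
        }
    }
    return "";
}

// Fake job card that only contains the plain-text variant of the name.
const fakeCard: QueryRoot = {
    querySelector: (selector) =>
        selector === "div.job-card-container__company-name"
            ? { textContent: "\n    Company name\n  " }
            : null,
};

console.log(extractCompany(fakeCard)); // prints "Company name"
```

With a chain like this, an account that still sees the linked company name and one that sees plain text would both get a value, assuming each variant matches one of the listed selectors.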

    spinlud commented 1 year ago

    Hi @ldelagarde, thank you for the feedback! So when you are on the LinkedIn website, company names are not navigable, just text?

    spinlud commented 1 year ago

    Btw I have updated the company selector with .job-card-container__company-name, give it a try

    ldelagarde commented 1 year ago

    Hi @spinlud, thanks to you for making this tool!

    > So when you are on Linkedin website, company names are not navigable, just text?
    > (@spinlud, Jan 28, 2023, 7:24 PM UTC)

    Yeah, on job cards the job title is a link, but the company name is not.

    > Btw I have updated the company selector with .job-card-container__company-name, give it a try
    > (@spinlud, Jan 28, 2023, 8:00 PM UTC)

    With your latest version (14.0.8), I can get the company name!

    As for the dates, I do manage to get them, but only some of the time: in about 75% of cases the date is missing, and I don't know whether that is because LinkedIn doesn't show it.

    Here are the logs as proof:

    scraper:info [devops][Centre-Val de Loire, France][4] Processed +974ms
    152 2267
    Query='devops' Location='Centre-Val de Loire, France' Id='3423562961' Title='DevOps (IT) / Freelance' Company='Free-Work (ex Freelance-info Carriere-info)' CompanyLink='N/A' CompanyImgLink='https://media.licdn.com/dms/image/C4E0BAQF5W6yWMv4fbA/company-logo_100_100/0/1658140274782?e=1683158400&v=beta&t=5tN3uvfLxCwkOzb8rMVsi8sbzWmiKw0Rr_S3AfvcgxM' Place='Orléans, Centre-Val de Loire, France Sur site' Date='2023-01-05' Link='https://www.linkedin.com/jobs/view/3423562961/?eBP=JOB_SEARCH_ORGANIC&recommendedFlavor=JOB_SEEKER_QUALIFIED&refId=XXX&trk=XXX' applyLink='N/A' insights=''
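To put a number on how often a field such as Date comes back empty, log lines of this `Key='value'` shape can be parsed and tallied. This is a rough sketch under assumptions: `parseLogLine` and `emptyRate` are hypothetical helpers, values are assumed not to contain single quotes, and the sample lines other than the first are made-up placeholders.

```typescript
// Parse one log line of the form Key='value' Key='value' ... into a
// plain object. Assumes values never contain a single quote.
function parseLogLine(line: string): Record<string, string> {
    const fields: Record<string, string> = {};
    const pattern = /(\w+)='([^']*)'/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(line)) !== null) {
        fields[match[1]] = match[2];
    }
    return fields;
}

// Fraction of lines in which the given field is missing or empty.
function emptyRate(lines: string[], field: string): number {
    if (lines.length === 0) {
        return 0;
    }
    const empty = lines.filter(
        (line) => (parseLogLine(line)[field] ?? "").trim() === "",
    ).length;
    return empty / lines.length;
}

// Sample input: the first line is shortened from the log above; the
// other three are made-up placeholders with an empty Date.
const sampleLines: string[] = [
    "Query='devops' Title='DevOps (IT) / Freelance' Date='2023-01-05' applyLink='N/A' insights=''",
    "Query='devops' Title='Placeholder A' Date='' applyLink='N/A' insights=''",
    "Query='devops' Title='Placeholder B' Date='' applyLink='N/A' insights=''",
    "Query='devops' Title='Placeholder C' Date='' applyLink='N/A' insights=''",
];

console.log(parseLogLine(sampleLines[0]).Date); // prints "2023-01-05"
console.log(emptyRate(sampleLines, "Date")); // prints 0.75
```

A tally like this does not say why a date is missing; comparing a few empty entries against the job cards shown in the browser would still be needed to tell a selector problem from LinkedIn simply not displaying a date.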

    Works wonderfully for me! I think you can close this ticket :+1: Thank you for all that you do and are doing!

    spinlud commented 1 year ago

    Ok cool! I am not sure, but I suppose LinkedIn serves a different UI depending on factors such as geolocation, which is probably why CSS selectors don't always work for all accounts. Unfortunately I can only test what I am able to see myself 🤷‍♂️