josephlimtech / linkedin-profile-scraper-api

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.
MIT License

All fields of the User-Profile object are null except the url #4

Closed: Shahzad6077 closed this 4 years ago

mralexgray commented 4 years ago

For example...

{
  userProfile: {
    fullName: null,
    title: null,
    location: null,
    photo: null,
    description: null,
    url: 'https://www.linkedin.com/in/me/'
  }
}

The problem likely starts around line 195 and can be seen by inspecting the value of the userProfile variable.
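The symptom described above (every field null except `url`) can be checked programmatically. The following is a minimal sketch, not part of the library: `isEmptyProfile` is a hypothetical helper name, and it assumes the `userProfile` shape shown in the example output.

```javascript
// Hypothetical helper, not part of linkedin-profile-scraper.
// Returns true when every field except `url` is null or undefined,
// which is the symptom reported in this issue.
function isEmptyProfile(userProfile) {
  if (!userProfile) return true;
  return Object.entries(userProfile)
    .filter(([key]) => key !== 'url')
    .every(([, value]) => value === null || value === undefined);
}

// Shape copied from the example output in this issue.
const example = {
  fullName: null,
  title: null,
  location: null,
  photo: null,
  description: null,
  url: 'https://www.linkedin.com/in/me/',
};

console.log(isEmptyProfile(example)); // true
```

A check like this could be run on the scraper result so that an empty profile is treated as a failure instead of being returned silently.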

jvandenaardweg commented 4 years ago

Thanks for reporting! This is now fixed in master.

Changes: https://github.com/jvandenaardweg/linkedin-profile-scraper/pull/5

dmmarmol commented 3 years ago

Hi everyone!

I'm facing a similar issue, and I've checked that I have the latest version of the selectors you pushed in #5 ✔️.

Judging by the logs, some "View more" buttons are being missed, even though the selectors themselves are correct (I checked them manually in a browser).

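// Package name as published on npm; adjust the import if you use a fork.
import { LinkedInProfileScraper } from 'linkedin-profile-scraper';

// getURL is my own helper (not shown) that builds the profile URL for a language.
export const RequestLinkedin = async ({ language }) => {
    try {
        const scraper = new LinkedInProfileScraper({
            sessionCookieValue: process.env.LI_AT_COOKIE_VALUE,
            // Keep Puppeteer running between calls while developing
            keepAlive: process.env.NODE_ENV === 'development',
        });

        // Prepare the scraper
        // Loading it in memory
        await scraper.setup();

        const url = getURL({ language });
        const result = await scraper.run(url);

        return result;
    } catch (err) {
        if (err.name === 'SessionExpired') {
            // Do something when the scraper notifies you it's not logged-in anymore
            throw new Error('SessionExpired');
        }
        // Any other error is swallowed here, so the caller receives undefined
        return;
    }
};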
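The log in this comment prints "Setting session cookie using cookie: undefined", although the login check that follows passes, so this may only be a logging quirk. Still, a guard like the one sketched here would rule out a missing `LI_AT_COOKIE_VALUE` before the scraper is constructed. `requireEnv` is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper, not part of linkedin-profile-scraper.
// Throws early when a required environment variable is missing or blank.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage with an explicit env object, so the example does not depend on
// the real process environment.
const cookie = requireEnv('LI_AT_COOKIE_VALUE', { LI_AT_COOKIE_VALUE: 'abc123' });
console.log(cookie.length); // 6
```

In the function above, the result of `requireEnv('LI_AT_COOKIE_VALUE')` could be passed as `sessionCookieValue` in place of reading `process.env` directly.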

Logs

Scraper (setup): Launching puppeteer in the background...
Scraper (setup): Puppeteer launched!
Scraper (setup page): Blocking the following resources: image, media, font, texttrack, object, beacon, csp_report, imageset
Scraper (setup page): Should block scripts from 10366 unwanted hosts to speed up the crawling.
Scraper (setup page): Setting session cookie using cookie: undefined
Scraper (setup page): Session cookie set!
Scraper (setup page): Done!
Scraper (checkIfLoggedIn): Checking if we are still logged in...
Scraper (blocked script): xhr: dpm.demdex.net: https://dpm.demdex.net/id?d_visid_ver=5.1.1&d_fieldgroup=MC&d_rtbd=json&d_ver=2&d_orgid=14215E3D5995C57C0A495C55%40AdobeOrg&d_nsid=0&ts=1614113026516
Scraper (blocked script): xhr: dpm.demdex.net: https://dpm.demdex.net/id?d_visid_ver=5.1.1&d_fieldgroup=AAM&d_rtbd=json&d_ver=2&d_orgid=14215E3D5995C57C0A495C55%40AdobeOrg&d_nsid=0&d_mid=49017425936221650123187014327421593955&ts=1614113026536
Scraper (blocked script): xhr: dpm.demdex.net: https://dpm.demdex.net/id?d_visid_ver=5.1.1&d_fieldgroup=AAM&d_rtbd=json&d_ver=2&d_orgid=14215E3D5995C57C0A495C55%40AdobeOrg&d_nsid=0&d_mid=49017425936221650123187014327421593955&d_cid_ic=lnkdidsync%01AX1gF6l-GPXUsEfrGZfopE4VFCzS%26v%3D2%011&d_cid_ic=thirdpartyid%01AX1gF6l-GPXUsEfrGZfopE4VFCzS%26v%3D2%011&d_cid_ic=lnkd_member_id%01AX1gF6l-GPXUsEfrGZfopE4VFCzS%26v%3D2%011&ts=1614113026570
Scraper (checkIfLoggedIn): All good. We are still logged in.
Scraper (setup): Done!
Scraper (setup page): Blocking the following resources: image, media, font, texttrack, object, beacon, csp_report, imageset
Scraper (setup page): Should block scripts from 10366 unwanted hosts to speed up the crawling.
Scraper (setup page): Setting session cookie using cookie: undefined
Scraper (setup page): Session cookie set!
Scraper (setup page): Done!
Scraper (run) (1614113027545): Navigating to LinkedIn profile: https://linkedin.com/in/[USER_PROFILE]/en-US
Scraper (run) (1614113027545): LinkedIn profile page loaded!
Scraper (run) (1614113027545): Getting all the LinkedIn profile data by scrolling the page to the bottom, so all the data gets loaded into the page...
Scraper (run) (1614113027545): Parsing data...
Scraper (run) (1614113027545): Expanding all sections by clicking their "See more" buttons
Scraper (run) (1614113027545): Clicking button .pv-profile-section.pv-about-section .lt-line-clamp__more
Scraper (run) (1614113027545): Clicking button .pv-skill-categories-section [data-control-name="skill_details"]
Scraper (run) (1614113027545): Expanding all descriptions by clicking their "See more" buttons
Scraper (run) (1614113027545): Clicking button .lt-line-clamp__more[href="#"]:not(.lt-line-clamp__ellipsis--dummy)
Scraper (run) (1614113027545): Clicking button .lt-line-clamp__more[href="#"]:not(.lt-line-clamp__ellipsis--dummy)
Scraper (run) (1614113027545): Clicking button .lt-line-clamp__more[href="#"]:not(.lt-line-clamp__ellipsis--dummy)
Scraper (run) (1614113027545): Clicking button .lt-line-clamp__more[href="#"]:not(.lt-line-clamp__ellipsis--dummy)
Scraper (run) (1614113027545): Could not find or click see more button selector "JSHandle@node". So we skip that one.
Scraper (run) (1614113027545): Clicking button .lt-line-clamp__more[href="#"]:not(.lt-line-clamp__ellipsis--dummy)
Scraper (run) (1614113027545): Could not find or click see more button selector "JSHandle@node". So we skip that one.
Scraper (run) (1614113027545): Parsing profile data...
Scraper (run) (1614113027545): Got user profile data: {"fullName":null,"title":null,"location":null,"photo":null,"description":null,"url":"https://www.linkedin.com/feed/"}
Scraper (run) (1614113027545): Parsing experiences data...
Scraper (run) (1614113027545): Got experiences data: []
Scraper (run) (1614113027545): Parsing education data...
Scraper (run) (1614113027545): Got education data: []
Scraper (run) (1614113027545): Parsing volunteer experience data...
Scraper (run) (1614113027545): Got volunteer experience data: []
Scraper (run) (1614113027545): Parsing skills data...
Scraper (run) (1614113027545): Got skills data: []
Scraper (run) (1614113027545): Done! Returned profile details for: https://linkedin.com/in/[USER_PROFILE]/en-US
Scraper (run): Done. Puppeteer is being kept alive in memory.
{
  userProfile: {
    fullName: null,
    title: null,
    location: null,
    photo: null,
    description: null,
    url: 'https://www.linkedin.com/feed/'
  },
  experiences: [],
  education: [],
  volunteerExperiences: [],
  skills: []
}
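One detail in the output above is that the returned `url` is `https://www.linkedin.com/feed/`, not the requested `/in/...` profile address, which suggests the page that got parsed may not have been a profile page at all. Here is a small sketch for detecting that case. `isProfileUrl` is a hypothetical helper, and the assumption that profile paths start with `/in/` is taken from the URLs shown in this thread.

```javascript
// Hypothetical helper, not part of linkedin-profile-scraper.
// Returns true only for linkedin.com URLs whose path starts with /in/.
function isProfileUrl(url) {
  try {
    const { hostname, pathname } = new URL(url);
    const onLinkedIn = /(^|\.)linkedin\.com$/.test(hostname);
    return onLinkedIn && pathname.startsWith('/in/');
  } catch {
    // new URL() throws on strings that are not absolute URLs
    return false;
  }
}

console.log(isProfileUrl('https://www.linkedin.com/in/me/')); // true
console.log(isProfileUrl('https://www.linkedin.com/feed/'));  // false
```

Checking `result.userProfile.url` this way after `scraper.run(url)` would separate "the selectors found nothing" from "the browser ended up on a different page".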