spinlud / linkedin-jobs-scraper


How to also get the current date of the job posting #5

DanielGarciaGuillen closed 4 years ago

DanielGarciaGuillen commented 4 years ago

Hey!

First, thank you for the package; it works super well and makes the job much easier.

I forked it to try to also grab the date the job post was submitted, but without success. Maybe with a little help I can submit an MR to include the data.

I was thinking of creating a custom selector to get the data. The element matched by this selector has an attribute called datetime that gives us the date in the following format:

datetime="2019-09-13"

And then, in the same place where you scrape the jobs, add something like:

 [jobTitle, jobCompany, jobPlace, jobDate] = await page.evaluate(
            (
              linksSelector,
              companiesSelector,
              placesSelector,
              dateSelector,
              jobIndex
            ) => {
              return [
                document.querySelectorAll(linksSelector)[jobIndex].innerText,
                document.querySelectorAll(companiesSelector)[jobIndex]
                  .innerText,
                document.querySelectorAll(placesSelector)[jobIndex].innerText,
                document.body.querySelectorAll(dateSelector)[jobIndex],
                (el) => el.getAttribute("datetime"),
              ];
            },
            linksSelector,
            companiesSelector,
            placesSelector,
            jobIndex,
            dateSelector
          );

I haven't been able to make it work. I think I am close, but maybe I am missing something.

Thank you
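A note on the snippet above: Puppeteer's page.evaluate passes its extra arguments to the page function by position, so they must be listed in the same order as the parameters (in the snippet, jobIndex and dateSelector are swapped). In addition, the arrow function is a separate array element rather than being applied to the date element, and page.evaluate can only hand back serializable values such as strings, not DOM elements. The following is a minimal sketch of one way to address these points, assuming the same selectors as above. The function name extractJobFields is hypothetical and not part of the package.

```javascript
// Sketch only: extractJobFields is a hypothetical helper, not part of linkedin-jobs-scraper.
// It is meant to run in the page context (where `document` exists) via page.evaluate.
function extractJobFields(
  linksSelector,
  companiesSelector,
  placesSelector,
  dateSelector,
  jobIndex
) {
  // Read the text of the jobIndex-th match, or null if there is no such element.
  const textAt = (selector) => {
    const el = document.querySelectorAll(selector)[jobIndex];
    return el ? el.innerText : null;
  };

  // Read the datetime attribute inside the page and return it as a plain string.
  const dateEl = document.querySelectorAll(dateSelector)[jobIndex];
  const jobDate = dateEl ? dateEl.getAttribute("datetime") : null;

  return [
    textAt(linksSelector),
    textAt(companiesSelector),
    textAt(placesSelector),
    jobDate,
  ];
}

// Usage with Puppeteer (arguments in the same order as the parameters):
// const [jobTitle, jobCompany, jobPlace, jobDate] = await page.evaluate(
//   extractJobFields,
//   linksSelector,
//   companiesSelector,
//   placesSelector,
//   dateSelector,
//   jobIndex
// );
```

Returning the attribute value rather than the element keeps everything that crosses the page boundary serializable.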

spinlud commented 4 years ago

Hi! I've just added a date field; you can extract it as follows:

scraper.on(events.custom.data, ({ query, location, link, title, company, place, description, date }) => {
    console.log(
        description.length,
        `Query='${query}'`,
        `Location='${location}'`,
        `Title='${title}'`,
        `Company='${company}'`,
        `Place='${place}'`,
        `Date='${date}'`,
        `Link='${link}'`,
    );
});

Hope this helps!
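Once the date value is available in the listener, it can be turned into something comparable. The sketch below assumes the date field is a string in the same "YYYY-MM-DD" format as the datetime attribute mentioned earlier in this thread; that format is not confirmed here. The helper name daysSincePosted is hypothetical.

```javascript
// Sketch only: daysSincePosted is a hypothetical helper.
// Assumes `date` is a "YYYY-MM-DD" string; returns null for anything else.
function daysSincePosted(date, now = new Date()) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(String(date));
  if (!match) {
    return null;
  }

  // Compare calendar days in UTC so the local time zone does not shift the result.
  const posted = Date.UTC(Number(match[1]), Number(match[2]) - 1, Number(match[3]));
  const today = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());

  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return Math.round((today - posted) / MS_PER_DAY);
}

// Example: a posting dated 2019-09-13, checked on 2019-09-20.
console.log(daysSincePosted("2019-09-13", new Date("2019-09-20T12:00:00Z"))); // 7
```

Inside the events.custom.data listener, this could be called as daysSincePosted(date) to skip postings older than a chosen number of days.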

DanielGarciaGuillen commented 4 years ago

Thanks man, I really appreciate it!