linkedtales / scrapedin

LinkedIn Scraper (currently working 2020)
Apache License 2.0
598 stars 174 forks

Add href to values in contact info #105

Open rileyai-dev opened 4 years ago

rileyai-dev commented 4 years ago

Currently, the contact info returns an array of objects with type (header) and values. I find it very difficult to use, since the header can differ depending on the language, and neither the website values nor the Twitter values come back as proper links.

I was wondering if values could instead be returned as an array of objects, such as:

[
  { type: 'Twitter', values: [ { value: 'myTwitter', link: 'https://twitter.com/myTwitter' } ] },
  { type: 'Websites', values: [ { value: 'hgc.harvard.edu (Harvard Graduate Council)', link: 'https://hgc.harvard.edu' }, { value: 'hga.mit.edu (MIT Graduate Association)', link: 'https://hga.mit.edu' } ] }
]

I have tried to change the contact info template to get the href for the link and create an array of objects but I'm struggling to understand how templates are structured!

Current template

const template = {
  selector: '.pv-contact-info__contact-type',
  fields: {
    type: 'header',
    values: {
      selector: '.pv-contact-info__ci-container',
      isMultipleFields: true
    }
  }
} 

which returns:

[
  { type: 'Twitter', values: [ 'myTwitter' ] },
  { type: 'Websites', values: [ 'hgc.harvard.edu (Harvard Graduate Council)', 'hga.mit.edu (MIT Graduate Association)'] }
]

How can I get the href and create an array of objects?

leonardiwagner commented 4 years ago

@grapevineai you can take a look at profileScraperTemplate.js, someone did something similar for imageurl

you can use either a string or an object as the value in a template, so inside values, instead of using '.pv-contact-info__ci-container', you can create an object like imageurl, with the desired value and href parameters. good luck :smile:
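
A rough sketch of what that suggestion might look like, not a confirmed API. It assumes the object form of a field accepts an `attribute` key (as imageurl appears to use for `src`) and that this key also works together with isMultipleFields. The `links` field name and the ` a` descendant selector are made up for illustration.

```javascript
// Sketch only: `links`, the ` a` descendant selector and `attribute: 'href'`
// are assumptions, not verified against scrapedin's scraper.
const template = {
  selector: '.pv-contact-info__contact-type',
  fields: {
    type: 'header',
    // Unchanged: the visible text of each contact entry.
    values: {
      selector: '.pv-contact-info__ci-container',
      isMultipleFields: true
    },
    // Hypothetical: the href of each anchor inside those entries.
    links: {
      selector: '.pv-contact-info__ci-container a',
      attribute: 'href',
      isMultipleFields: true
    }
  }
}
```

If those assumptions hold, each entry would carry a links array next to values, rather than the nested objects asked for above.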

rileyai-dev commented 4 years ago

I have created this PR to partially solve this issue... https://github.com/linkedtales/scrapedin/pull/112

It's not perfect, since it creates two arrays: one for the values and another for the hrefs. It's better than nothing, and in most cases the first item of values will match the first item of hrefs, and so on.
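
When the two arrays do line up by index, they can be merged into the shape requested at the top of the issue. This is a minimal sketch: the `hrefs` property name and the `pairValuesWithLinks` helper are hypothetical and not taken from the PR, and a missing href falls back to null.

```javascript
// Hypothetical input shape: each entry has parallel `values` and `hrefs` arrays.
function pairValuesWithLinks (contactInfo) {
  return contactInfo.map(({ type, values = [], hrefs = [] }) => ({
    type,
    values: values.map((value, i) => ({
      value,
      // Use null when there is no href at this index.
      link: i < hrefs.length ? hrefs[i] : null
    }))
  }))
}

const scraped = [
  { type: 'Twitter', values: ['myTwitter'], hrefs: ['https://twitter.com/myTwitter'] },
  {
    type: 'Websites',
    values: ['hgc.harvard.edu (Harvard Graduate Council)', 'hga.mit.edu (MIT Graduate Association)'],
    hrefs: ['https://hgc.harvard.edu']
  }
]

const paired = pairValuesWithLinks(scraped)
console.log(JSON.stringify(paired, null, 2))
```

Pairing by index is only as reliable as the assumption that every value has exactly one link, so entries without an anchor would shift the alignment.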