yatish27 / linkedin-scraper

Scrapes the public profile of the linkedin page
MIT License
554 stars 221 forks

languages && certifications empty response #76

Open georgepercic opened 8 years ago

georgepercic commented 8 years ago

Hi!

I just tested the library and it is working great, with a few minor glitches: the JSON response returns empty values for languages and certifications (I tested on my own account, where I have both filled in). In your code I saw this:

def languages
  @languages ||= @page.search(".background-languages #languages ol li").map do |item|
    language    = item.at("h4").text rescue nil
    proficiency = item.at("div.languages-proficiency").text.gsub(/\s+|\n/, " ").strip rescue nil

    { :language => language, :proficiency => proficiency }
  end
end

def certifications
  @certifications ||= @page.search("background-certifications").map do |item|
    name       = item.at("h4").text.gsub(/\s+|\n/, " ").strip rescue nil
    authority  = item.at("h5").text.gsub(/\s+|\n/, " ").strip rescue nil
    license    = item.at(".specifics/.licence-number").text.gsub(/\s+|\n/, " ").strip rescue nil
    start_date = item.at(".certification-date").text.gsub(/\s+|\n/, " ").strip rescue nil

    { :name => name, :authority => authority, :license => license, :start_date => start_date }
  end
end

On the public profile there are no .background-languages or background-certifications classes. I use the following code in PHP with the simpledom library, and it works:

    $education = $html->find('#education > li.school');

    foreach ($education as $school) {
        $school_name = $school->find('.item-title', 0)->innertext;
        $title = $school->find('.item-subtitle', 0)->innertext;
        $start_date = !empty($school->find('.date-range > time', 0)) ? date('Y', strtotime($school->find('.date-range > time', 0)->innertext)) : '';
        $end_date = !empty($school->find('.date-range > time', 1)) ? date('Y', strtotime($school->find('.date-range > time', 1)->innertext)) : '';

        if (!empty($school_name) && !empty($title)) {
            $candidate_education[] =  $start_date . ' - ' . $end_date . ' ' . $school_name . ' - ' . $title . ' <br />';
        }
    }

    $certifications = $html->find('#certifications > li.certification');

    foreach ($certifications as $certification) {
        $name = $certification->find('h4.item-title > a', 0);

        if (!empty($name)) {
            $candidate_certifications[] = [
                'name' => $name->innertext,
                'url' => $name->href
            ];
        }
    }

Maybe this helps you.
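
For reference, the certification part of the PHP snippet could be ported to Ruby roughly like this. It is only a sketch: the sample markup is made up (only the ids and class names come from the PHP code above), extract_certifications and has_class? are hypothetical helper names, and it uses REXML with XPath so that it runs without extra gems. In the library itself the equivalent CSS selectors ('#certifications > li.certification', 'h4.item-title > a') would presumably go through @page.search instead.

```ruby
require "rexml/document"

# Hypothetical sample markup. The ids and class names are taken from the
# PHP snippet; the surrounding structure is an assumption.
SAMPLE_HTML = <<~HTML
  <div>
    <ul id="certifications">
      <li class="certification">
        <h4 class="item-title"><a href="https://example.com/cert-a">Cert A</a></h4>
      </li>
      <li class="certification">
        <h4 class="item-title">No link here</h4>
      </li>
    </ul>
  </div>
HTML

# True when the element's class attribute contains the given class name.
def has_class?(element, name)
  element.attributes["class"].to_s.split.include?(name)
end

# Mirrors '#certifications > li.certification' and 'h4.item-title > a'.
def extract_certifications(doc)
  items = REXML::XPath.match(doc, "//*[@id='certifications']/li")
  items.select { |li| has_class?(li, "certification") }.filter_map do |li|
    heading = li.elements.to_a("h4").find { |h4| has_class?(h4, "item-title") }
    link = heading && heading.elements["a"]
    next unless link # skip entries without a linked title, as the PHP code does

    { :name => link.text.to_s.strip, :url => link.attributes["href"] }
  end
end

certifications = extract_certifications(REXML::Document.new(SAMPLE_HTML))
```

Entries without a linked title are skipped, which matches the !empty($name) check in the PHP version. Real profile pages are unlikely to be well-formed XML, so an HTML-tolerant parser would be needed in practice.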

yatish27 commented 8 years ago

Will look into it