tomquirk / linkedin-api

👨‍💼Linkedin API for Python
MIT License

`search_people` endpoint doesn't work anymore #313

Open agusmdev opened 1 year ago

agusmdev commented 1 year ago

I used this endpoint 2 days ago and it was working correctly, but it seems LinkedIn updated their API, and now instead of using the endpoint /search/blended they use /graphql?variables=.....

When I use search_people I always get a 403 response. Is anybody else experiencing the same issue?

PS: My cookie session is working properly

bleeexecleads commented 1 year ago

I am noticing similar behavior, but it happens roughly 70% of the time. I'm not quite sure how to resolve this issue.

tnachen commented 1 year ago

Got the same problem too

davidcastillo commented 1 year ago

Same problem here

asancheza commented 1 year ago

Did someone find a solution to replace the current code?

        res = self._fetch(
            f"/search/blended?{urlencode(default_params, safe='(),')}",
            headers={"accept": "application/vnd.linkedin.normalized+json+2.1"},
        )
        data = res.json()
alexfoggy commented 1 year ago

Same problem :(

ameygoes commented 1 year ago

Please update if anyone has any resolution on this.

I am also willing to work on this with anyone who is interested, maybe on a call. Let me know.

MoHayat commented 1 year ago

same here, toying around with it atm

MoHayat commented 1 year ago

@agusmdev do you happen to know where in the documentation it mentions the new endpoint?

agusmdev commented 1 year ago

Nowhere. I checked it with my LinkedIn account by executing a search query from the browser.

MoHayat commented 1 year ago

gotcha, was digging through the docs and I wasn't able to find anything so makes sense, I'll take a look there

17314642 commented 1 year ago

The code below is my basic implementation for getting the first 10 employees of a company. One request returns exactly 10, so offset can be used to start from, say, the 10th employee instead of the 1st. It parses the response and returns some basic data. Very little is parsed since I don't need all the data, but I think this might be a good starting point.

import json
import os

def fetch_employees(company_id, offset=0):
    # `API` is assumed to be an authenticated linkedin_api.Linkedin instance
    cache = f"companies/{company_id}/employees_{offset}.json"
    if os.path.exists(cache):
        with open(cache) as f:
            r = json.load(f)
        print(f"[fetch_employees()]: OK! Using cached file \"{cache}\".")
    else:
        uri = f"/graphql?includeWebMetadata=true&variables=(start:{offset},origin:COMPANY_PAGE_CANNED_SEARCH,query:(flagshipSearchIntent:SEARCH_SRP,queryParameters:List((key:currentCompany,value:List({company_id})),(key:resultType,value:List(PEOPLE))),includeFiltersInResponse:false))&&queryId=voyagerSearchDashClusters.b0928897b71bd00a5a7291755dcd64f0"
        resp = API._fetch(uri)

        if not resp.ok:
            print(f"[fetch_employees()]: Fail! LinkedIn returned status code {resp.status_code} ({resp.reason})")
            return

        print(f"[fetch_employees()]: OK! LinkedIn returned status code {resp.status_code} ({resp.reason})")
        r = resp.json()

        # Validate before caching so a bad response never lands on disk
        if not (r.get("data") or {}).get("searchDashClustersByAll"):
            errors = r.get("errors") or [{}]
            print("Bad json. LinkedIn returned error:", errors[0].get("message"))
            return

        # Cache request
        os.makedirs(f"companies/{company_id}", exist_ok=True)
        with open(cache, "w") as f:
            json.dump(r, f)

    return r["data"]["searchDashClustersByAll"]

def get_employees(company_id, offset=0):
    def get_item_key(item, keys):
        # Walk a chain of keys; return "" as soon as one is missing
        if isinstance(keys, str):
            keys = [keys]

        cur = item
        for key in keys:
            if isinstance(cur, dict) and key in cur:
                cur = cur[key]
            else:
                return ""

        return cur

    # Pass the offset through so callers can page beyond the first 10 results
    j = fetch_employees(company_id, offset)
    if not j:
        return []

    if j["_type"] != "com.linkedin.restli.common.CollectionResponse":
        return []

    employees = []
    for cluster in j["elements"]:
        if cluster["_type"] != "com.linkedin.voyager.dash.search.SearchClusterViewModel":
            continue

        for it in cluster["items"]:
            if it["_type"] != "com.linkedin.voyager.dash.search.SearchItem":
                continue

            e = it["item"]["entityResult"]
            if not e or e["_type"] != "com.linkedin.voyager.dash.search.EntityResultViewModel":
                continue

            # get_item_key never raises, so missing fields simply come back as ""
            employees.append({
                "title": get_item_key(e, ["title", "text"]),
                "entityUrn": get_item_key(e, "entityUrn"),
                "primarySubtitle": get_item_key(e, ["primarySubtitle", "text"]),
                "secondarySubtitle": get_item_key(e, ["secondarySubtitle", "text"]),
            })

    return employees

ameygoes commented 1 year ago

Is this code working?

17314642 commented 1 year ago

It is working for me in my program :)

ameygoes commented 1 year ago

Oh okay, I will try it on my end. If it doesn't work, would you be able to connect with me on a Meet call?

17314642 commented 1 year ago

I don't think I'm the right person to answer this kind of question ;). All I did in my code was pure guessing plus looking at lots of JSON requests. But if anything less serious happens, you could try writing here, so it will also help others if they stumble upon the same problem.

ameygoes commented 1 year ago

What is the company_id here?

ameygoes commented 1 year ago

Wrote a small function for getting company_id

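from urllib.parse import quote

def getCompanyID(company_link):
    # Expects a link like https://www.linkedin.com/company/company_Username/
    try:
        company_username = company_link.split('.com/company/')[1].split('/')[0]
    except IndexError:
        print("Wrong company URL. Expected format: https://www.linkedin.com/company/company_Username/")
        return None

    # `api` and its `_get` helper are used as posted; `_get` is assumed to accept a full URL
    api_link = 'https://www.linkedin.com/voyager/api/organization/companies?decorationId=com.linkedin.voyager.deco.organization.web.WebCompanyStockQuote-2&q=universalName&universalName={}'.format(quote(company_username))
    resp = api._get(api_link).json()

    # An unknown company name comes back with no elements
    elements = resp.get('elements') or []
    if not elements:
        return None
    company_id = elements[0].get('entityUrn').split(':')[-1]
    return company_id
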
17314642 commented 1 year ago

company_id is the numerical ID of the company (google = 1441, facebook = 76987811). It can be retrieved as a URN from linkedin_api and then converted to a numerical ID using the built-in helper function.

Example snippet:

from linkedin_api.utils import helpers

company = API.get_company("google")
company_id = helpers.get_id_from_urn(company["entityUrn"])
employees = get_employees(company_id)

# Print name of first 10 employees
for e in employees:
    print(e["title"])

PS: There were some minor typos in my initial code (https://github.com/tomquirk/linkedin-api/issues/313#issuecomment-1574333025) which I fixed already. So just re-paste it.
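
Since each request returns 10 results and the offset selects where to start, paging through a whole company could look roughly like the sketch below. This is only an illustration: `collect_pages` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for a real call so the loop can be tried without touching LinkedIn.

```python
def collect_pages(fetch_page, page_size=10, max_pages=5):
    """Call fetch_page(offset) repeatedly until a page comes back short or empty."""
    results = []
    for page in range(max_pages):
        batch = fetch_page(page * page_size)
        results.extend(batch)
        # A short page means there is nothing left to request
        if len(batch) < page_size:
            break
    return results


# Stand-in for a real fetcher: serves slices of a made-up list of 23 people
fake_people = [{"title": f"Person {i}"} for i in range(23)]


def fake_fetch(offset):
    return fake_people[offset:offset + 10]


everyone = collect_pages(fake_fetch)
print(len(everyone))
```

With the real functions this would be something like `collect_pages(lambda off: get_employees(company_id, off))`, assuming get_employees actually forwards its offset argument to fetch_employees. Keeping `max_pages` small limits how many requests a single run can send.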

MoHayat commented 1 year ago

might be a little out of the loop here, how does this code fix the search function?

MoHayat commented 1 year ago

As an update, I think I was able to get the mappings right for the new search endpoint. The current tests show that 16/24 tests fail, all of them tied to the search function, so I'll be forking the repo and seeing if I can bring it back up to 24/24.

17314642 commented 1 year ago

might be a little out of the loop here, how does this code fix the search function?

It doesn't. I wanted to use search_people in my project, but it was broken. So I wrote my own small variation of it and posted it in case someone needed it. It can output only minimal information, but that's okay for me, since that was all I needed. If anyone needs more than that, I thought that code would've been a nice little foundation.

bleschunov commented 1 year ago

I used this endpoint 2 days ago and it was working correctly, but it seems LinkedIn updated their API, and now instead of using the endpoint /search/blended they use /graphql?variables=.....

Hey! Where can I find the information about this new API? Share the links to the reference please

17314642 commented 1 year ago

https://github.com/tomquirk/linkedin-api/issues/313#issuecomment-1573709203

MoHayat commented 1 year ago

The output from the new endpoint is a little confusing. Does anyone know how to make sense of it? It seems to return multiple attributes that come together to make a single profile on the website.

Timur-Gizatullin commented 1 year ago

I've noticed they are using 2 endpoints to get people by different params. First one returns only urn ids and second one returns list of profiles by list of urn ids. I can fetch urn ids but for some reason second endpoint returns me 400. Probably it has some specific headers or something idk for now.

My solution:

def graphql_search_people(
        self,
        job_title: str,
        regions: list[str],
        limit: int | None,
        offset: int
) -> list[dict]:
    """Get list of user's urns by job_title and regions."""
    count = Linkedin._MAX_SEARCH_COUNT
    if limit is None:
        limit = -1

    results = []
    while True:
        # when we're close to the limit, only fetch what we need to
        if limit > -1 and limit - len(results) < count:
            count = limit - len(results)

        default_params = {
            "origin": "FACETED_SEARCH",
            "start": len(results) + offset,
        }

        res = self._fetch(
            (f"/graphql?variables=(start:{default_params['start']},origin:{default_params['origin']},"
             f"query:(keywords:{job_title},flagshipSearchIntent:SEARCH_SRP,"
             f"queryParameters:List((key:geoUrn,value:List({','.join(regions)})),"
             f"(key:resultType,value:List(PEOPLE))),"
             f"includeFiltersInResponse:false))&=&queryId=voyagerSearchDashClusters"
             f".b0928897b71bd00a5a7291755dcd64f0"),
            headers={"accept": "application/vnd.linkedin.normalized+json+2.1"},
        )

        self.logger.debug(res.text)
        data = res.json()

        elements = data.get("included", [])
        self.logger.debug(f"Profile urns: {elements}")

        # Take at most the first 10 entries; slicing avoids an IndexError
        # when fewer than 10 come back, and entries without a urn are skipped
        new_elements = [el["entityUrn"] for el in elements[:10] if "entityUrn" in el]

        results.extend(self._get_people_by_urns(urns=new_elements))

        # break the loop if we're done searching
        # NOTE: we could also check for the `total` returned in the response.
        # This is in data["data"]["paging"]["total"]
        if (
                (-1 < limit <= len(results))  # if our results exceed set limit
                or len(results) / count >= Linkedin._MAX_REPEATED_REQUESTS
        ) or len(new_elements) == 0:
            break

        self.logger.debug(f"results grew to {len(results)}")

    return results

def _get_people_by_urns(self, urns: list[str]) -> list[dict]:
    """Get profiles info by urns."""
    profiles = []

    for urn in urns:
        # Keep only the id after the last colon of the urn
        clear_urn = urn.split(":")[-1]
        profiles.append(self.get_profile(urn_id=clear_urn))

    return profiles

URL to fetch profiles (always returns 400): https://www.linkedin.com/voyager/api/graphql?variables=(lazyLoadedActionsUrns:List(urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAAA6ZpN0B-fPBL3atd5cCsIS9cl7w3zXLylw,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAAAAJcNcBZWx8gvYiUs_1cLtFiwXhXoNQihc,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAAAD-cOsB2wB0EldN_R22uvya2ZcYuefBKPI,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAAADEXysBWdPqwfO-p8MyOQOwaWMB2qO0Umg,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions:(urn:li:fsd_profileActions:(ACoAAAO9jNABuhihN_wVSgFGgDry9xrGYM-cmzU,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAAAfUQGcBG3VTivwWqKm9Gw5g8F3Rt8gUwQ8,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions:(urn:li:fsd_profileActions:(ACoAABdSlasBRb9Dp9rwdkpKS3_atJQPLkAt0jY,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAABfd6ZoBNCHS45DdfDVHMABssw9S57AH4-Y,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAACKT0KABrXki4zf6VnGenRUxSBmG-udwtag,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP),urn:li:fsd_lazyLoadedActions: (urn:li:fsd_profileActions:(ACoAACWuoz0BNF2Tcij9PyIymEc65yt_mlrzAfk,SEARCH,EMPTY_CONTEXT_ENTITY_URN),PEOPLE,SEARCH_SRP))) &=&queryId=voyagerSearchDashLazyLoadedActions.9efa2f2f5bd10c3bbbbab9885c3c0a60

TanguyBellec commented 1 year ago

Has anyone found a solution?

MoHayat commented 1 year ago

No luck on my end, while I was testing it the linkedin account I was using became restricted so I just went ahead and did a one time script for what I needed to do.

DTS-And-Eaglepoint-Funding commented 1 year ago

I found this from LinkedIn:

https://github.com/linkedin-developers/linkedin-api-python-client/blob/main/linkedin_api/clients/restli/utils/encoder.py
https://github.com/linkedin-developers/linkedin-api-python-client/blob/main/linkedin_api/clients/restli/utils/decoder.py

It should help with formatting URL params and with understanding what each request is doing.
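
To give a feel for the parenthesised `variables=(...)` format that shows up in the URLs in this thread, here is a simplified sketch. It is not LinkedIn's encoder: `encode_variables` is a hypothetical helper, it does no escaping of reserved characters, and the sample values are copied from the company search URL posted earlier in this thread.

```python
def encode_variables(value):
    """Encode nested dicts and lists into the (key:value,key:List(...)) form."""
    if isinstance(value, dict):
        inner = ",".join(f"{k}:{encode_variables(v)}" for k, v in value.items())
        return f"({inner})"
    if isinstance(value, (list, tuple)):
        return "List(" + ",".join(encode_variables(v) for v in value) + ")"
    if isinstance(value, bool):
        # Booleans appear lowercase in the URLs
        return "true" if value else "false"
    return str(value)


variables = {
    "start": 0,
    "origin": "COMPANY_PAGE_CANNED_SEARCH",
    "query": {
        "flagshipSearchIntent": "SEARCH_SRP",
        "queryParameters": [
            {"key": "currentCompany", "value": ["1441"]},
            {"key": "resultType", "value": ["PEOPLE"]},
        ],
        "includeFiltersInResponse": False,
    },
}

encoded = encode_variables(variables)
print(encoded)
```

Building the string from a dict instead of a hand-written f-string makes it harder to lose a parenthesis when adding another filter. For real requests the encoder linked above is the safer reference, since it also handles escaping.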

linda-benboudiaf commented 1 year ago

Same here, it's not working. It always returns an empty list!

engahmed1190 commented 11 months ago

Any news on the LinkedIn search? It's a very important feature. Thanks for the contribution.

linda-benboudiaf commented 11 months ago

Still down unfortunately :/

ignaciovi commented 10 months ago

I created a draft PR with the changes suggested by @17314642 and @Timur-Gizatullin + a few modifications. The search_people and search_companies endpoint work for me with these changes and the parameters of my use case but I haven't tested all the other combinations.

Feel free to add any improvements or suggest changes! I might take a look again at it if I get some time and try to do a cleaner fix, if there is one.

DerLomo commented 10 months ago

Please push it soon

diogobarreto commented 8 months ago

company_id is the numerical ID of the company (google = 1441, facebook = 76987811). It can be retrieved as a URN from linkedin_api and then converted to a numerical ID using the built-in helper function.

Example snippet:

from linkedin_api.utils import helpers

company = API.get_company("google")
company_id = helpers.get_id_from_urn(company["entityUrn"])
employees = get_employees(company_id)

# Print name of first 10 employees
for e in employees:
    print(e["title"])

PS: There were some minor typos in my initial code (#313 (comment)) which I fixed already. So just re-paste it.

@17314642 Is this "get" still working? I get the following error when trying to run it:

PS C:\LinkedIn\linkedin-api> python SearchID.py
Traceback (most recent call last):
  File "C:\LinkedIn\linkedin-api\SearchID.py", line 41, in <module>
    company = api.get_company("current_cia")
  File "C:\LinkedIn\linkedin-api\linkedin_api\linkedin.py", line 975, in get_company
    self.logger.info("request failed: {}".format(data["message"]))
KeyError: 'message'
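
The `KeyError` in this traceback comes from indexing `data["message"]` when the error payload has no `message` key, so the logging line itself crashes and hides the real failure. A defensive lookup avoids that. The sketch below is only an illustration of the idea: `describe_failure` is a hypothetical helper, not part of linkedin_api.

```python
def describe_failure(data, status_code=None):
    """Build a log message without assuming the payload has a 'message' key."""
    message = data.get("message") if isinstance(data, dict) else None
    if message:
        return f"request failed: {message}"
    if status_code is not None:
        return f"request failed with status {status_code}"
    return "request failed: no message in response"


print(describe_failure({"message": "not found"}))
print(describe_failure({}, status_code=403))
```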

tomquirk commented 3 months ago

Hey everyone.

Can y'all try version 2.1.1 and let me know if it fixes any issues?

Gimme-Danger commented 1 month ago

The output from the new endpoint is a little confusing. Does anyone know how to make sense of it? It seems to return multiple attributes that come together to make a single profile on the website.

Maybe someone else here will find this information useful, so I'll just leave it here.

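import json


def walk_through_data(obj, val_dict):
    # Recursively replace every '*key': urn reference with 'key': <object from val_dict>
    if isinstance(obj, dict) and obj:
        # Keys starting with '*' hold an entityUrn that points into the `included` list
        keys_to_remove = [k for k in obj if k.startswith('*')]
        for k in keys_to_remove:
            old_val = obj.pop(k)
            # .get() skips included items that carry no entityUrn at all
            new_val = [val for val in val_dict if val.get('entityUrn') == old_val]
            if new_val:
                obj[k[1:]] = new_val[0]
            else:
                print(f'Could not find: {old_val}')

        # Descend into the remaining (and freshly resolved) values
        for v in obj.values():
            walk_through_data(v, val_dict)
    elif isinstance(obj, list) and obj:
        for elem in obj:
            walk_through_data(elem, val_dict)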

with open('./request_sessions/test_response_posts_all_old_voyager5(no_headers)0.json', "r", encoding='utf-8') as json_file:
    response = json.load(json_file)

feed_key = 'feedDashProfileUpdatesByMemberShareFeed' # What type of data was received in the response
initial_type = 'com.linkedin.voyager.dash.feed.Update' # Find what type of object the root data has
included_data = response['included']
response_data = response['data']['data'][feed_key]['*elements']
initial_data = [p for p in included_data if p['$type'] == initial_type and p['entityUrn'] in response_data]

for d in initial_data:
    walk_through_data(d, included_data)

response.pop('included')
response.pop('meta')

response['data'] = response['data']['data']
response['data'][feed_key]['elements'] = initial_data

with open('./request_sessions/test_response_posts_all_COMBINE.json', "w", encoding='utf-8') as json_file:
    json.dump(response, json_file, indent=4)
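
For anyone who wants to try the idea without a saved response, here is a small self-contained variation on toy data. It is a sketch under assumptions: the URNs and field names are made up, `resolve_refs` is a hypothetical helper, and it assumes the references contain no cycles. It indexes the `included` list by `entityUrn` once, so each lookup is a dict access instead of a scan, and it returns a new structure instead of mutating the input.

```python
def resolve_refs(obj, by_urn):
    """Return a copy of obj with '*key': urn(s) replaced by 'key': object(s)."""
    if isinstance(obj, dict):
        out = {}
        for key, value in obj.items():
            if key.startswith("*") and isinstance(value, list):
                # A list of urns becomes a list of resolved objects
                out[key[1:]] = [resolve_refs(by_urn[u], by_urn) for u in value if u in by_urn]
            elif key.startswith("*") and isinstance(value, str) and value in by_urn:
                out[key[1:]] = resolve_refs(by_urn[value], by_urn)
            else:
                # Plain values and unresolved references are kept as they are
                out[key] = resolve_refs(value, by_urn)
        return out
    if isinstance(obj, list):
        return [resolve_refs(item, by_urn) for item in obj]
    return obj


# Made-up data shaped like a normalized response
included = [
    {"entityUrn": "urn:li:update:1", "*actor": "urn:li:profile:9", "text": "hello"},
    {"entityUrn": "urn:li:profile:9", "name": "Ada"},
]
by_urn = {item["entityUrn"]: item for item in included}

root = {"*elements": ["urn:li:update:1"]}
resolved = resolve_refs(root, by_urn)
print(resolved["elements"][0]["actor"]["name"])
```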