Open Tabish-Invo opened 2 years ago
Hello. Could you help me understand how you are using these methods to extract those 20 results? I would appreciate it.
I have the same problem. Are you using version 1.0.0 or 2.0.0?
@manuelrech I have tried this with both 1.0.0 and 2.0.0 and I can't figure it out. I also added this at linkedin.py, line 928:
res = self._fetch(f"/messaging/conversations?start=100", params=params)
The response then carries this paging information:
'paging': {'count': 20, 'start': 100, 'links': []}}
Unfortunately, it still contains the first 20 elements even though it claims to start at 100...
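One way to confirm that the start offset is being ignored is to compare the identifiers in two responses. This is only a sketch: the helper name overlapping_ids is made up for illustration, and the key "entityUrn" is an assumption about what uniquely identifies each element, so swap in whatever field your responses actually contain.

```python
from typing import Any, Dict, List


def overlapping_ids(page_a: Dict[str, Any], page_b: Dict[str, Any],
                    key: str = "entityUrn") -> List[Any]:
    """Return the ids from page_b that also appear in page_a.

    `key` is an assumed identifier field, not a confirmed part of the API.
    """
    ids_a = {element[key] for element in page_a.get("elements", [])}
    return [element[key] for element in page_b.get("elements", [])
            if element[key] in ids_a]


# Hand-made responses standing in for "start=0" and "start=100".
first = {"elements": [{"entityUrn": "a"}, {"entityUrn": "b"}]}
second = {"elements": [{"entityUrn": "a"}, {"entityUrn": "b"}]}
repeated = overlapping_ids(first, second)
print(f"{len(repeated)} of {len(second['elements'])} elements are repeats")
```

If every element of the second page shows up as a repeat, the offset had no effect, which matches what is described above.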
I know; through the API I also only managed to get the last 20. However, I thought about using Selenium to get the conversation_urns by interacting with the LinkedIn web page. I use a piece of JavaScript to scroll down the sidebar, and so that the process does not run forever, I use the month as a stopping criterion:
from selenium import webdriver
from linkedin_api import Linkedin
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support import expected_conditions as EC

def linkedin_login(username, password):
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    ##### LOGIN SESSION #####
    driver.get("https://www.linkedin.com")
    username_space = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'session_key')))
    password_space = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'session_password')))
    username_space.send_keys(username)
    password_space.send_keys(password)
    accedi_button = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, 'sign-in-form__submit-button')))
    accedi_button.click()
    return driver

def getting_conversation_urns(driver, stopping_month='Nov'):
    # To find what values stopping_month can take, open the message list
    # and check the date label shown on each thread.
    driver.get('https://www.linkedin.com/messaging/?')
    ##### RETRIEVING CONVERSATION URNS #####
    conversation_urns = []
    time_not_too_far = True
    while time_not_too_far:
        conversations = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, '/html/body/div[5]/div[3]/div[2]/div/div/main/div/section[1]/div[2]/ul/li/div/a')))
        for conversation in conversations:
            # The URN is the second-to-last segment of the thread link.
            conversation_urn = conversation.get_attribute("href").split('/')[-2]
            if conversation_urn not in conversation_urns:
                conversation_urns.append(conversation_urn)
            time = WebDriverWait(conversation, 10).until(EC.visibility_of_element_located((By.XPATH, 'div[2]/div/div[1]/time'))).text
            if stopping_month in time:  ##### conditioning on time
                time_not_too_far = False
        # Scroll the last visible thread into view so the sidebar loads more.
        driver.execute_script("return arguments[0].scrollIntoView();", conversations[-1])
    print('stopped when date was: ' + time)
    return driver, conversation_urns

driver = linkedin_login(YOUR_EMAIL, YOUR_PASSWORD)
driver, conversation_urns = getting_conversation_urns(driver, 'Nov')
Now you can use the get_conversation(conversation_urn) method to get each conversation.
This trick works for me, but you need to know a little Selenium. I hope this helps!
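For the last step, a minimal sketch of looping over the collected URNs could look like this. The helper fetch_conversations is a hypothetical name, and it takes the lookup function as an argument so it can be tried without a live session; with the library you would presumably pass api.get_conversation.

```python
from typing import Any, Callable, Dict, Iterable


def fetch_conversations(get_conversation: Callable[[str], Any],
                        conversation_urns: Iterable[str]) -> Dict[str, Any]:
    """Fetch each conversation once and key the results by URN."""
    conversations: Dict[str, Any] = {}
    for urn in conversation_urns:
        if urn in conversations:
            continue  # skip URNs that were collected more than once
        conversations[urn] = get_conversation(urn)
    return conversations


# Assumed usage with the library (not run here):
#   api = Linkedin(YOUR_EMAIL, YOUR_PASSWORD)
#   conversations = fetch_conversations(api.get_conversation, conversation_urns)
```

Keying by URN keeps the result easy to look up afterwards and avoids a second request for any URN that appears twice in the list.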
In the end I found a solution that works for me in this issue: https://github.com/tomquirk/linkedin-api/issues/46
You can pass a Unix timestamp as the created_before parameter to the function. I was actually wondering whether to create a PR to include this.
I am guessing that the same logic applies to a couple of other endpoints, too.
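If you want to start from a specific date rather than from a createdAt value taken out of a response, a small conversion helper is enough. This sketch assumes the timestamp is expected in milliseconds since the Unix epoch, which is how createdAt values appear to be expressed; that unit is an assumption, so check it against a real response first. The name to_unix_ms is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division by a timedelta avoids float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


# Cut-off for "everything before 1 November 2023, 00:00 UTC".
cutoff = to_unix_ms(datetime(2023, 11, 1, tzinfo=timezone.utc))
print(cutoff)
```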
I would really appreciate that. Alternatively, could you post how you modified the function definition to include the parameter?
I will have time to make a PR tomorrow. Did you manage to implement it locally?
Yes, I did it as in issue #46: getting batches of 20 and starting the next one from createdBefore = conversations['elements'][19]['events'][0]['createdAt'] of the previous batch, as @AchatY suggested in the issue.
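The batching described above can be sketched roughly like this. It is not the library's code: fetch_batch is a hypothetical callable that you would write around your locally modified get_conversations, taking the createdBefore value (or None for the first batch) and returning the usual dict with an 'elements' list. The cap on the number of batches is an added safety net in case the cursor stops advancing.

```python
from typing import Any, Callable, Dict, List, Optional

BATCH_SIZE = 20  # page size reported in this thread


def fetch_all_conversations(
    fetch_batch: Callable[[Optional[int]], Dict[str, Any]],
    max_batches: int = 50,
) -> List[Dict[str, Any]]:
    """Collect conversations batch by batch using a createdBefore cursor."""
    collected: List[Dict[str, Any]] = []
    created_before: Optional[int] = None  # None means "start from the newest"
    for _ in range(max_batches):
        elements = fetch_batch(created_before).get("elements", [])
        if not elements:
            break
        collected.extend(elements)
        if len(elements) < BATCH_SIZE:
            break  # a short batch means there is nothing older left
        # Same cursor as in the thread: createdAt of the last element's
        # first event (index 19 when the batch is full).
        created_before = elements[-1]["events"][0]["createdAt"]
    return collected
```

Whether the server treats createdBefore as strictly "before" is not confirmed here; if the boundary conversation shows up twice, deduplicate on its identifier.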
Thank you
get_conversations and get_conversation both return only the first 20 results.