spinlud / linkedin-jobs-scraper

Description_duplicate #18

Closed. ebouse13 closed this issue 1 year ago

ebouse13 commented 3 years ago

Hi, this is great. I'm trying to convert the results to a dataframe and I am getting a strange bug.

The date, link, and title of each job all update fine, but the description is duplicated in the first and second rows. Judging from the len(description) output, the duplication seems to happen when the queries are run. Any ideas as to why this is happening?
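If the values are being appended to separate per-column lists, one way to rule out a misalignment on the dataframe side is to build each row as a single record. This is only a minimal sketch: it assumes the scraper hands the callback one object per job, `JobEvent` and `on_data` are hypothetical stand-ins rather than the library's own API, and the sample jobs are made up.

```python
from dataclasses import dataclass


@dataclass
class JobEvent:
    """Hypothetical stand-in for the per-job object passed to a data callback."""
    title: str
    company: str
    date: str
    link: str
    description: str


records = []


def on_data(job):
    # Build the whole row in one step so every field comes from the same job.
    records.append({
        "title": job.title,
        "company": job.company,
        "date": job.date,
        "link": job.link,
        "description": job.description,
        "description_len": len(job.description),
    })


# Simulated events in place of a real scraper run (made-up data).
sample_jobs = [
    JobEvent("Data Analyst", "Acme", "2021-03-01",
             "https://example.com/jobs/1", "Acme is hiring a data analyst."),
    JobEvent("Data Engineer", "Globex", "2021-03-02",
             "https://example.com/jobs/2", "Globex needs a data engineer."),
]
for job in sample_jobs:
    on_data(job)

for row in records:
    print(row["company"], row["description_len"], row["description"])
```

A list of dicts like `records` can be passed straight to `pandas.DataFrame(records)`. If the descriptions are already duplicated in `records`, the problem is upstream of the dataframe conversion.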

spinlud commented 3 years ago

Hi there! What do you mean by the description being duplicated in the first and second rows? Can you provide an example?

ebouse13 commented 3 years ago

file1.xlsx

Hi, can you see the attached? Basically all the data is correct until you get to the description in the second row, which is the same as the description in the first. There is then a knock-on effect on the later descriptions: each one matches the company/job from the previous row.

Let me know if you need further clarification.

Thank you :)

spinlud commented 3 years ago

Some jobs on LinkedIn are posted several times with the same description. Have you checked whether that could be the case?

ebouse13 commented 3 years ago

Hi, yes, I considered that too. However, you can see that some of the descriptions contain the company name, and the description and company name are out of sync by 1 for the rows after the duplication happens. I'm not sure why only the description has this issue when the data in every other column (company name, date posted, location) is correct. I need to look into the code more to see where the loop happens.
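To tell a genuine repost apart from an off-by-one shift, a rough check over the exported rows could look like the sketch below. It assumes each row is a dict with `company` and `description` keys and that a description normally mentions its own company, which will not hold for every posting. The function name and the sample rows are made up for illustration.

```python
def find_description_issues(rows):
    """Return (duplicates, shifted) as lists of row indices.

    duplicates: rows whose description is identical to the previous row's.
    shifted: rows whose description names the previous row's company
             but not their own, which suggests an off-by-one misalignment.
    """
    duplicates, shifted = [], []
    for i in range(1, len(rows)):
        prev, cur = rows[i - 1], rows[i]
        desc = cur["description"].lower()
        if cur["description"] == prev["description"]:
            duplicates.append(i)
        elif prev["company"].lower() in desc and cur["company"].lower() not in desc:
            shifted.append(i)
    return duplicates, shifted


# Made-up rows reproducing the pattern described above: row 1 repeats
# row 0's description, and every later description lags one row behind.
rows = [
    {"company": "Acme", "description": "Acme is hiring a data analyst."},
    {"company": "Globex", "description": "Acme is hiring a data analyst."},
    {"company": "Initech", "description": "Globex needs a data engineer."},
    {"company": "Umbrella", "description": "Initech wants a BI developer."},
]

duplicates, shifted = find_description_issues(rows)
print("duplicate rows:", duplicates)
print("shifted rows:", shifted)
```

A duplicate followed by a run of shifted rows points to a misalignment rather than to the same job being posted twice.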

spinlud commented 3 years ago

Does it happen for every query or only for a particular one? Can you share the code for just the query you are running?

ebouse13 commented 3 years ago

LinkedInQy.zip

I'm actually not sure - the attached is what I have been running.