sericson0 / glassdoor_scraper

Scrapes Glassdoor

Sort by most recent, and review cut-off #1

Open · rob3210 opened this issue 4 years ago

rob3210 commented 4 years ago

Thanks, Sericson0, this is a great scraper. Would it be possible to update it so that it can sort by most recent? Separately, is it possible to set it up to download all of the reviews? If the comments are too long, it cuts off the review. Thanks again!

sericson0 commented 4 years ago

Hello Rob, I'm happy to hear that it is helpful! Much of the code was adapted from https://github.com/MatthewChatham/glassdoor-review-scraper (I put this together to help a friend with some research and was lazy about citing it in the actual code, so I want to make sure to give credit where credit is due :-) )

- If you change the parameter "Limit", it will scrape up to that many reviews. Just note that, as the code is currently set up, Limit must be less than the total number of reviews; updating it to scrape all reviews should be quick. Or did I misunderstand the question? Is it capping the number of reviews, or is it truncating a single long review? If it's the latter, I will make sure to look into why that is happening.

- Sorting by most recent just requires getting Selenium to click the sort button and select "Most Recent". It will take a bit of fiddling but shouldn't be too difficult. I'll try to update the code in the next week or so. -Sean

(Replying by email to rob3210's comment of Wed, Nov 6, 2019 at 8:12 PM.)
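The "quick update" for scraping every review, even when Limit exceeds the total, could look like the loop below: keep requesting pages until either Limit reviews have been collected or a page comes back empty, so Limit no longer has to be smaller than the review count. This is a minimal sketch, not the repository's code. fetch_page is a hypothetical callable standing in for whatever the scraper uses to load and parse one page of reviews, and treating limit=None as "no cap" is an assumption.

```python
from typing import Callable, List, Optional


def collect_reviews(
    fetch_page: Callable[[int], List[str]],
    limit: Optional[int] = None,
) -> List[str]:
    """Gather reviews page by page until `limit` is reached or pages run out.

    `fetch_page` is a hypothetical stand-in for the scraper's page loader:
    it takes a 1-based page number and returns that page's reviews, or an
    empty list once there are no more pages.
    """
    reviews: List[str] = []
    page = 1
    while limit is None or len(reviews) < limit:
        batch = fetch_page(page)
        if not batch:  # past the last page: stop instead of failing
            break
        reviews.extend(batch)
        page += 1
    # Trim any overshoot from the final page.
    return reviews if limit is None else reviews[:limit]
```

If the site serves the last page again instead of an empty one when you go past the end, the stop condition would need to compare against the previous batch as well; this sketch does not handle that.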

sericson0 commented 4 years ago

I just added the ability to sort by Most Recent (and also by Popularity, Highest Rating, Lowest Rating, and Oldest First). To sort by most recent, set SORT_OPTION = "Most Recent". Please let me know if this works for you!
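Because the sort is chosen by its exact label, a misspelled SORT_OPTION is easy to miss. A small guard like the following could catch it before any scraping starts. It is a hypothetical sketch: VALID_SORT_OPTIONS and check_sort_option are illustrative names, not part of the repository, and the label spellings are taken from the comment above rather than from the script itself.

```python
# Labels as listed in the comment above; the exact strings the script
# expects (e.g. "Popularity" vs "Popular") are an assumption.
VALID_SORT_OPTIONS = (
    "Popularity",
    "Most Recent",
    "Highest Rating",
    "Lowest Rating",
    "Oldest First",
)

SORT_OPTION = "Most Recent"


def check_sort_option(option: str) -> str:
    """Raise early on an unrecognized SORT_OPTION; return it unchanged otherwise."""
    if option not in VALID_SORT_OPTIONS:
        raise ValueError(
            f"Unknown SORT_OPTION {option!r}; expected one of {VALID_SORT_OPTIONS}"
        )
    return option
```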

rob3210 commented 4 years ago

Nice!! I've been trying to figure out how to get it to stop cutting off comments that are extra long and have a "more" button that has to be clicked to expand the comment. I haven't been able to figure it out yet. I'll let you know if I solve it!

sericson0 commented 4 years ago

We just need to check whether the "more" button exists and, if so, click it. Could you send me an example of an extra-long review (company + page number)? I'll write something up.
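The check-then-click idea maps naturally onto Selenium's find_elements, which returns an empty list rather than raising when nothing matches. A minimal sketch, assuming a Selenium-style driver: MORE_BUTTON_XPATH and expand_long_reviews are hypothetical names, and the XPath is a placeholder, since the actual markup of Glassdoor's "more" link isn't given in this thread and would need to be read from the page with the browser's inspector.

```python
from typing import Any

# Hypothetical locator: replace with whatever the inspector shows for the
# "more" control on an extra-long review.
MORE_BUTTON_XPATH = "//span[contains(@class, 'moreLink')]"


def expand_long_reviews(driver: Any, xpath: str = MORE_BUTTON_XPATH) -> int:
    """Click every "more" control on the current page; return how many.

    Uses the plain locator string "xpath" (the value of Selenium's
    By.XPATH), so any object with a compatible find_elements() works.
    """
    buttons = driver.find_elements("xpath", xpath)  # [] when none exist
    for button in buttons:
        button.click()
    return len(buttons)
```

Calling this right before the review text is read should leave the full comments in the page. If a click fails because the element is off-screen, driver.execute_script("arguments[0].click();", button) is a common workaround.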

rob3210 commented 4 years ago

There is one on the first page of Apple's reviews, sorted by most recent: https://www.glassdoor.com/Reviews/Apple-Reviews-E1138.htm

I'm unable to get it to sort by "Most Recent". If I change it to Most Recent, it prints that it is sorting by most recent, but then just returns the most popular reviews.

I'm thinking Selenium is having trouble with the drop-down? When I inspect the page, the selection/option value for "Most Recent" is listed as "DATE".
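One thing worth trying, given that observation, is matching the drop-down option by its value attribute ("DATE") instead of its visible text. This is a sketch under assumptions: it presumes the control is a native select element, SORT_VALUES and choose_sort are hypothetical names, and only the "Most Recent" to "DATE" mapping comes from this thread. It is also not confirmed that text matching is why the sort falls back to most popular.

```python
from typing import Any, Dict

# Only "Most Recent" -> "DATE" is from the page inspection above; the other
# option values would have to be read from the page the same way.
SORT_VALUES: Dict[str, str] = {
    "Most Recent": "DATE",
}


def choose_sort(driver: Any, label: str) -> str:
    """Click the <option> whose value attribute matches `label`'s mapping.

    "css selector" is the value of Selenium's By.CSS_SELECTOR.
    """
    value = SORT_VALUES[label]  # KeyError for labels not mapped yet
    option = driver.find_element("css selector", f"select option[value='{value}']")
    option.click()
    return value
```

With a real Selenium driver, Select(driver.find_element(By.CSS_SELECTOR, "select")).select_by_value("DATE") from selenium.webdriver.support.ui does much the same thing. Since changing the sort reloads the reviews, waiting for the new results (for example with WebDriverWait) before scraping may also be needed.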