stamarty / WordCounter

Scan a file and count the number of times each word shows up.
0 stars, 2 forks

Want to add web scraping. #5

Closed stamarty closed 5 years ago

stamarty commented 5 years ago

Just like it sounds. I want to do a search on a job search site and have the program scrape the keywords then move to the next posting.

Kunal614 commented 5 years ago

What do you want? Linking with LinkedIn so that you can get the data for any job, or something else?

stamarty commented 5 years ago

I was thinking specifically of scraping from Glassdoor, but LinkedIn job pages would be fine too. I'm specifically looking to pull the job descriptions from the pages.

stamarty commented 5 years ago

[image: This spot here] This is the block that I'm talking about. I want to be able to pull this data into the text reader and count the frequency of the words showing up.
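The request above (pull one block of a page into the text reader) can be sketched with the Python standard library alone. This is a minimal illustration, not the code that was later merged: the element id `job-description` and the names `BlockTextExtractor` and `extract_block_text` are hypothetical, and the sketch assumes reasonably well-formed HTML in which non-void tags are explicitly closed.

```python
from html.parser import HTMLParser


class BlockTextExtractor(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    # Void elements never have a closing tag, so they must not change the depth.
    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # nesting depth inside the target element (0 = outside)
        self.done = False   # True once the target element has been closed
        self.chunks = []    # text fragments found inside the target element

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif not self.done and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_block_text(html, target_id):
    """Return the whitespace-normalized text of the element with the given id."""
    parser = BlockTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical page fragment; a real job page would use its own id.
sample = ('<h1>Title</h1><div id="job-description"><p>We need <b>Python</b> '
          'skills.<br>Remote ok.</p></div><p>Footer</p>')
print(extract_block_text(sample, "job-description"))
```

The returned string can then be handed to whatever does the word counting. Pages that omit closing tags would confuse the depth counter, so a full HTML library may be a better fit for real sites.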

Kunal614 commented 5 years ago

Can you share the block link?

stamarty commented 5 years ago

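I'm not certain I understand the term "block link". Do you mean div id? If so, here's a screenshot I hope will help. [image: block link] https://user-images.githubusercontent.com/7009764/68232761-a3454c00-ffb2-11e9-8397-b15e921d171f.JPG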

Kunal614 commented 5 years ago

No, no. Where is this block available?

stamarty commented 5 years ago

Oh, that's at https://www.glassdoor.com/index.htm

Kunal614 commented 5 years ago

I need your help. I scraped the data from the given site, but I have a question: how do I calculate the frequency of each word? Can you help me?
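For reference, counting how often each word appears in a piece of scraped text can be sketched as below. This is a minimal example and not necessarily how WordCounter itself does it: the function name `word_frequencies` is hypothetical, and the tokenization rule (lowercase, letters and digits, with inner apostrophes kept) is an assumption.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercase word to its number of occurrences."""
    # A word is a run of letters/digits, optionally joined by apostrophes (e.g. "don't").
    words = re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())
    return Counter(words)


counts = word_frequencies("Python and SQL. Python, python!")
# most_common() lists the words from most to least frequent.
for word, count in counts.most_common():
    print(word, count)
```

`Counter.most_common(n)` returns only the top `n` words, which is handy when scanning many job descriptions for recurring keywords.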

stamarty commented 5 years ago

Is the word counter not working? What I'm hearing here is that you were able to scrape the site, but you don't know how to use the word count functionality?

Are you following the directions in the readme? If so, what error are you getting, so I can look into it? Thanks!

Kunal614 commented 5 years ago

OK, I will show you soon.

stamarty commented 5 years ago

That would be great! I'm not certain what you are asking me, so any explanation would be helpful.

Kunal614 commented 5 years ago

No, no, I understood.

Kunal614 commented 5 years ago

https://www.glassdoor.com/index.htm

This link is not working. When I click it, it says there is trouble finding the site.

stamarty commented 5 years ago

https://www.glassdoor.com/ works fine for me. I'm not sure why you're having problems.

Kunal614 commented 5 years ago

Sir, extracting data is not allowed here. Can I give you the code for counting word frequency on a different site?

This is the response that comes back:

type: Status report
message: Bots not allowed
description: Access to the specified resource has been forbidden.
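One way to check in advance whether a site permits automated access to a path is to consult its robots.txt rules. The sketch below uses the standard library's `urllib.robotparser` on rules supplied as a string, so it runs offline. The rules, the user agent `WordCounterBot`, the example.com URLs, and the helper name `is_allowed` are all hypothetical, and a site may refuse bots by other means than robots.txt, so this is not necessarily the mechanism behind the response above.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: every bot is barred from /Job/ and allowed everywhere else.
rules = "User-agent: *\nDisallow: /Job/\n"

print(is_allowed(rules, "WordCounterBot", "https://example.com/Job/123"))
print(is_allowed(rules, "WordCounterBot", "https://example.com/about"))
```

For a live site, `RobotFileParser.set_url()` followed by `read()` fetches the real file instead of a string.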

stamarty commented 5 years ago

Oh, interesting. I didn't know that was there. You can use any aggregate job site. https://www.linkedin.com/jobs/search/ https://www.indeed.com/jobs?q&l=Coquille%2C%20OR&ts=1571848911055&rq=1&rsIdx=1&fromage=last&newcount=163&advn=862219100656716&vjk=776ae8c79b223017

Anything like that.

Kunal614 commented 5 years ago

Sir, the work is complete. Please send me the link where I should open a pull request.

Kunal614 commented 5 years ago

Pull request done.
Link: https://github.com/Kunal614/WordCounter/blob/master/count_freq_job.py

stamarty commented 5 years ago

Thanks. Merged!

Kunal614 commented 5 years ago

You're welcome. Ready for more.
