public-salaries / public_salaries

Public sector employee salaries

Pennsylvania Monthly Salary Data (2012–2017) #2

Open soodoku opened 7 years ago

soodoku commented 7 years ago

Source: http://pennwatch.pa.gov/employees/Pages/Employee-Salaries.aspx

To scrape, use the form to subset salaries by range ($0-$25,000, etc.).

Add columns indicating the year and month to each scraped row. Also add the department to each row; it appears as a title.
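
A minimal sketch of the column-tagging step, assuming each scraped page has already been parsed into a list of dicts. The function name `tag_rows`, the field names `name` and `salary`, and the sample values are hypothetical and not taken from the site or the repo.

```python
from typing import Dict, Iterable, List


def tag_rows(rows: Iterable[Dict[str, str]], year: int, month: int,
             department: str) -> List[Dict[str, str]]:
    """Return copies of the scraped rows with year, month, and department added."""
    tagged = []
    for row in rows:
        new_row = dict(row)  # copy so the scraped row is left untouched
        new_row["year"] = str(year)
        new_row["month"] = f"{month:02d}"  # zero-padded, e.g. "03"
        new_row["department"] = department.strip()  # title text may carry whitespace
        tagged.append(new_row)
    return tagged


# Hypothetical rows, as they might look after scraping one results page.
page_rows = [
    {"name": "Employee A", "salary": "24000"},
    {"name": "Employee B", "salary": "18500"},
]
tagged_rows = tag_rows(page_rows, year=2017, month=11,
                       department=" Example Department ")
```

Tagging each row at scrape time means the year, month, and department survive once pages from different subsets are combined.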

ChrisMuir commented 6 years ago

FYI, the service at the link in your post is unavailable, see screenshot. No idea if it's just temporarily down, but figured I'd document it.

[Screenshot: penn_state_website]

soodoku commented 6 years ago

temporary outage. i can see it.

ChrisMuir commented 6 years ago

I manually pulled the latest PDFs from 2017-11-15 and wrote a script to read them all in, extract the data, merge it into a single data frame, and write it to CSV. As a next step, I will write a scraper to automate pulling all PDFs from all time frames, and try to apply the aggregation script to all of them.

I pushed the 2017-11-15 data (raw PDFs and a 7z of the output data frame) and the aggregation script to the repo.
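
For readers who want the shape of the merge-and-write step, here is a rough sketch. It is not the script in the repo: it assumes the PDF extraction has already produced one list of dicts per file, and the names `merge_tables` and `write_csv`, the field names, and the sample rows are all hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def merge_tables(tables: List[List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Concatenate the per-PDF tables into one list of rows."""
    merged: List[Dict[str, str]] = []
    for table in tables:
        merged.extend(table)
    return merged


def write_csv(rows: List[Dict[str, str]], out: TextIO) -> None:
    """Write rows to CSV, using the union of column names in first-seen order."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills columns that a given row is missing with an empty string.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical extracted tables, one per PDF.
pdf_a = [
    {"name": "Employee A", "salary": "24000", "department": "Dept 1"},
    {"name": "Employee B", "salary": "18500", "department": "Dept 1"},
]
pdf_b = [{"name": "Employee C", "salary": "31000"}]  # department missing

merged = merge_tables([pdf_a, pdf_b])
buffer = io.StringIO()
write_csv(merged, buffer)
csv_text = buffer.getvalue()
```

Writing to a file instead only requires passing a handle opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.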

soodoku commented 6 years ago

Nice man! Really cool! :-)

soodoku commented 6 years ago

hey @ChrisMuir, should we close this issue?

ChrisMuir commented 6 years ago

Ah, I haven't yet completed the next step (writing a script to pull all of the PDFs from the website). We have a plan in place, but I don't know if that's enough to close this issue or if you want to wait until all of the PA work has been completed. It's up to you.

soodoku commented 6 years ago

Thanks, man! Let's wait.