themousepotato / unscrapulous

A utility that scrapes lists of unscrupulous entities (barred from doing financial business) published by various legal institutions
MIT License

Income Tax Defaulters fails to work #9

Open themousepotato opened 3 years ago

themousepotato commented 3 years ago

The Income Tax Defaulters scraper used to work with the source http://office.incometaxindia.gov.in/administration/_layouts/15/inplview.aspx?List={5A26177B-D7A0-4251-843D-5E6C0B3C3DF2}&View={D8DD9754-8FD1-4D72-9908-727646E99CA0}&ViewCount=450&IsXslView=TRUE&IsCSR=TRUE&Paged=TRUE&p_ID=1 but it now fails for unknown reasons.
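For anyone debugging this, it may help to see what that source URL is made of. The sketch below only takes the query string apart and rebuilds it with a different `p_ID`; it makes no network request. The helper names `query_params` and `with_page_start` are hypothetical, and treating `p_ID` as the paging cursor is an assumption based on the `Paged=TRUE&p_ID=...` pattern in the URL, not something confirmed by the scraper's code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# The source URL quoted in this issue, split across lines for readability.
SOURCE = (
    "http://office.incometaxindia.gov.in/administration/_layouts/15/inplview.aspx"
    "?List={5A26177B-D7A0-4251-843D-5E6C0B3C3DF2}"
    "&View={D8DD9754-8FD1-4D72-9908-727646E99CA0}"
    "&ViewCount=450&IsXslView=TRUE&IsCSR=TRUE&Paged=TRUE&p_ID=1"
)


def query_params(url):
    """Return the query string as a dict, keeping the first value of each key."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


def with_page_start(url, p_id):
    """Return the same URL with p_ID replaced (assumed to be the paging cursor)."""
    parts = urlsplit(url)
    params = query_params(url)
    params["p_ID"] = str(p_id)
    # Keep the braces around the GUIDs unescaped, as in the original URL.
    return urlunsplit(parts._replace(query=urlencode(params, safe="{}")))


if __name__ == "__main__":
    for key, value in query_params(SOURCE).items():
        print(f"{key} = {value}")
    print(with_page_start(SOURCE, 31))
```

Printing the parameters makes it easier to compare this endpoint with whatever a browser sends when the public list page is paged through.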

knadh commented 3 years ago

What about these pages? http://office.incometaxindia.gov.in/administration/Pages/tax-defaulters.aspx

http://office.incometaxindia.gov.in/administration/Lists/Tax%20Defaulters/AllItems.aspx?Paged=TRUE&p_Tax_x0020_Arrear=11.9300000000000&p_ID=45&PageFirstRow=31&SortField=Tax%5Fx0020%5FArrear&SortDir=Asc&&View=%7BD8DD9754-8FD1-4D72-9908-727646E99CA0%7D

themousepotato commented 3 years ago

I've seen those pages, but I was having trouble scraping them. @rhnvrm wrote most of the current method. I will try to investigate this.

rhnvrm commented 3 years ago

I had read the generic docs for MS SharePoint ASPX sites somewhere and found that link that way, but it now seems to be returning a 403. Maybe an audit showed them that people were using it. I guess the best bet now is to use headless Chrome on these pages and scrape the data.
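Before moving to headless Chrome, it might be worth confirming the status code from a script. This is a minimal sketch using only the Python standard library; the `probe` helper and its `opener` parameter are hypothetical names, and sending a browser-like `User-Agent` is an assumption about what the server might check, not a known fix.

```python
import urllib.error
import urllib.request


def probe(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for url, including error statuses such as 403."""
    # Assumption: a browser-like User-Agent may or may not change the response.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return err.code


if __name__ == "__main__":
    page = "http://office.incometaxindia.gov.in/administration/Pages/tax-defaulters.aspx"
    print(page, probe(page))
```

If the public page answers normally while the `inplview.aspx` link still returns 403, that would suggest the block is specific to that endpoint, and a headless browser driving the public page would be the next thing to try.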