ohcnetwork / life-data

https://life-pipeline.coronasafe.network/

Find ways to collect official bed numbers and data from official government sources #4

Open jitendraag opened 3 years ago

jitendraag commented 3 years ago

Is your feature request related to a problem? Please describe.

We have a good source of data from various lead-generation mechanisms. However, official sources are significantly more reliable, and users visiting our website are not getting that data. We should make it available on our website.

Describe the solution you'd like

Something like Scrapy with custom code for each domain should work.

  1. Get the data.
  2. Assign a unique ID to each hospital (one that doesn't change between runs).
  3. Make this data available as a CSV.
  4. Ingest this CSV into our system.
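
Steps 2 and 3 could be sketched roughly as follows. This is only a minimal sketch, not the project's implementation: the `stable_hospital_id` and `rows_to_csv` helpers, the idea of hashing the source site plus a normalized hospital name, and the column names are all assumptions made for illustration.

```python
import csv
import hashlib
import io


def stable_hospital_id(source: str, hospital_name: str) -> str:
    """Derive an ID that stays the same across scraping runs.

    Assumption: a hospital is identified by its source site plus its
    name, normalized for case and whitespace. Real data may need a
    stronger key (for example, name plus pincode).
    """
    normalized = " ".join(hospital_name.lower().split())
    digest = hashlib.sha1(f"{source}|{normalized}".encode("utf-8")).hexdigest()
    return digest[:12]


def rows_to_csv(source: str, rows: list) -> str:
    """Serialize scraped rows to CSV text, adding the stable ID column."""
    # Hypothetical column set; a real scraper would emit more fields.
    fieldnames = ["facility_id", "hospital_name", "available_beds_with_oxygen"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        facility_id = stable_hospital_id(source, row["hospital_name"])
        writer.writerow({"facility_id": facility_id, **row})
    return buffer.getvalue()


# Hypothetical sample rows; not real data.
sample = [
    {"hospital_name": "Example Hospital", "available_beds_with_oxygen": 4},
    {"hospital_name": "  example   HOSPITAL ", "available_beds_with_oxygen": 4},
]
csv_text = rows_to_csv("covidpune.com", sample)
```

Because the ID is derived from the data rather than from a counter, re-running the scraper yields the same ID for the same hospital. The trade-off is that a renamed hospital would get a new ID.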

Gurgaon: http://covidggn.com/
Delhi: https://coviddelhi.com
Thane: https://covidthane.org/availabiltyOfHospitalBeds.html
Bengaluru: https://covidbengaluru.com/
Andhra Pradesh: https://covidaps.com
Telangana: https://covidtelangana.com
West Bengal: https://covidwb.com
Pune: https://covidpune.com
Ahmedabad: https://covidamd.com and https://ahna.org.in/covid19.html
Vadodara: https://covidbaroda.com
Nagpur: http://nsscdcl.org/covidbeds/AvailableHospitals.jsp
Nashik: https://covidnashik.com
Madhya Pradesh: https://covidmp.com
Uttar Pradesh: http://dgmhup.gov.in/en/CovidReport
Rajasthan: https://covidinfo.rajasthan.gov.in/COVID19HOSPITALBEDSSTATUSSTATE.aspx
Bhopal: https://bhopalcovidbeds.in/
Haryana: https://coronaharyana.in/
Tamil Nadu: https://covidtnadu.com and https://stopcorona.tn.gov.in/beds.php
Beed, Maharashtra: https://covidbeed.com
Gandhinagar, Gujarat: https://covidgandhinagar.com

Describe alternatives you've considered

Skipping this data and updating availability manually are two options.

Additional context

NA

ghost commented 3 years ago

Easy to GET Data

The sources in this group share almost the same ordering and structure of data.

Special Data GET (Need explicit extraction)

Nagpur GET AND EXTRACT DATA FROM http://nsscdcl.org/covidbeds/AvailableHospitals.jsp

Haryana GET https://coronaharyana.in/

Bhopal GET https://airtable.com/embed/shrh9lZ6z0klMMDAd/tblKVMIewosntdJ0z?backgroundColor=purple&viewControls=on

Uttar Pradesh GET AND EXTRACT DATA FROM http://dgmhup.gov.in/en/CovidReport

Rajasthan GET AND EXTRACT DATA FROM https://covidinfo.rajasthan.gov.in/COVID19HOSPITALBEDSSTATUSSTATE.aspx
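
For the sources that need explicit extraction, one possible approach is to parse the HTML table on the page into rows. The sketch below uses only the Python standard library. The `TableExtractor` class, the `extract_rows` helper, and the sample markup are hypothetical; the real pages may be structured quite differently, so treat this as an illustration of the idea rather than a working scraper for any of the sites above.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html: str) -> list:
    """Return one dict per table row, keyed by the first row's cells."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, cells)) for cells in body]


# Hypothetical markup standing in for a fetched page.
SAMPLE_HTML = """
<table>
  <tr><th>Hospital</th><th>Vacant Beds</th></tr>
  <tr><td>Example Hospital</td><td> 12 </td></tr>
</table>
"""
records = extract_rows(SAMPLE_HTML)
```

The HTML itself would be fetched separately, for example with `urllib.request.urlopen`. Pages that render their tables with JavaScript would not work with this approach.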

hannanabdul55 commented 3 years ago
rakeshgunduka commented 3 years ago

Useful link: https://english.jagran.com/india/covid19-information-statewise-list-of-official-websites-to-know-the-availability-of-hospital-beds-in-your-city-10026173

Also for Maharashtra: In Mumbai, COVID-19 war rooms have been set up which are taking care of providing information about the availability of beds. The central helpline number is 1916.

divyagar commented 3 years ago

@jitendraag How can we ingest the CSV into the system?

karmakoder commented 3 years ago

@jitendraag @divyagar @hannanabdul55 @abhinandanarya06 I moved this issue to the life-data repo. LMK if you guys need write access to the repo or if this issue can be combined with another open issue. @rtindru and @sam9111 are also working on data-scraping initiatives.

jitendraag commented 3 years ago

If you have a CSV endpoint, you can simply get it ingested by tagging Ashiya or me on Slack.

ghost commented 3 years ago

Union of all column names from all data sources

['available_beds_without_oxygen', 'hospital_name', 'amc_available_beds_with_oxygen', 'district', 'amc_occupied_beds_without_oxygen', 'private_occupied_beds_with_oxygen', 'available_icu_beds_with_ventilator', 'private_available_icu_beds_without_ventilator', 'available_icu_beds_without_ventilator', 'private_occupied_beds_without_oxygen', 'hospital_address', 'amc_occupied_beds_with_oxygen', 'private_available_beds_without_oxygen', 'private_occupied_icu_beds_without_ventilator', 'total_beds_with_oxygen', 'total_beds_without_oxygen', 'amc_available_beds_without_oxygen', 'area', 'total_icu_beds_with_ventilator', 'pincode', 'private_occupied_icu_beds_with_ventilator', 'amc_available_icu_beds_with_ventilator', 'amc_occupied_icu_beds_without_ventilator', 'amc_occupied_icu_beds_with_ventilator', 'available_beds_with_oxygen', 'amc_available_icu_beds_without_ventilator', 'private_available_beds_with_oxygen', 'total_icu_beds_without_ventilator', 'private_available_icu_beds_with_ventilator', 'hospital_phone', 'last_updated_on', 'hospital_poc_phone', 'charges', '__delete__', 'hospital_poc_name', 'hospital_poc_designation', 'total_beds_allocated_to_covid', 'state', 'hospital_category', 'hospital_poc_email', 'facility_id', 'fee_regulated_beds', 'Notes', 'available_beds_allocated_to_covid', 'officer_name', 'officer_designation', 'bed_breakup', 'last_updated_time', 'last_updated_date']
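
A union like this can be computed from the header row of each extracted CSV. The following is a minimal sketch, assuming each source has already been saved as CSV text; the `union_of_columns` helper and the two sample headers are hypothetical. It sorts the result for reproducibility, so the order differs from the list above.

```python
import csv
import io


def union_of_columns(csv_texts):
    """Return the sorted union of header names across several CSV documents."""
    columns = set()
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        header = next(reader, [])  # an empty document contributes nothing
        columns.update(name.strip() for name in header if name.strip())
    return sorted(columns)


# Hypothetical CSV extracts from two sources; not real data.
source_a = "hospital_name,available_beds_with_oxygen\nExample Hospital,4\n"
source_b = "hospital_name,amc_available_beds_with_oxygen,pincode\nOther Hospital,2,000000\n"
all_columns = union_of_columns([source_a, source_b])
```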

Observations

  1. Some data sources have paired columns, such as (hospital_poc_phone and hospital_phone) or (amc_columnname and columnname).
  2. Almost all government sources have incomplete information about private hospitals, such as phone numbers and addresses.
  3. Some sites follow a pattern like covid<state/city-name>.com, and I can access the Delhi status from covidpune.com with GET https://covidpune.com/data/coviddelhi.com/bed_data.json, but the columns still differ between sites.
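
Two of these observations can be illustrated with a short sketch. The `bed_data_url` helper generalizes the single URL quoted above, which is an assumption: only the covidpune.com to coviddelhi.com combination is reported to work. The `split_by_prefix` helper is likewise hypothetical. The thread does not state what the `amc_` and `private_` prefixes mean, so the code treats them purely as grouping prefixes.

```python
# Column prefixes seen in the union of column names.
KNOWN_PREFIXES = ("amc_", "private_")


def bed_data_url(host_site: str, target_site: str) -> str:
    """Build a URL following the one observed pattern (unverified elsewhere)."""
    return f"https://{host_site}/data/{target_site}/bed_data.json"


def split_by_prefix(row: dict) -> dict:
    """Group prefixed columns under their prefix, with the prefix stripped.

    Columns without a known prefix are placed under the "general" key.
    """
    grouped = {"general": {}}
    for column, value in row.items():
        for prefix in KNOWN_PREFIXES:
            if column.startswith(prefix):
                group = prefix.rstrip("_")
                grouped.setdefault(group, {})[column[len(prefix):]] = value
                break
        else:
            # No known prefix matched this column.
            grouped["general"][column] = value
    return grouped


# Hypothetical row; not real data.
row = {
    "hospital_name": "Example Hospital",
    "amc_available_beds_with_oxygen": 3,
    "private_available_beds_with_oxygen": 5,
}
grouped = split_by_prefix(row)
url = bed_data_url("covidpune.com", "coviddelhi.com")
```

Grouping by prefix keeps the original values intact, which avoids guessing whether the prefixed counts should be summed into the unprefixed columns.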

Possible Solution

At most, we can add links to the official government sources so that people who don't know about them can visit them.

GitHub branch: scrap

Sample of the current extracted CSV format: data/hospital-ext.csv

Let's discuss it

Really data scraping/mining is so difficult 🤕