Open yashy3nugu opened 1 year ago
Deduplicate the diff please. Something like:
-GREATER MUMBAI
+Mumbai
Don't need to know the remaining fields, just a unique list of districts impacted by the change
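A minimal sketch of what such a deduplicated diff could look like in code. It assumes the changes are available as `(old, new)` pairs, one per record. The function names and sample rows are hypothetical and are not taken from the PR.

```python
def unique_district_changes(rows):
    """Return the sorted, unique (old, new) pairs where the district changed.

    `rows` is a hypothetical iterable of (old_district, new_district)
    tuples, one per record in the dataset.
    """
    changes = {(old, new) for old, new in rows if old != new}
    return sorted(changes)


def format_diff(changes):
    """Render each unique change as a two-line -old/+new entry."""
    return "\n".join(f"-{old}\n+{new}" for old, new in changes)


# Illustrative rows only, not real data from the gist.
rows = [
    ("GREATER MUMBAI", "Mumbai"),
    ("GREATER MUMBAI", "Mumbai"),  # same change on another record
    ("Mumbai", "Mumbai"),          # unchanged, so it is left out
]
print(format_diff(unique_district_changes(rows)))
```

Collecting the pairs in a set drops the other fields and the repeats, so each affected district appears once.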
What's the coverage of the change? (How many districts are matched, and left unmatched?)
Only 23 out of 16320 districts were left unmatched and required manual patches. The rest of the districts matched the list.
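For reference, a coverage figure like this could be computed roughly as follows. This is only a sketch: it assumes the matching step yields a mapping from each original district string to its standardized name, with `None` when nothing matched. The `coverage` helper and the sample mapping are hypothetical.

```python
def coverage(matches):
    """Summarize how many district strings were matched.

    `matches` maps each original district string to its standardized
    name, or to None when no match was found (assumed structure).
    """
    unmatched = sorted(name for name, std in matches.items() if std is None)
    total = len(matches)
    return {
        "total": total,
        "matched": total - len(unmatched),
        "unmatched": unmatched,
    }


# Illustrative mapping only, not the real results.
matches = {
    "GREATER MUMBAI": "Mumbai",
    "IN FRONT OF KANYASHALA": None,
}
report = coverage(matches)
print(f"{report['matched']} matched, "
      f"{len(report['unmatched'])} unmatched of {report['total']}")
```

Listing the unmatched names alongside the counts makes it easy to see which entries need manual patches.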
Updated the gist
The changes are too aggressive.
-NEAR NEW MONDHA (ANAJ MANDI) HINGOLI
+Gandhinagar
-IN FRONT OF KANYASHALA
+Kalahandi
-RAVI STEEL CHOWK, KAMRE, RATU ROAD
+Amravati
-BLOCK- KANDHLA, DIST - SHAMLI
-NAI BAZAR, BHARWARI
+Hazaribagh
-PATTI, PAKHWANIA
+Panipat
-PO-AKHAR, DUDHER
+Dhar
-LEFT BANK, ALEU, NEW MANALI, DISTT - KULLU
+Dibang Valley
-TAL JAWHAR DISTT THANE
+Jalandhar
-NIKETAN ASHRAM, DISTT. PAURI
+Amristar
Don't think we can merge this till we're sure about the accuracy of the data.
In the meantime, I've found a nice source for an official list of districts in India, with district codes that we can perhaps use: https://lgdirectory.gov.in/. Here's a cleaned-up version: https://github.com/planemad/india-local-government-directory/blob/main/administrative/2-district.csv
It's missing a few districts, I've filed a PR for that.
Updates the 'DISTRICT' field, using fuzzy matching to replace each value with the closest standardized district name. District names are taken from here. The changes made to the dataset can be seen here.
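As an illustration of the general approach, here is a sketch of fuzzy matching with a similarity cutoff. The PR does not state which library or threshold it used, so Python's standard `difflib` stands in here. The `match_district` helper, the `0.8` cutoff, and the short district list are all assumptions.

```python
import difflib


def match_district(raw, districts, cutoff=0.8):
    """Return the closest standardized district name, or None.

    `cutoff` is an assumed similarity threshold between 0 and 1. Values
    scoring below it are left unmatched rather than force-fitted.
    """
    # Compare case-insensitively, but return the standardized spelling.
    lookup = {d.casefold(): d for d in districts}
    hits = difflib.get_close_matches(
        raw.strip().casefold(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


# Hypothetical subset of the standardized list.
districts = ["Amritsar", "Kalahandi", "Mumbai"]

for raw in ["AMRITSAR", "Amristar", "GREATER MUMBAI", "IN FRONT OF KANYASHALA"]:
    print(repr(raw), "->", match_district(raw, districts))
```

With a strict cutoff, address-like values such as `IN FRONT OF KANYASHALA` come back as `None` instead of being mapped to an unrelated district. The trade-off is that names like `GREATER MUMBAI` also fall below the threshold and would need a manual patch or an alias table.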