someshkar / covid19india-cluster

:microscope: Covid19 India Cluster Graph
https://cluster.covid19india.org
MIT License
989 stars, 654 forks

[data cleaning] Mumbai getting repeated #79

Open answerquest opened 4 years ago

answerquest commented 4 years ago

Describe the bug

There are stray data entries where the city is "Mumbai City", "Mumbai city", "Mumbai Suburb", etc., instead of just "Mumbai", which the majority of entries use.
They all show up as separate clusters.

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://cluster.covid19india.org/
  2. Click on City view
  3. Pull Maharashtra State out to disentangle it and pull its cities out around it
  4. See the multiple cities for Mumbai

Expected behavior

A single Mumbai cluster.

Screenshots

[Image: mumbai issue cluster covid19india]

Redirection

someshkar commented 4 years ago

It's being curated by the covid19india ops team, can you bring this up on the Telegram group? You can find the link on the main website.

answerquest commented 4 years ago

@someshkar thanks for the direction. The Telegram group seems to be locked most of the time now. Looks like scaling-up issues: their numbers seem to have exploded too.

someshkar commented 4 years ago

This is getting fixed in #88. Looks good? We're still looking for a general solution though.

the-solipsist commented 4 years ago

The "location" needed for cluster-mapping may differ from the "location" needed for other kinds of data analytics. A generalized solution for cluster-mapping would be to use an additional field as the "canonical cluster location" (e.g., "Mumbai"), which may be different from the actual location (e.g., "Navi Mumbai"). I would urge against discarding accurate data in the collaborative spreadsheet just to make it fit the needs of cluster-mapping.
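
A minimal sketch of the additional-field idea, in Python. The field names `city` and `clusterLocation`, the mapping table, and the helper `add_cluster_location` are all hypothetical and are not taken from the project's actual schema or code; the variant spellings are the ones reported in this issue. The point is that the original city value is left untouched and the cluster view reads only the derived field.

```python
# Hypothetical lookup from normalized city spellings to a canonical
# cluster location. The entries mirror the variants named in this issue.
CANONICAL_CLUSTER_LOCATION = {
    "mumbai city": "Mumbai",
    "mumbai suburb": "Mumbai",
    "navi mumbai": "Mumbai",
}


def add_cluster_location(record):
    """Return a copy of the record with an extra 'clusterLocation' field.

    The original 'city' value is preserved as-is for other analyses.
    """
    city = record.get("city", "")
    # Collapse repeated whitespace and ignore case when looking up.
    key = " ".join(city.split()).lower()
    # Unknown cities fall back to their own (trimmed) name.
    canonical = CANONICAL_CLUSTER_LOCATION.get(key, city.strip())
    return {**record, "clusterLocation": canonical}


# Made-up sample rows, for illustration only.
records = [
    {"id": 1, "city": "Mumbai"},
    {"id": 2, "city": "Mumbai City"},
    {"id": 3, "city": "Mumbai city"},
    {"id": 4, "city": "Mumbai Suburb"},
    {"id": 5, "city": "Navi Mumbai"},
    {"id": 6, "city": "Pune"},
]

enriched = [add_cluster_location(r) for r in records]
clusters = sorted({r["clusterLocation"] for r in enriched})
print(clusters)  # ['Mumbai', 'Pune']
```

With something like this, the graph would group nodes by `clusterLocation`, while the spreadsheet keeps the precise location (e.g., "Navi Mumbai") in `city`.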