lisphilar / covid19-sir

CovsirPhy: Python library for COVID-19 analysis with phase-dependent SIR-derived ODE models.
https://lisphilar.github.io/covid19-sir/
Apache License 2.0

[New] [Visualization] Add colored map methods for Global, USA (states), China, Japan (prefectures) per variable #535

Closed. Inglezos closed this issue 3 years ago.

Inglezos commented 3 years ago

Summary of this new feature

I see that there is a jpn_map() method that shows a colored map of Japan's prefectures, and the Kaggle notebook uses the following code to plot, for example, the Infected cases:

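import covsirphy as cs

# jpn_pref_df: prefecture-level records (Positive, Discharged, Fatal per date),
# loaded earlier in the Kaggle notebook
df = jpn_pref_df.copy()
# Infected (active) cases = Positive - Discharged - Fatal
df["Infected"] = df["Positive"] - df["Discharged"] - df["Fatal"]
# One row per date, one column per prefecture
df = df.pivot_table(
    index="Date", columns="Prefecture", values="Infected", aggfunc="last"
)
# Order prefectures by their value on the most recent date
jpn_i_df = df.sort_values(by=df.index[-1], axis=1, ascending=False)
cs.line_plot(
    jpn_i_df.iloc[:, :10],
    "top 10 prefectures in Japan: Infected cases over time",
    y_integer=True
)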
  1. The Japan map method should be reworked to accept the variable to plot as a colored map, chosen from [Confirmed, Infected, Fatal, Recovered] (current value only). The number of prefectures will be an extra argument, e.g. the top 5 by population.

  2. A similar method should be created for China's provinces (the number of provinces will be an extra argument, e.g. the top 5 by population).

  3. A similar method should be created for US states (the number of states will be an extra argument, e.g. the top 5 by population).

  4. A general method should be created for a global colored map that shows the current value of the selected variable. For example, show_global_map("Confirmed") would show a global colored map of the current total number of confirmed cases.
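All four requests share one interface: a variable chosen from Confirmed/Infected/Fatal/Recovered, its most recent value per region, and an optional top-N cut by population. A minimal pandas sketch of that selection step follows; it is not CovsirPhy code, and the helper name `latest_values`, the long-format columns `Date`, `Region`, `Population` and the toy numbers are all hypothetical.

```python
import pandas as pd

VARIABLES = ("Confirmed", "Infected", "Fatal", "Recovered")


def latest_values(df, variable, top_n=None):
    """Return the latest value of `variable` per region (hypothetical helper).

    df: long-format records with columns Date, Region, Population and the
        case variables (column names are assumptions for this sketch).
    top_n: if given, keep only the `top_n` most populous regions.
    """
    if variable not in VARIABLES:
        raise ValueError(f"variable must be one of {VARIABLES}, not {variable!r}")
    # Keep only the most recent record of each region
    latest = df.sort_values("Date").groupby("Region").tail(1)
    if top_n is not None:
        # Restrict to the most populous regions, largest first
        latest = latest.nlargest(top_n, "Population")
    return latest.set_index("Region")[variable]


# Toy numbers for illustration only, not real data
records = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-02"] * 3),
    "Region": ["A", "A", "B", "B", "C", "C"],
    "Population": [300, 300, 200, 200, 100, 100],
    "Infected": [10, 12, 5, 6, 1, 2],
})
top = latest_values(records, "Infected", top_n=2)
print(top)
```

The returned Series (region index, one value each) is the shape a choropleth layer would then be colored by.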

lisphilar commented 3 years ago

I created the jpn_map() function before this GitHub repository existed and copied it here for version 1.0.0 or an earlier release. I have not maintained it since then.

Yes, it would be great to create visualization methods/functions with maps etc. We have not had time to enhance visualization tools other than the .line_plot() function, but better tools are needed to understand the datasets more deeply.

lisphilar commented 3 years ago

Useful packages for map/animation:

Inglezos commented 3 years ago

Do you think this is achievable for the v2.15.0 release, or does it require a lot of effort? I am not familiar with the map/animation packages, but I could assist you if you want; I will have some time over the weekend.

lisphilar commented 3 years ago

Thank you. Because this issue needs a lot of effort, I will move it to the v2.16.0 release milestone. I think we will create some classes for an enhanced plotting system, including line plots (new issue) and maps. We have the line_plot() function, but it has too many arguments for users to control effectively.

So, it will be better

I plan to release v2.15.0 on 17Jan2021, but it could be on 16Jan2021 if we complete the remaining issues.

lisphilar commented 3 years ago

The Cartopy package will be useful for plotting country/province-level values as points on a map. https://scitools.org.uk/cartopy/docs/latest/gallery/eyja_volcano.html#sphx-glr-gallery-eyja-volcano-py

Geo data (latitude and longitude) could be retrieved from COVID-19 Data Hub.
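Plotting values as points needs only a latitude/longitude pair and a value per location. A minimal sketch with plain matplotlib axes (longitude on x, latitude on y) follows, so it runs without Cartopy; with Cartopy one would instead create the axes with `projection=ccrs.PlateCarree()` and call `ax.coastlines()`. The helper `plot_points`, the `Longitude`/`Latitude` column names and the toy values are assumptions, not COVID-19 Data Hub's or CovsirPhy's actual schema.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_points(df, variable, max_size=300.0, ax=None):
    """Scatter `variable` at each location (hypothetical helper).

    Marker area is proportional to the value, scaled so that the largest
    value gets `max_size` points^2.
    """
    if ax is None:
        _, ax = plt.subplots()
    values = df[variable].astype(float)
    peak = values.max()
    # Avoid dividing by zero when every value is 0
    sizes = values / peak * max_size if peak > 0 else values * 0.0
    points = ax.scatter(df["Longitude"], df["Latitude"], s=sizes, alpha=0.6)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title(variable)
    return points


# Toy coordinates and values for illustration only
locations = pd.DataFrame({
    "Longitude": [139.7, 135.5],
    "Latitude": [35.7, 34.7],
    "Confirmed": [50, 100],
})
points = plot_points(locations, "Confirmed")
```

Scaling the marker area (rather than the radius) linearly with the value is one common choice; it keeps large values from visually dominating the map.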

lisphilar commented 3 years ago

@rebeccadavidsson, do you have any ideas regarding data visualization with a global map and province-level maps?

Inglezos commented 3 years ago

I am also adding my first attempt at this here: https://colab.research.google.com/gist/Inglezos/765656e5727569d907a9f26239e486e1/covsirphy_plot_colored_maps_dev.ipynb

I hope this will be useful for development. I focused on global data; I guess province/state level would be easy, but I did not have time to try it. The legend format and the display of missing data also need rework, and titles must be added.

lisphilar commented 3 years ago

Thank you for your notebook! I have updated it: https://gist.github.com/lisphilar/ce11a208d669c3ad769e0d9ec51080ec

gpd.read_file(gpd.datasets.get_path("naturalearth_lowres")) provides country-level geometry data. I'm searching for province-level geometry data. Ref. https://geopandas.org/mapping.html

For the legend issue, a non-discrete (continuous) legend (line 13 in the notebook) may be useful, but only blue was shown in the output figure at this time.

lisphilar commented 3 years ago

Colored map for USA states: https://gist.github.com/lisphilar/72818afa3208d9d7401d7aa6571d41e3

This works for the USA, but I could not find province-level geometry data for all countries at this time.

lisphilar commented 3 years ago

Solution for Japan and China: https://gist.github.com/lisphilar/57aea6357c8d06402560f89c5aa318da

lisphilar commented 3 years ago

With #590, JHUData.map() was added.

Related issues

#535

What was changed

Global map with country level data:

import covsirphy as cs

# Load the JHU-style dataset (CovsirPhy v2.x DataLoader)
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
# Global map with country level data
jhu_data.map(country=None, variable="Infected")
# To include/exclude some countries
jhu_data.map(country=None, variable="Infected", included=["Japan"])
jhu_data.map(country=None, variable="Infected", excluded=["Japan"])
# To change the date
jhu_data.map(country=None, variable="Infected", date="01Oct2021")

Country map with province level data:

# Country map with province level data
jhu_data.map(country="Japan", variable="Infected")
# To include/exclude some provinces
jhu_data.map(country="Japan", variable="Infected", included=["Tokyo"])
jhu_data.map(country="Japan", variable="Infected", excluded=["Tokyo"])
# To change the date
jhu_data.map(country="Japan", variable="Infected", date="01Oct2021")
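The `included`, `excluded` and `date` arguments narrow the data before coloring. A minimal pandas sketch of one plausible reading of those semantics follows; it is not the JHUData.map() implementation, and the helper `select_regions`, the `Region`/`Date` columns and the toy values are hypothetical. It assumes `included` keeps only the listed names, `excluded` drops them, and `date` (written like "01Oct2021") selects a single day, defaulting to the latest one.

```python
import pandas as pd


def select_regions(df, included=None, excluded=None, date=None):
    """Hypothetical filter mirroring the included/excluded/date arguments."""
    if included is not None:
        df = df[df["Region"].isin(included)]
    if excluded is not None:
        df = df[~df["Region"].isin(excluded)]
    if date is None:
        # Default to the most recent date in the (filtered) data
        day = df["Date"].max()
    else:
        # Dates are written like "01Oct2021" in the examples above
        day = pd.to_datetime(date, format="%d%b%Y")
    return df[df["Date"] == day]


# Toy values for illustration only
records = pd.DataFrame({
    "Date": pd.to_datetime(["2021-09-30", "2021-10-01"] * 2),
    "Region": ["Tokyo", "Tokyo", "Osaka", "Osaka"],
    "Infected": [1, 2, 3, 4],
})
latest_without_tokyo = select_regions(records, excluded=["Tokyo"])
tokyo_sep = select_regions(records, included=["Tokyo"], date="30Sep2021")
```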

This will be documented in https://lisphilar.github.io/covid19-sir/usage_dataset.html#The-number-of-cases-(JHU-style)

lisphilar commented 3 years ago
jhu_data.map(country=None, variable="Infected")

(Figure_1: output of the call above)

jhu_data.map(country="Japan", variable="Infected")

(Figure_1: output of the call above)

lisphilar commented 3 years ago

Remaining issues:
Global/country map: PCRData, PopulationData
Global map: OxCGRTData, VaccineData
Country map: CountryData

lisphilar commented 3 years ago

With #592, the global map was updated.

(Figure_1: updated global map)

lisphilar commented 3 years ago

Closing for the new release. Discussion will be continued in #593 and #594.

Inglezos commented 3 years ago

Thank you for the implementation! Sorry for being away for the last few weeks; I was busy at work with a tight schedule. Regarding the visualization, I have some notes/questions:

  1. I think it would be better to put the legend below the image to free up space so that the map itself can be larger. In colored_map.py -> plot(), I would suggest the following changes:
     i. cax = divider.append_axes("bottom", size="5%", pad=0.1)
     ii. plot_kwargs["legend_kwds"] = {"orientation": "horizontal"}
     This gives, for example: (image: colored_map_global_confirmed)

This could also be applied to all the case plots, for example in plotting.py -> line_plot():

if show_legend:
    ax.legend(bbox_to_anchor=(0.5, -0.1), loc='lower center', borderaxespad=0, ncol=len(df.columns))

This gives, for example, a plot with the legend below the axes. Should I create a new issue for that?

  2. For China, I tried jhu_data.map(country="China", variable="Confirmed"), but the result is (image: colored_map_china_confirmed). Why are some regions shown as missing, with /// hatching?

  3. For Japan, there are many small islands all around (image: colored_map_japan_confirmed). Do you know how we can show only the mainland, other than by manually excluding each island?

  4. Greenland is shown as missing, but it actually has a very low number of cases (30). This problem originates from the source data: in covid19dh.csv, the record is "Country": "Denmark" with "Province": "Greenland", which means that, if I understand correctly, Greenland's cases are counted under Denmark.
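The legend-below-the-map layout from item 1 can be tried outside CovsirPhy with matplotlib alone. A minimal sketch follows, assuming a `pcolormesh` grid as a stand-in for the GeoPandas map layer (`plot_with_bottom_colorbar` is a hypothetical name): `make_axes_locatable` splits off a thin axes below the plot and a horizontal colorbar is drawn into it. In GeoPandas the same idea corresponds to passing that axes as `cax` together with `legend=True` and `legend_kwds={"orientation": "horizontal"}` to `GeoDataFrame.plot()`.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1 import make_axes_locatable


def plot_with_bottom_colorbar(values, cmap="Reds"):
    """Draw `values` with a horizontal color scale underneath (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(8, 5))
    # Stand-in for the colored map layer; any ScalarMappable works here
    mesh = ax.pcolormesh(values, cmap=cmap)
    # Maps usually hide the lon/lat ticks, which also keeps them off the colorbar
    ax.set_axis_off()
    divider = make_axes_locatable(ax)
    # Thin axes below the main axes: 5% of its height, 0.1 inch gap
    cax = divider.append_axes("bottom", size="5%", pad=0.1)
    fig.colorbar(mesh, cax=cax, orientation="horizontal")
    # Divider positions are resolved when the figure is drawn
    fig.canvas.draw()
    return fig, ax, cax


fig, ax, cax = plot_with_bottom_colorbar(np.arange(12).reshape(3, 4))
```

Because the colorbar takes height instead of width, the map keeps the full figure width, which is the point of the suggestion for wide world maps.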

lisphilar commented 3 years ago

Thank you for your time! Because this feature has been released in stable versions, discussion will continue in new issues.

  1. Location of legend labels: Nice idea! Could you create issues for updating ColoredMap, line_plot/line_plot_multiple, and box_plot? For line_plot, the distance between the x-axis and the labels could be increased by adjusting bbox_to_anchor.

  2. Regions with missing values in China: The hatched regions are "Inner Mongolia" and "Tibet". They are registered as "Inner Mongol" and "Xizang". (Map with English names: https://www.chinasage.info/provinces.htm) I do not know which names are correct, but this issue could be solved technically in CovsirPhy by changing these names in ColoredMap._load_geo_country_specific(). Additionally, "Hong Kong" and "Macau" are registered as provinces in the JHU dataset, but they do not seem to be registered as provinces in the geometry data we use at this time. Do you have any ideas to solve this issue (both the programming issue and political correctness)?

Mismatched province names are a common problem for many countries, including France and the UK. A new dataset (geometry information) or a new package (to resolve province names, as country_converter does for country names) will be necessary.

  3. Smaller islands in Japan: For Japan only, the japanmap package can solve this issue by moving Okinawa and Hokkaido on the map, as shown in the covsirphy.visualization.japan_map.jpn_map() function. (As a general approach, smaller islands could be ignored with the GeoPandas package, but smaller provinces may then be removed completely.)

  4. Greenland is missing in the global map: Greenland is not registered in the geometry information for the country-specific map (_load_geo_country_specific) either. If acceptable, Greenland (a self-governing territory) could be registered as a country/region.
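The province-name mismatch in item 2 can at least be detected mechanically before plotting: rename known aliases, then do an outer merge with the geometry names and list the names that found no partner. A minimal pandas sketch follows; `harmonize_and_check` and the `Province` column are hypothetical, the alias direction (case data "Inner Mongolia"/"Tibet" renamed to geometry "Inner Mongol"/"Xizang") is an assumption for illustration, and the case counts are toy values.

```python
import pandas as pd


def harmonize_and_check(cases, geometry_names, aliases):
    """Rename provinces via `aliases` and report names still lacking a match.

    cases: DataFrame with a Province column (assumed name).
    geometry_names: province names present in the geometry table.
    aliases: mapping {name used in cases: name used in geometry}.
    """
    renamed = cases.assign(Province=cases["Province"].replace(aliases))
    geometry = pd.DataFrame({"Province": list(geometry_names)})
    # indicator=True adds a "_merge" column: left_only / right_only / both
    merged = renamed.merge(geometry, on="Province", how="outer", indicator=True)
    only_cases = sorted(merged.loc[merged["_merge"] == "left_only", "Province"])
    only_geometry = sorted(merged.loc[merged["_merge"] == "right_only", "Province"])
    return renamed, only_cases, only_geometry


# Toy values for illustration only
cases = pd.DataFrame({
    "Province": ["Beijing", "Inner Mongolia", "Tibet", "Hong Kong"],
    "Confirmed": [1, 2, 3, 4],
})
aliases = {"Inner Mongolia": "Inner Mongol", "Tibet": "Xizang"}
renamed, only_cases, only_geometry = harmonize_and_check(
    cases, ["Beijing", "Inner Mongol", "Xizang"], aliases
)
print("No geometry for:", only_cases)
```

Names left in `only_cases` (here a province with no polygon in the geometry table) are exactly the regions that would otherwise be drawn as missing, so the list doubles as a to-do for the alias table or for a decision like the Hong Kong/Macau question.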