pzhaonet / ncov

ncov web
https://ncov2020.org/
MIT License

It's time to include global CoV2 data #12

Open chenx77 opened 4 years ago

chenx77 commented 4 years ago

Yes! It's time to include global data on CoV2. Overall attention is moving from China to the world. Expanding the data and the model will help the global effort to conquer the virus, and will also make this project more visible globally. Some worldwide data are available from WHO at https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200226-sitrep-37-covid-19.pdf?sfvrsn=6126c0a4_2, but they do not yet have a mathematical model or a good visualization tool.

pzhaonet commented 4 years ago

The global data used to be included on our website but were removed because the numbers were small. Yes, I do wish I could create daily global reports if I had some time to work on it.

yiluheihei commented 4 years ago

I have written a function, plot_world_map(), which can be used to plot a world map for ncov. For more details, see here.

chenx77 commented 4 years ago

Thank you for your swift response and effort! I'll recommend your world-data pages to my friends as soon as they are available on your main website at https://ncov2020.org/

Wanqi-Wang commented 4 years ago

I believe Dr. Zhao did include global data but removed it for some reason. Yes, I think it's time to include global data. However, I don't believe that a logistic model will work in a global setting, because: 1. the numbers in the global data are still very small, unlike the China data, for which a logistic model is the easiest and most efficient approach; 2. the global case counts are strongly influenced by many factors, e.g., whether all flights from China are stopped, international cooperation with China, and the different levels of medical resources in each country.

With a logistic model, I think for now you can probably only predict Japan, Korea, and Italy, and only inaccurately.

Therefore an SIR or SEIR model may be better. You could perhaps adjust for medical resources using, e.g., the doctors-per-population figures published by WHO, and you can find flight information at the OAG traffic analyser: https://www.oag.com/traffic-analyser
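For readers unfamiliar with the SIR model suggested above, here is a minimal sketch, assuming a closed population and a simple forward-Euler integration. The function name `simulate_sir` and every parameter value are hypothetical and chosen only for illustration; nothing here is fitted to real data or taken from the ncovr package.

```python
def simulate_sir(population, infected0, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with a forward-Euler scheme.

    beta  : transmission rate per day (assumed constant)
    gamma : recovery rate per day (assumed constant)
    Returns a list of (S, I, R) tuples, one per day, including day 0.
    """
    s, i, r = float(population - infected0), float(infected0), 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters, for illustration only (not fitted to any data).
trajectory = simulate_sir(population=1_000_000, infected0=10,
                          beta=0.4, gamma=0.1, days=120)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print("infections peak on day", peak_day)
```

An SEIR variant would add an "exposed" compartment between S and I; factors such as travel restrictions or medical resources would then enter as changes to `beta` or `gamma` over time.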

pzhaonet commented 4 years ago

> I have written a function, plot_world_map(), which can be used to plot a world map for ncov. For more details, see here.

Why not make a PR to the package?

yiluheihei commented 4 years ago

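To draw province maps: the map files in leafletCN are too old, so I updated some of the leafletCN files, and the code I wrote therefore depends on

After my changes, out of laziness, I did not keep the leafletCN API fully consistent with the previous version. If the leafletCN used in covr is switched to mine, the original functions in covr will be affected; plot_map(), for example, will probably break. There are three possible solutions:

  1. Replace plot_map in covr directly with my function. However, mine currently does not support displaying by city on the China map, i.e. it does not support method = city. After I drew the maps for the individual provinces, I felt that showing cities on the national map was no longer necessary.
  2. Modify plot_map() in covr.
  3. Keep the leafletCN interface consistent.

pzhaonet commented 4 years ago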

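You can use a new function name, so that the two do not interfere with each other and the existing function is unaffected.

If the problem lies in leafletCN, I think a PR should be submitted to the leafletCN package.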

yiluheihei commented 4 years ago

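Yes, submitting what I have written directly is the fastest option for now.

However, I modified leafletCN, and I just tested it: with my modified leafletCN, the plot_map you wrote earlier in covr does not run correctly. My changes to leafletCN are fairly large and some problems remain, so it is only barely usable. For example, I have not updated the city map data; I only updated the maps for China and the individual provinces.

How about this for now: we tentatively depend on my modified leafletCN, and I submit the functions I wrote, so that both the provincial and the global maps can be drawn. As a next step, I will find time to rework the original plot_map in covr.

Let's get covr running first and consider a PR to leafletCN at the end.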

pzhaonet commented 4 years ago

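I originally wanted to merge your world-map plotting function in by hand, but I really have no time for testing. However, I slightly modified the existing code and added method = 'country' to ncovr::plot_map(). Its structure is fairly simple and it may not be as complete as yours, but at least it can draw a world map now. The modified part is: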

https://github.com/pzhaonet/ncovr/blob/master/R/ncovr.R#L379-L427

Latest map:

https://ncov2020.org/zh/country-map-2020-02-29/

yiluheihei commented 4 years ago

How about opening a dev branch in ncovr, so that the running ncov site is not affected?

I will commit all my changes to the dev branch.

pzhaonet commented 4 years ago

@yiluheihei Either way is fine. I still think using a new function name is the safer option.

pzhaonet commented 4 years ago

@Wanqi-Wang the patterns of the confirmed cases in other countries seem quite different from those in China. You can find the time-series figures here. When the logistic model is applied to these data, it fails in most cases, probably because (1) the data themselves follow another pattern and we have to use another model, (2) the raw data are dirty, or (3) there are not enough cases to fit the model.

We can discuss how to tidy the data in our seminar time. It would be good practice to apply an SIR or SEIR model to the data. It might be your research project for DPH206, or your FYP. Think about it.
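Reason (3) can be illustrated with a small sketch, assuming the standard three-parameter logistic curve for cumulative cases. The function name `logistic` and all parameter values are hypothetical. Two curves with very different final sizes are almost identical in the early phase, so early data alone cannot pin down the plateau.

```python
import math


def logistic(t, capacity, rate, initial):
    """Cumulative cases at day t for a logistic curve that starts at `initial`
    and levels off at `capacity`."""
    return capacity / (1 + (capacity / initial - 1) * math.exp(-rate * t))


# Hypothetical parameters, for illustration only.
rate, initial = 0.2, 5
small, large = 10_000, 100_000  # two very different final sizes

for day in (10, 60):
    a = logistic(day, small, rate, initial)
    b = logistic(day, large, rate, initial)
    print(f"day {day}: capacity {small} gives {a:.0f}, capacity {large} gives {b:.0f}")
```

At day 10 the two curves differ by well under one percent, while by day 60 they differ several-fold. A fit that only sees the first days therefore has very little information about the final size.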

Wanqi-Wang commented 4 years ago

> @Wanqi-Wang the patterns of the confirmed cases in other countries seem quite different from those in China. You can find the time-series figures here. When the logistic model is applied to these data, it fails in most cases, probably because (1) the data themselves follow another pattern and we have to use another model, (2) the raw data are dirty, or (3) there are not enough cases to fit the model. We can discuss how to tidy the data in our seminar time. It would be good practice to apply an SIR or SEIR model to the data. It might be your research project for DPH206, or your FYP. Think about it.

Ah, great, I am planning to do it too. BTW, why can "(2) the raw data is dirty" be a reason for an unsuccessful model? What do you mean by "dirty"? Do you mean different variables and data formats in different datasets? I agree that the data cleaning will be a large amount of work, but it would be interesting and valuable work. Looking forward to it! Thanks! 😸 😸 😸 😸 😸

chenx77 commented 4 years ago

To overcome problems such as data sparseness, dirtiness, and counts that may never grow large enough (especially for small countries), can we build and use a data set that sums up all countries except China? Reasons: 1) it is not technically difficult; 2) it may fit better with the same logistic model we already use; 3) it can help readers and experts compare the global situation against the well-known situation in China.
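The aggregation itself could look roughly like the sketch below. The record layout `(date, country, confirmed)`, the function name `sum_outside`, and all numbers are made up for illustration; the real data source may be structured differently.

```python
from collections import defaultdict


def sum_outside(records, excluded="China"):
    """Sum confirmed cases per date over every country except `excluded`.

    `records` is an iterable of (date, country, confirmed) tuples.
    """
    totals = defaultdict(int)
    for date, country, confirmed in records:
        if country != excluded:
            totals[date] += confirmed
    return dict(sorted(totals.items()))


# Made-up placeholder numbers, not real case counts.
records = [
    ("2020-02-27", "China", 1000),
    ("2020-02-27", "Country A", 10),
    ("2020-02-27", "Country B", 5),
    ("2020-02-28", "China", 1100),
    ("2020-02-28", "Country A", 14),
    ("2020-02-28", "Country B", 9),
]
rest_of_world = sum_outside(records)
print(rest_of_world)
```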

Just an idea for your reference.

Thank you for your great efforts, I watch your report page every day!

pzhaonet commented 4 years ago

@Wanqi-Wang "Dirty" here means that the data are created with inconsistent criteria. For example, the definition of "confirmed" might have different meanings during different periods, even for the same country.
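One possible way to spot such a change of criteria is to look for days whose increase is far out of line with the preceding days. The sketch below is only a heuristic under that assumption: the function name `flag_jumps`, the threshold `factor`, the `window` length, and the series are all hypothetical, and a flagged day still needs to be checked by hand against the reporting notes.

```python
from statistics import median


def flag_jumps(cumulative, factor=3.0, window=5):
    """Return indices into `cumulative` whose daily increase exceeds `factor`
    times the median increase over the preceding `window` days."""
    daily = [b - a for a, b in zip(cumulative, cumulative[1:])]
    flagged = []
    for k in range(window, len(daily)):
        baseline = median(daily[k - window:k])
        if baseline > 0 and daily[k] > factor * baseline:
            flagged.append(k + 1)  # daily[k] is the step into cumulative[k + 1]
    return flagged


# Made-up cumulative counts with an artificial jump, for illustration only.
series = [100, 120, 142, 165, 190, 214, 240, 640, 668, 695]
print(flag_jumps(series))
```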

pzhaonet commented 4 years ago

@chenx77 It is worth a shot. Currently I have no time for it. If you build it, your contribution to the website would be appreciated.

Wanqi-Wang commented 4 years ago

> @Wanqi-Wang "Dirty" here means that the data are created with inconsistent criteria. For example, the definition of "confirmed" might have different meanings during different periods, even for the same country.

Very interesting point! Different diagnostic criteria contribute to the "dirtiness". Thank you! :cat: