sfrosenb / sfrosenb-Immunization_Strategies_in_Networks_with_Missing_Data

Code and data associated with the upcoming PLoS Comp Bio paper about targeted immunization strategies using incomplete network data
MIT License

Potential Alternatives for Spreading_CR package? #1

Open Frankie91 opened 4 years ago

Frankie91 commented 4 years ago

Dear Samuel,

Together with other students, I'm currently trying to reproduce and extend part of your study. However, we're unable to solve a series of issues related to the Spreading_CR package. Although we're already in touch with its author, we were wondering whether it would be possible to rely instead on similar alternative packages, such as EoN and Epydemics, which still use Gillespie algorithms to simulate spreading processes on networks, albeit without a composition and rejection scheme for sampling. Thank you in advance for your attention.

Best Regards, Daniele Francario

sfrosenb commented 4 years ago

Dear Daniele,

Wow! It's exciting that you are extending the study! There is so much more I want to do down this path of missing data in targeted immunization. I am currently working on extending and fleshing out the results of the "updating sample" variant of acquaintance immunization to find out why, and in what scenarios, the improvement with missing data occurs. What are you looking at in your extension with your colleagues? Different ways the data can be missing? Different kinds of networks? Different immunization strategies? I'd love to help out in any way that I can. I am excited that you are interested!

I agree with you that the choice of the specific software for the epidemic simulations is somewhat arbitrary. The Spreading_CR package is exceptionally fast, which allowed us to utilize such large sample sizes across such a wide range of network types and parameters. However, in theory, any package which uses Gillespie algorithms for the simulations should give you the same results, and in fact any package which simulates an SIR model of any kind (including a discrete model instead of a continuous model) should give you results which agree with ours qualitatively (though the specifics may be slightly different). In fact, when I began this project a number of years ago we started with a discrete SIR model.
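To make the point about Gillespie-based simulators concrete, here is a minimal sketch of a continuous-time SIR simulation on a network using a plain direct-method Gillespie loop. It is only an illustration written with the Python standard library: it is not the implementation used by Spreading_CR (which uses composition and rejection sampling) or by EoN, and the function name `gillespie_sir` and its parameters are hypothetical. It assumes an undirected network stored as a dict of neighbour sets and a recovery rate greater than zero.

```python
import random

def gillespie_sir(adj, beta, gamma, seed_node, rng):
    """Continuous-time SIR on a network (direct-method Gillespie sketch).

    adj       -- dict: node -> set of neighbours (undirected, sortable labels)
    beta      -- transmission rate per susceptible-infected edge
    gamma     -- recovery rate per infected node (assumed > 0)
    seed_node -- the single initially infected node
    rng       -- a random.Random instance
    Returns (final outbreak size, time of the last event).
    """
    status = {node: "S" for node in adj}
    status[seed_node] = "I"
    infected = {seed_node}
    t = 0.0
    while infected:
        # Naive bookkeeping: rebuild the list of active S-I edges each event.
        si_edges = [(i, j) for i in infected for j in adj[i] if status[j] == "S"]
        rate_inf = beta * len(si_edges)
        rate_rec = gamma * len(infected)
        total = rate_inf + rate_rec
        t += rng.expovariate(total)          # waiting time to the next event
        if rng.random() < rate_inf / total:  # infection event
            _, j = rng.choice(si_edges)
            status[j] = "I"
            infected.add(j)
        else:                                # recovery event
            i = rng.choice(sorted(infected))
            status[i] = "R"
            infected.remove(i)
    size = sum(1 for s in status.values() if s == "R")
    return size, t

# Usage: a ring of 10 nodes, one outbreak started at node 0.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
final_size, duration = gillespie_sir(ring, beta=2.0, gamma=1.0,
                                     seed_node=0, rng=random.Random(42))
```

Rebuilding the edge list at every event is slow on large networks, which is exactly the cost that faster packages avoid, but the outbreak-size distribution it samples from is the same continuous-time SIR process.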

What might work for you moving forward is to use the EoN package for testing as you build your study, while also troubleshooting the Spreading_CR package in case you want to expand your sample size and parameter range or speed up your testing. I'd also be happy to help you troubleshoot the Spreading_CR package if you'd like. I had a bit of trouble installing and using it in the beginning myself. Are you having trouble installing it? Or are you hitting an error when using it? If it is an error when using it, are your nodes labeled as integers from 0 to N? I also know the author of the Spreading_CR package personally, so let me know if he doesn't get back to you and I can ask him myself.
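On the node-labeling question, a small standard-library sketch of relabeling an edge list with consecutive integers is shown below. The helper name `relabel_to_integers` is hypothetical, and the exact label range Spreading_CR expects is not confirmed here; this version assumes labels starting at 0. For graphs already held in networkx, `networkx.convert_node_labels_to_integers` does a similar job.

```python
def relabel_to_integers(edges):
    """Map arbitrary hashable node labels to consecutive integers from 0.

    edges -- iterable of (u, v) pairs with any hashable labels
    Returns (relabeled edge list, mapping from old label to new integer).
    """
    mapping = {}
    relabeled = []
    for u, v in edges:
        for node in (u, v):
            if node not in mapping:
                # Assign the next unused integer in order of first appearance.
                mapping[node] = len(mapping)
        relabeled.append((mapping[u], mapping[v]))
    return relabeled, mapping

# Usage: string labels become 0, 1, 2 in order of first appearance.
new_edges, label_map = relabel_to_integers([("a", "b"), ("b", "c")])
```

Keeping the returned mapping makes it possible to translate simulation output back to the original node names.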

Best,

Sam


(Sent as an email reply to Frankie91's message of Tuesday, August 11, 2020.)

Frankie91 commented 4 years ago

Dear Samuel,

First of all, thank you for such an enthusiastic answer! Our goals are "limited" for now, since we've been tasked with this extension as part of an assignment for a graduate-level course in network analysis. On one side, we are considering the impact of missing data on eigenvector and PageRank immunization strategies; on the other, we want to quantify that impact using the robustness measure proposed by Martin et al. in the following paper:

https://www.cambridge.org/core/journals/network-science/article/influence-of-measurement-errors-on-networks-estimating-the-robustness-of-centrality-measures/80DEBC5D57537B2244C4DBBB5AE0030E.

But the topic definitely sparks my interest, so I'll keep you posted if I manage to move beyond these goals in the near future.

Coming back to the present issues with Spreading_CR: as you can see in our ongoing discussion with Guillaume (https://github.com/gstonge/spreading_CR/issues/14), who is also already trying to help us, they are for now still related only to the package's installation on our Windows machines. I believe both my colleague and I have correctly followed the required installation steps, but we remain unable to pinpoint the cause of the subsequent errors. So we would definitely like your help on this!

Best, Daniele

sfrosenb commented 4 years ago

Dear Daniele,

Sounds like a great class project! I'm still happy to help. There is a lot of work to be done in this realm: so much work on targeted immunization has ignored the realities of missing data that a lot of the targeted immunization research of the past two decades would benefit from re-investigation in this new light. I think starting by investigating some other commonly used strategies like PageRank and eigenvector centrality is a very good and manageable course project, especially since both measures can be calculated using the readily available packages networkx and igraph, and tying in the Martin and Niemeyer paper is a fantastic idea. I think you will really learn a lot. That seems like a very good start to a paper that your professor may very well be interested in writing with you! Maybe that was part of the reason they assigned such a new piece of research as a project?
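As a rough illustration of the kind of experiment described above, the sketch below picks immunization targets by PageRank computed on an incompletely observed network. Everything in it is an assumption made for illustration: the missing-data model (each node is unobserved independently with a fixed probability) is not necessarily one of the models from the paper, the PageRank is a plain power iteration rather than the networkx or igraph implementation, and the names `pagerank`, `observed_subgraph`, and `choose_targets` are hypothetical.

```python
import random

def pagerank(adj, damping=0.85, iterations=100):
    """Plain power-iteration PageRank on an undirected dict-of-sets graph."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] / len(adj[u]) for u in adj[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank

def observed_subgraph(adj, missing_fraction, rng):
    """Drop each node independently with probability missing_fraction."""
    keep = {v for v in adj if rng.random() >= missing_fraction}
    return {v: adj[v] & keep for v in keep}

def choose_targets(adj, num_targets, missing_fraction, rng):
    """Rank nodes by PageRank on the observed network; return the top ones."""
    observed = observed_subgraph(adj, missing_fraction, rng)
    scores = pagerank(observed)
    return sorted(scores, key=scores.get, reverse=True)[:num_targets]

# Usage: a star with hub 0 and five leaves, 20% of nodes unobserved.
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
targets = choose_targets(star, num_targets=1, missing_fraction=0.2,
                         rng=random.Random(7))
```

The targets chosen from the observed network can then be removed from the full network before running the epidemic simulation, and the result compared against targets chosen with complete data.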

Anyway, I spoke to Guillaume and I am happy he is helping you install the Spreading_CR package. Since Spreading_CR is quite a new package, it seems you are the first to try to install it on a Windows machine. This is a great chance for him and me to help debug your problem so we can tell people what to do in the future, but it is not so great for you, since I am sure you have deadlines to meet and would like to get to the fun part. Based on his most recent response in that issue thread (https://github.com/gstonge/spreading_CR/issues/14), it seems he may have provided updated installation instructions that might work for you, which would be great. Have you tried those? I am not sure I will personally be much help with the installation process, as I am not particularly great at package management myself and I also don't have a Windows machine. I noticed in that thread with Guillaume that you mentioned you were new to Python. You will soon find out that package management in Python can be quite confusing, especially when C is involved under the hood.

If you are not getting anywhere with Guillaume's advice and updates, there are a few other good options. First of all, you could do as you suggested in your original question and replace the parts of the code that use Spreading_CR with another package. I would personally recommend EoN since I am familiar with it and like it. This would certainly work for a school project, as I expect you will be investigating a much smaller set of parameters than we did for our paper, so while EoN is slower than Spreading_CR, it should still be plenty fast for what you need. If you'd like, I can check your updated code once you have replaced all the Spreading_CR calls with EoN calls to make sure it is still doing the same thing. One small issue, however, is that I do not believe EoN has a function equivalent to SpreadingProcess.estimate_R0(), which is used on line 129 of run_sims.py (R0_empirical_mean, R0_empirical_std = sp.estimate_R0(numTrialsToEstR0, seed)) to estimate the basic reproductive number of that network with those parameters for the SIR model. However, estimating R0 is not an essential part of the paper; it was only used to provide context for readers more familiar with that measure, so I believe omitting that part of the code would be fine for your purposes.
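If an R0 estimate is still wanted after dropping Spreading_CR, one possible stand-in is sketched below using only the standard library. It assumes a common empirical definition of R0, namely the mean number of secondary infections caused directly by a single randomly chosen seed in an otherwise susceptible network; whether SpreadingProcess.estimate_R0() uses exactly this definition is not confirmed here. The function names are hypothetical, and only the (mean, standard deviation) return shape mirrors the call quoted above. The recovery rate is assumed to be greater than zero.

```python
import random
import statistics

def secondary_infections_from_seed(adj, beta, gamma, seed_node, rng):
    """Count infections transmitted directly by the seed before it recovers.

    Only the order of events matters for this count, so the event times of
    the Gillespie process are not sampled (embedded jump chain).
    """
    status = {node: "S" for node in adj}
    status[seed_node] = "I"
    infected = {seed_node}
    caused_by_seed = 0
    while status[seed_node] == "I":
        si_edges = [(i, j) for i in infected for j in adj[i] if status[j] == "S"]
        rate_inf = beta * len(si_edges)
        rate_rec = gamma * len(infected)
        if rng.random() < rate_inf / (rate_inf + rate_rec):
            i, j = rng.choice(si_edges)      # infection along a random S-I edge
            status[j] = "I"
            infected.add(j)
            if i == seed_node:
                caused_by_seed += 1
        else:
            i = rng.choice(sorted(infected)) # recovery of a random infected node
            status[i] = "R"
            infected.remove(i)
    return caused_by_seed

def estimate_r0(adj, beta, gamma, num_trials, seed=None):
    """Return (mean, population std) of secondary infections over trials."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    counts = [
        secondary_infections_from_seed(adj, beta, gamma, rng.choice(nodes), rng)
        for _ in range(num_trials)
    ]
    return statistics.mean(counts), statistics.pstdev(counts)

# Usage: complete graph on 5 nodes, 200 trials with a fixed seed.
k5 = {i: {j for j in range(5) if j != i} for i in range(5)}
r0_mean, r0_std = estimate_r0(k5, beta=1.0, gamma=1.0, num_trials=200, seed=1)
```

Because each trial stops as soon as the seed recovers, this is much cheaper than running full outbreaks, though it remains a naive implementation compared with an optimized package.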

The other option would be to bypass the Windows issue by creating a Linux virtual machine on your computer and doing everything there. I personally had to do this in order to use the graph-tool package on my computer, since package management for graph-tool is a nightmare: I made a Linux virtual machine, installed Anaconda on it, and then got graph-tool running there. If you are interested in this option, you can check out this tutorial: https://brb.nci.nih.gov/seqtools/installUbuntu.html

Best, Sam