CescGV opened 7 months ago
Hi, and sorry for the late reply. I'm currently busy with other projects but will catch up on this one as soon as possible.
Some of the advanced options are experimental, and everything should be cleaned up and better explained in the app, but here are some quick answers to your questions for now:
Variable Markov time and Markov time
Infomap tends to break large, sparse network structures into multiple modules because a random walker takes a long time to traverse them. This may not be desirable in all situations, and an existing solution is to increase the Markov time, a global rescaling of the dynamics that keeps larger structures together. However, this comes at the cost of reduced resolution, so Infomap may miss smaller modules in denser areas. Variable Markov time instead increases the Markov time in sparse areas and keeps it at the default value in dense ones, which widens the range of module sizes it can detect.
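To make the global Markov-time trade-off concrete, here is a minimal Python sketch of the two-level map equation for an undirected network. It assumes, following the description of Infomap's Markov time option ("scales link flow to change the cost of moving between modules"), that the Markov time multiplies link flow, and hence module exit flow, while node visit rates stay fixed. The function names and the toy network are illustrative only, not part of Infomap's API, and variable (local) Markov time is not modelled here.

```python
import math
from collections import defaultdict


def plogp(x):
    """x * log2(x), using the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def codelength(edges, partition, markov_time=1.0):
    """Two-level map equation (bits) for an undirected, weighted network.

    edges: list of (u, v, weight); partition: dict node -> module id.
    Assumption: the Markov time multiplies link flow, and therefore module
    exit flow, while node visit rates stay unchanged.
    """
    total = 2.0 * sum(w for _, _, w in edges)
    node_flow = defaultdict(float)
    exit_flow = defaultdict(float)
    for u, v, w in edges:
        # Stationary visit rate of a node: its strength divided by twice the total weight.
        node_flow[u] += w / total
        node_flow[v] += w / total
        if partition[u] != partition[v]:
            # The link carries flow w / total in each direction across the boundary.
            exit_flow[partition[u]] += markov_time * w / total
            exit_flow[partition[v]] += markov_time * w / total
    module_flow = defaultdict(float)
    for node, p in node_flow.items():
        module_flow[partition[node]] += p
    q = sum(exit_flow.values())
    return (plogp(q)
            - 2 * sum(plogp(qi) for qi in exit_flow.values())
            - sum(plogp(p) for p in node_flow.values())
            + sum(plogp(exit_flow.get(m, 0.0) + pm) for m, pm in module_flow.items()))


# Two triangles joined by a single bridge link.
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 1)]
one = {n: 0 for n in range(6)}
two = {n: 0 if n < 3 else 1 for n in range(6)}
for t in (1.0, 3.0):
    print(f"t={t}: one module {codelength(edges, one, t):.3f}, "
          f"two modules {codelength(edges, two, t):.3f}")
```

In this toy network the two-module split has the shorter codelength at Markov time 1, while at Markov time 3 the single module wins, mirroring how a larger Markov time keeps structures together at the price of resolution.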
Details are explained in https://arxiv.org/abs/2211.04287, which includes a section on the effect in Infomap Bioregions.
State network
A state network is a network where each physical node contains state nodes that direct the flow depending on which physical node the flow came from; that is, it encodes dynamics with memory. More details, for example, here: https://www.mapequation.org/publications.html#Edler-Etal-2017-MappingHigherOrder
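A second-order state network, one common kind of memory network, can be sketched in a few lines of Python. This is an illustrative construction from hypothetical observed paths, not code from Infomap or Infomap Bioregions: each state node is a (previous, current) pair, so a physical node gets one state node per node the flow arrived from.

```python
from collections import defaultdict


def second_order_state_network(paths):
    """Build a second-order (memory) state network from observed paths.

    Each state node is a (previous, current) pair of physical nodes, so a
    physical node gets one state node per neighbour the flow arrived from.
    State links are weighted by how often prev -> cur -> next was observed.
    """
    state_links = defaultdict(int)
    for path in paths:
        for prev, cur, nxt in zip(path, path[1:], path[2:]):
            state_links[((prev, cur), (cur, nxt))] += 1
    # Group state nodes by the physical node they belong to (the 'current' node).
    states_of = defaultdict(set)
    for source, target in state_links:
        for state in (source, target):
            states_of[state[1]].add(state)
    return dict(state_links), {node: sorted(s) for node, s in states_of.items()}


# Hypothetical observed trajectories through physical nodes a-e.
paths = [["a", "b", "c"], ["d", "b", "e"], ["a", "b", "c"]]
links, states = second_order_state_network(paths)
print(states["b"])  # physical node b has one state node per incoming neighbour
print(links)
```

Here a walker at b that arrived from a continues only to c, and one that arrived from d continues only to e, which a first-order network with a single node b cannot express. Nodes seen only at the start of a path (a and d) get no state node in this simple sketch.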
Patch sparse cells
With adaptive resolution, a grid cell is subdivided into four sub-cells if it has enough data. However, not all four sub-cells may have enough data points; those that don't become empty, creating holes and a fragmented map of bioregions. With patch sparse cells, the parent cell is kept if not all of its sub-cells have enough data.
Render data while zooming
Zooming can be slow if all data has to be re-rendered while zooming; toggling this option off improves performance.
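The effect of patching can be illustrated with a simplified quadtree in Python. The parameter names (max_capacity, min_capacity, min_size, patch_sparse) are hypothetical, and the rule used here is one reading of the description above rather than the app's actual implementation: sub-cells with enough records are kept (and may be split further), and when patching is on, the parent is kept as well whenever some of its records fall in sub-cells that are too sparse. Sub-cells with no records at all do not trigger a patch in this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    x: float     # west edge
    y: float     # south edge
    size: float  # side length

    def contains(self, px, py):
        return self.x <= px < self.x + self.size and self.y <= py < self.y + self.size

    def children(self):
        h = self.size / 2
        return [Cell(self.x, self.y, h), Cell(self.x + h, self.y, h),
                Cell(self.x, self.y + h, h), Cell(self.x + h, self.y + h, h)]


def adaptive_cells(cell, points, max_capacity, min_capacity, min_size, patch_sparse):
    inside = [p for p in points if cell.contains(*p)]
    if len(inside) < min_capacity:
        return []  # too little data: this area becomes a hole
    if len(inside) <= max_capacity or cell.size / 2 < min_size:
        return [cell]  # no need (or no room) to subdivide
    result, sparse_records = [], 0
    for child in cell.children():
        n = sum(child.contains(*p) for p in inside)
        if n >= min_capacity:
            result.extend(adaptive_cells(child, inside, max_capacity,
                                         min_capacity, min_size, patch_sparse))
        else:
            sparse_records += n
    if patch_sparse and sparse_records > 0:
        # Keep the parent as a patch covering the records of its sparse sub-cells.
        result.append(cell)
    return result


root = Cell(0.0, 0.0, 4.0)
points = [(0.5, 0.5)] * 5 + [(3.0, 3.0)]  # dense lower-left corner, one lone record
no_patch = adaptive_cells(root, points, 3, 2, 1.0, patch_sparse=False)
patched = adaptive_cells(root, points, 3, 2, 1.0, patch_sparse=True)
print(no_patch)
print(patched)
```

Without patching, the lone record in the upper-right quadrant falls into a hole; with patching, the parent cell is kept too and still covers it. In this sketch the parent overlaps its dense child; a real implementation would presumably assign it only the records of the sparse sub-cells.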
Hope this helps, and please reply if you have more questions. I will try to reply quicker.
Thank you very much! Actually, I do have more questions:
1 - I am having some problems obtaining neat results, but part of it may be due to data quality. I want to use global marine data to calculate bioregions, but I cannot get enough flexibility. If I set a very high resolution (Markov time a bit lower, but damping higher), I get better resolution at the coasts and more realistic bioregion boundaries. However, the pelagic realm (most of the ocean) becomes messy, with hundreds of realms. It seems that to get enough resolution in some areas, the algorithm collapses in others and gives me one realm per cell. This happens even if I set the damping to 3.0. Any idea why this could be?
2 - Also, to keep track of all my tests, I tried the Python API. I managed to run the algorithm by downloading the network from Infomap Bioregions and using it as input for the Python Infomap package. The problem is that I don't get the same results as in the web app, although they are similar. Are the default parameters on the webpage (https://www.mapequation.org/bioregions2/) the same as the default values documented on GitHub (https://mapequation.github.io/infomap/python/infomap.html#infomap.Infomap)? Some of the advanced parameters on the webpage seem to map directly to a parameter in the Python function, but others less so, right?
Thank you and all the best!!
Interesting questions.
Lowering the Markov time makes it easy to overpartition a network. If you need to lower the Markov time to get better resolution at the coasts, maybe some widespread species indirectly connect all the grid cells that are merged into one bioregion with the default settings? In that case, I would instead play with the "rarity strength" parameter (I may rename it later), where a higher value gives relatively more weight to narrowly distributed species.
They should have the same default parameters. Parameters in Infomap Bioregions that don't exist in Infomap, such as rarity strength, change the link weights, so their effect should be preserved when you download the network. However, the random number generators differ, so the two may give different solutions. Try increasing the number of trials to around 100 and see if you get the same solution.
Thank you, this is such a big help! The results already look much better :). May I ask another question? How could I derive the "summary" output, with the 10 most representative species in each cluster, from the Python function Infomap()?
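As a rough intuition for what a rarity-strength parameter does, here is a Python sketch with a hypothetical weighting rule: each species-cell link gets weight (1 / range size) raised to the rarity strength. This formula is an assumption made for illustration, not the one Infomap Bioregions uses; it only shows how raising such an exponent shifts relative weight toward narrowly distributed species.

```python
def species_link_weight(range_size, rarity_strength):
    """Hypothetical weight of one species-cell link (not the app's formula).

    range_size: number of grid cells the species occurs in.
    rarity_strength = 0 weights all occurrences equally; larger values
    down-weight widespread species relative to narrowly distributed ones.
    """
    return (1.0 / range_size) ** rarity_strength


# Relative weight of a species found in 2 cells vs one found in 200 cells.
for s in (0.0, 0.5, 1.0):
    ratio = species_link_weight(2, s) / species_link_weight(200, s)
    print(f"rarity strength {s}: narrow/widespread weight ratio {ratio:.1f}")
```

Under this assumed rule the ratio grows from 1 to 10 to 100, so a widespread species that links many cells contributes less and less to merging them into one bioregion.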
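Why more trials help can be shown with a toy Python sketch. It does not call Infomap: the local optima and codelengths below are made up, and the point is only that keeping the lowest-codelength result over many independent trials makes the outcome far less dependent on the random seed or on which random number generator is used. In the Python package the corresponding option is, as far as I know, the `num_trials` argument.

```python
import random

# Toy stand-in for one stochastic optimisation run: each trial ends in one of
# a few local optima, given as (codelength in bits, partition label).
LOCAL_OPTIMA = [(4.10, "A"), (4.05, "B"), (4.02, "C")]


def one_trial(rng):
    return rng.choice(LOCAL_OPTIMA)


def best_of_trials(num_trials, seed):
    """Run independent trials and keep the solution with the lowest codelength."""
    rng = random.Random(seed)
    return min((one_trial(rng) for _ in range(num_trials)), key=lambda sol: sol[0])


for seed in (1, 2, 3):
    print(seed, best_of_trials(1, seed), best_of_trials(100, seed))
```

With a single trial the result can differ between seeds; with 100 trials every seed practically always ends at the same lowest-codelength solution.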
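One way to approximate such a summary outside the web app is sketched below in Python. It assumes you already have a mapping from grid cell to bioregion, for example built from the node-to-module dictionary returned by Infomap's `get_modules()` restricted to grid-cell nodes, plus the raw species occurrence records. It ranks species by the number of cells they occupy in each bioregion ("most common"); the web app's exact ranking, which may use an indicator score instead, is not described in this thread, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def top_species_per_bioregion(occurrences, cell_to_bioregion, n=10):
    """Rank species in each bioregion by how many of its grid cells they occupy.

    occurrences: iterable of (species, cell) records.
    cell_to_bioregion: dict mapping cell id -> bioregion (module) id.
    Returns {bioregion: [(species, number_of_cells), ...]} with at most n entries.
    """
    counts = defaultdict(Counter)
    for species, cell in set(occurrences):  # count each species at most once per cell
        region = cell_to_bioregion.get(cell)
        if region is not None:
            counts[region][species] += 1
    return {region: counter.most_common(n) for region, counter in counts.items()}


# Hypothetical records and cell assignments.
occurrences = [("sp1", "c1"), ("sp1", "c2"), ("sp2", "c1"), ("sp3", "c3"), ("sp1", "c1")]
cell_to_bioregion = {"c1": 1, "c2": 1, "c3": 2}
print(top_species_per_bioregion(occurrences, cell_to_bioregion, n=10))
```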
Hello and thanks for such a cool tool.
Is there a source for understanding all the advanced parameters? What does the variable Markov time do? I have a lot of noise in my results and I am playing with several parameters, but for some of them I don't know what I am doing. I would like to understand better:
Thanks a lot!
Best,
Cesc