mapequation / infomap

Multi-level network clustering based on the Map Equation
https://mapequation.org/infomap
GNU General Public License v3.0
426 stars 88 forks source link

Time complexity of v1/grassberger-jelena #58

Closed cerebis closed 4 years ago

cerebis commented 4 years ago

Describe the bug I am curious to know if the time complexity of the v1/grassberger branches is to blame for the extremely long projected run times for larger graphs (>100k nodes, millions of edges).

Using the master branch the graph in question is successfully clustered in around 30 minutes using 10 trials. However, for v1/grassberger-jelena, I aborted the clustering run after accruing 8 hours without the first trial being completed.

This was for a undirected, weighted graph defined in link-list format. The graph is highly modular, with perhaps 300 communities which become increasing sparse as subject become less frequently observed. (microbial community data)

I did not employ inner loop parallelization in either case.

To Reproduce

./Infomap edges.in -v -u -z -i link-list out_dir

Desktop (please complete the following information):

mrosvall commented 4 years ago

You have found a branch with experimental research code so far only developed to calculate the code length with Grassberger entropy estimation for small networks given a network partition using flags -c\<p> --cluster-data \<p> --no-infomap Paper soon available. The code fails for you because the Grassberger entropy calculation uses a too small lookup table for your network. What is your objective?

cerebis commented 4 years ago

Infomap is a key element in a tool I have created to group together fragmentary DNA data, where links between fragments are determined via a modified form of DNA sequencing protocol called Hi-C. The DNA fragments represent a sampling of an entire microbial community (metagenomics).

As such, I keep an eye on Infomap development. I have previously experimented with the v1-beta branch and noticed that it produced finer clusters than the master branch. However some of the splitting may not necessary be better for our purposes. I haven't had the time to explore what is happening at sufficient depth.

Just as with the v1-beta branch, I just recently had a pragmatic look at the Grassberger branches (which I realised with a little reading must use this entropy calculation).

I assess results using a ground truth validation process against simulated experimental data.

Finally, my objective is to improve precision and recall (or some analog).

cerebis commented 4 years ago

One detail I failed to mention is that the nature of our data should be better modelled using soft-clusters.

In testing, I have not found a soft-clustering algorithm which perform well enough to supplant high-performing hard-clustering methods.

cerebis commented 4 years ago

oops! I didn't mean to close the issue

jelenasm commented 4 years ago

There was confusion about the names of the branches.

In the branch v1/grassberger, Grassberger entropy estimator has been incorporated in the map equation. To run it flags --grassberger --two-level should be used.

In the branch v1/grassberger-jelena, the map equation uses Bayesan entropy estimator. This branch has now been renamed v1/bayes. To run Infomap with Bayesian estimator flags --bayes --two-level should be used.

cerebis commented 4 years ago

Thanks for the clarification on the runtime options.

On Mon, 14 Oct 2019 at 22:51, jelenasm notifications@github.com wrote:

There was confusion about the names of the branches.

In the branch v1/grassberger, Grassberger entropy estimator has been incorporated in the map equation. To run it flags --grassberger --two-level should be used.

In the branch v1/grassberger-jelena, the map equation uses Bayesan entropy estimator. This branch has now been renamed v1/bayes. To run Infomap with Bayesian estimator flags --bayes --two-level should be used.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/mapequation/infomap/issues/58?email_source=notifications&email_token=ABN2PC2MJHCUKZ55BOJ52MTQORMNZA5CNFSM4I57EESKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBEKI6I#issuecomment-541631609, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABN2PC7R7R2JSPKWHCVBSTDQORMNZANCNFSM4I57EESA .