mapequation / infomap

Multi-level network clustering based on the Map Equation
https://mapequation.org/infomap
GNU General Public License v3.0

Multi-dimensional metadata support? #316

Closed xiangyh9988 closed 2 years ago

xiangyh9988 commented 2 years ago

Thanks for your excellent work!

Recently, after reading the paper "A Map Equation with Metadata: Varying the Role of Attributes in Community Detection", I tried using the set_meta_data Python interface to add metadata to some nodes in the graph, and it works well. I then tried to set a multi-dimensional vector as the metadata for each node, as done in another paper, "Partitioning Networks with Node Attributes by Compressing Information Flow".

I first modified the input arguments of the Network::addMetaData function in Network.cpp to accept metadata of type vector<float> (the original type is vector<int>). However, when I read further into the C++ code, I found that only the first element of the vector is added to the metaCollection, e.g. metaCollection.add(node.metaData[0], weightByFlow ? node.data.flow : m_unweightedNodeFlow); in MetaMapEquation.cpp.

So Infomap does not support multi-dimensional metadata for now? I'm not sure, so I'm opening this issue to ask for help. Thanks in advance!
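
The observation above can be illustrated in isolation. The following is a minimal standalone sketch, not Infomap's code: `Node` and `collectFirstDimension` are hypothetical names introduced only for this example. It shows that when only `metaData[0]` is read, nodes that differ in later dimensions fall into the same metadata category.

```cpp
#include <map>
#include <vector>

// Hypothetical, simplified node: only the fields needed for this illustration.
struct Node {
  std::vector<int> metaData;  // metadata vector; later dimensions are ignored below
  double flow;                // flow assigned to the node
};

// Accumulate flow per metadata category, reading only the first element of
// each metadata vector (mirroring the metaData[0] access quoted above).
std::map<int, double> collectFirstDimension(const std::vector<Node>& nodes) {
  std::map<int, double> metaToFlow;
  for (const auto& node : nodes) {
    if (node.metaData.empty()) continue;  // nodes without metadata are skipped
    metaToFlow[node.metaData[0]] += node.flow;
  }
  return metaToFlow;
}
```

With nodes carrying metadata {1, 5}, {1, 9} and {2, 5}, the first two end up in the same category because the second dimension never reaches the collection.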

xiangyh9988 commented 2 years ago

I tried modifying the code to support multi-dimensional metadata, but I'm not sure whether it is correct. The clustering performance in my project did not improve, which might mean I have misunderstood how multi-dimensional metadata should be used.

The steps are as follows:

  1. Change the input arguments of metaCollection.add in MetaMapEquation.cpp. (There are several calls to metaCollection.add; I post just one example here.)
    // The original code
    metaCollection.add(node.metaData[0], weightByFlow ? node.data.flow : m_unweightedNodeFlow);
    // The modified code
    metaCollection.add(node.metaData, weightByFlow ? node.data.flow : m_unweightedNodeFlow);
  2. Modify the functions in MetaCollection.h, mainly converting the original unsigned int meta to std::vector<float> metaData. In my trial, the key of m_metaToFlowCount is the dimension index of the metadata rather than the metadata value (the original setting). For example, with a 3-dimensional metadata vector, m_metaToFlowCount looks like {0: 0.7, 1: 0.1, 2: 0.2}. The modified code is:
    // metaData[i] * flow: follows the weighted sum of CME in the paper
    // "Partitioning Networks with Node Attributes by Compressing Information Flow"
    void add(std::vector<float> metaData, double flow = 1.0)
    {
      m_total += flow;
      for (unsigned int i = 0; i < metaData.size(); ++i) {
        m_metaToFlowCount[i] += metaData[i] * flow;
      }
    }
    void add(std::vector<float> metaData, const FlowCount& flow)
    {
      m_total += flow;
      for (unsigned int i = 0; i < metaData.size(); ++i) {
        m_metaToFlowCount[i] += metaData[i] * flow.flow;
      }
    }
    // Adds to a single dimension only; m_total is updated by the caller below
    void add(unsigned int metaDim, const FlowCount& flow)
    {
      m_metaToFlowCount[metaDim] += flow;
    }
    void add(const MetaCollection& other)
    {
      m_total += other.m_total;
      for (auto& it : other) {
        auto metaDim = it.first;
        auto& flowCount = it.second;
        add(metaDim, flowCount);
      }
    }
    // The remove functions are modified similarly and are omitted here to save space.
    // The function that calculates the entropy is also similar to the original one.
    double calculateEntropy()
    {
      double metaCodelength = 0.0;
      for (auto& it : m_metaToFlowCount) {
        if (it.second.flow > 0) {
          metaCodelength -= infomath::plogp(it.second.flow / m_total.flow);
        }
      }
      return m_total.flow * metaCodelength;
    }

    Honestly, there seems to be no obvious difference between the calculation for the original 1-dimensional metadata and CME, which confuses me.

  3. Feed multi-dimensional metadata of type vector<float> to set_meta_data and run Infomap.

The above is just a simple trial, and I'm still a little confused about how to use multi-dimensional metadata. I hope you can help, and thanks in advance!
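
The per-dimension accumulation from step 2 can be compared with the categorical case in a minimal standalone sketch. None of this is Infomap's code: `DimMetaCollection`, `CategoryMetaCollection` and `weightedEntropy` are hypothetical names, flow is a plain `double` instead of `FlowCount`, and the local `plogp` is assumed to behave like `infomath::plogp` (p * log2(p), with 0 for p <= 0). The sketch suggests one reason the two calculations can look alike: when every metadata vector is one-hot, the weighted sum per dimension equals the flow per category, so both collections yield the same entropy term.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Assumed stand-in for infomath::plogp: p * log2(p), defined as 0 for p <= 0.
inline double plogp(double p) { return p > 0.0 ? p * std::log2(p) : 0.0; }

// Entropy of the normalized flow distribution, scaled by the total flow.
inline double weightedEntropy(const std::map<unsigned int, double>& flows, double total) {
  if (total <= 0.0) return 0.0;
  double h = 0.0;
  for (const auto& it : flows) {
    h -= plogp(it.second / total);
  }
  return total * h;
}

// Hypothetical collection keyed by metadata dimension (the approach in step 2).
class DimMetaCollection {
public:
  void add(const std::vector<float>& metaData, double flow = 1.0) {
    m_total += flow;
    for (std::size_t i = 0; i < metaData.size(); ++i) {
      m_dimToFlow[static_cast<unsigned int>(i)] += metaData[i] * flow;
    }
  }
  double calculateEntropy() const { return weightedEntropy(m_dimToFlow, m_total); }

private:
  double m_total = 0.0;
  std::map<unsigned int, double> m_dimToFlow;
};

// Hypothetical collection keyed by a single categorical value (the 1-dim case).
class CategoryMetaCollection {
public:
  void add(unsigned int meta, double flow = 1.0) {
    m_total += flow;
    m_metaToFlow[meta] += flow;
  }
  double calculateEntropy() const { return weightedEntropy(m_metaToFlow, m_total); }

private:
  double m_total = 0.0;
  std::map<unsigned int, double> m_metaToFlow;
};
```

Adding the one-hot vectors {1, 0, 0} and {0, 1, 0} with flow 0.5 each to a `DimMetaCollection` gives the same entropy as adding categories 0 and 1 with flow 0.5 each to a `CategoryMetaCollection`. Differences can only appear when the vectors spread weight over several dimensions, and note that the per-dimension sums are normalized by the total flow, so vectors whose entries do not sum to 1 no longer produce a probability distribution.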

danieledler commented 2 years ago

Hi @xiangyh9988 and thanks for your testing! You are right that the support for multi-dimensional metadata was only slightly prepared but not yet implemented. Unfortunately I can't check it now but will check it in a couple of weeks if the issue is still open.

xiangyh9988 commented 2 years ago

OK, I see. No problem; it was just a simple attempt.

For now, multi-dimensional metadata is not important to me, because the design of the metadata vector in my project is not reasonable. That is to say, I will not try to use multi-dimensional metadata in the near future.

So I will close this issue, and I hope you will release an official version that supports multi-dimensional metadata.

xiangyh9988 commented 2 years ago

Thanks for your reply! Looking forward to the new version!