piratepeel / MultiscaleMixing

MIT License
15 stars 7 forks source link

Fix adjacency calculation with missings + example data #1

Closed trifle closed 5 years ago

trifle commented 5 years ago

Dear Mr. Peel,

thanks for publishing the code to your PNAS paper, both of which I read with great interest. I've noticed that there are two minor issues with the included example analysis:

I've taken the liberty to fix both these issues. If you could take the time to review and perhaps accept the PR, this might help future readers.

Thanks! Pascal Jürgens (U of Mainz, Germany)

trifle commented 5 years ago

(BTW, github seems to base PRs off of my master, rather than a commit. I had to undo some formatting changes, sorry if you were confused/spammed by that).

piratepeel commented 5 years ago

Thanks, I'll check it out this week...

On Mon, 22 Oct 2018 at 14:38, Pascal Jürgens notifications@github.com wrote:

(BTW, github seems to base PRs off of my master, rather than a commit. I had to undo some formatting changes, sorry if you were confused/spammed by that).

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/piratepeel/MultiscaleMixing/pull/1#issuecomment-431823033, or mute the thread https://github.com/notifications/unsubscribe-auth/ARUfnX_ecP-6EcfOsT4G4F7l3C3DdWxBks5unbw0gaJpZM4XzNmA .

trifle commented 5 years ago

Great! Although it's definitely not time critical for me ... and others might look for this PR in case they run into reproducibility issues.

piratepeel commented 5 years ago

Hi Pascal,

I took a quick look at these issues...

  1. You are right, I seem to have uploaded the wrong format for the labels file, thanks for noticing this!
  2. I'm not sure I understand the missing values problem. What type of missing values are you trying to deal with? Looking at the code it seems that you are trying to deal with missing node ids in the edges or incorporating missing edges. Either way it seems a little strange to me to include an edge that is incident on an unknown node (or pair of unknown nodes). Could you please elaborate on what you are trying to address here?

Cheers, Leto

trifle commented 5 years ago

Hi Leto,

thanks for looking at it! You're right, I was pretty brief here.

The missing node issue came up when I tried using the included facebook example data. If you run example code with that data, you'll get an exception on the dot product because of a dimension mismatch:

ValueError                                Traceback (most recent call last)
<ipython-input-4-fa4ffec92469> in <module>
----> 1 assortM, assortT, Z = localassort.localAssortF(E,M,pr=np.arange(0,1,0.1))

~/Projects/MultiscaleMixing/localassort.py in localAssortF(E, M, pr, undir, missingValue)
     21     Z = np.zeros(n)
     22     Z[M == missingValue] = 1.
---> 23     Z = W.dot(Z) / degree
     24 
     25     values = np.ones(ncomp)

~/.local/share/virtualenvs/MultiscaleMixing-_J4UNQZc/lib/python3.7/site-packages/scipy/sparse/base.py in dot(self, other)
    359 
    360         """
--> 361         return self * other
    362 
    363     def power(self, n, dtype=None):

~/.local/share/virtualenvs/MultiscaleMixing-_J4UNQZc/lib/python3.7/site-packages/scipy/sparse/base.py in __mul__(self, other)
    495             # dense row or column vector
    496             if other.shape != (N,) and other.shape != (N, 1):
--> 497                 raise ValueError('dimension mismatch')
    498 
    499             result = self._mul_vector(np.ravel(other))

ValueError: dimension mismatch

Digging through the code, it seems like the problem is not invalid nodes, but rather the network loading code. The karate file is 1-indexed, while the FB example is zero-indexed. However, the example notebook loads it via E, M = loadNetwork.load(networkfile, metadatafile, 1). I didn't recognize that the final parameter needed to be switched.

So what happens is that the zero-indexed network gets loaded, 1 is subtracted and yields a first node with the ID -1. I was confused for a bit since that is the value used for missing values in localAssortF.

So while my filter does not really do any harm, it doesn't make any sense and should not be merged. Sorry for the noise :)

piratepeel commented 5 years ago

ok, to make things a bit clearer, I've added "zero_index" as a named argument of the load function in the notebook. And I removed the facebook data for now.

Thanks for bringing these issues to my attention :)