pwollstadt / IDTxl

The Information Dynamics Toolkit xl (IDTxl) is a comprehensive software package for efficient inference of networks and their node dynamics from multivariate time series data using information theory.
http://pwollstadt.github.io/IDTxl/
GNU General Public License v3.0

Fast PID estimator #1

Closed. finnconor closed this pull request 8 years ago.

finnconor commented 8 years ago

Adding the fast PID estimator. The main file is

   fast_pid/estimators_fast_pid.py

which implements the fast PID algorithm; the pull request also includes some test scripts.

mwibral commented 8 years ago

Hi Conor,

this looks great. Thanks a lot. Could you explain why you went for replication and integer swaps instead of partial swaps? Speed? Memory? Also, could you explain why you limit the case for swapping to ">=" instead of unequal?

Pick a swap candidate:

   s1_prim = np.random.randint(0, alph_s1-1)
   if (s1_prim >= s1_cand):
       s1_prim += 1
   s2_prim = np.random.randint(0, alph_s2-1)
   if (s2_prim >= s2_cand):
       s2_prim += 1

Michael

mwibral commented 8 years ago

Hi pwollstadt, I vote for accepting the code now and then reworking it to conform with the unit test framework (relative imports, folders, assert statements).

pwollstadt commented 8 years ago

This looks very nice indeed :) And this whole pull-request thing works great too. @finnconor I will merge the estimator into the main branch and integrate the changes @mwibral suggested.

Thanks a lot!

finnconor commented 8 years ago

Hi Michael,

Could you explain why you went for replication and integer swaps instead of partial swaps?

Perhaps I am completely misunderstanding your question here (forgive me if I am) but the algorithm never does any actual integer swaps or replications. It does however do their 'virtualised' equivalents in the probability domain which is why I have used the swap and replication terminology — I believe this choice of terminology is the source of the confusion here.

Notice that from the moment the probability distributions are first calculated, the algorithm has completely entered the probability domain. The only values which vary/change in the inner so-called "swap" loop are the joint probability distributions p(t, s1, s2) and p(s1, s2), i.e. joint_t_s1_s2_prob and joint_s1_s2_prob. Indeed a swap is performed but it is not an integer swap — rather it is a swap of probability mass (of size prob_inc).

The outer loop on the other hand decreases the size of the probability mass swapped each time (i.e. the size of the variable prob_inc). The way it does this is by repeatedly halving the size of prob_inc at each iteration of the outer loop. I believe this is the probability domain equivalent of repeatedly doubling up the empirical data in the integer domain which is why I referred to this as reps or the repetitions loop.

So that is why I mention swaps and repetitions, but perhaps I should more explicitly refer to them as probability mass swaps and not even mention repetitions. Indeed, the mention of repetitions will certainly go, as I believe we should be doing something more intelligent than just arbitrarily halving prob_inc at each iteration of the outer loop (which is something I plan on looking into).
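To make the loop structure concrete, here is a rough sketch of the two loops as I have described them (illustrative names and defaults only, not the actual code in fast_pid/estimators_fast_pid.py):

    import numpy as np

    # Rough sketch only: the inner loop would move a probability mass of size
    # prob_inc between cells of p(t, s1, s2) and p(s1, s2); the outer loop
    # halves prob_inc each repetition, the 'virtualised' analogue of doubling
    # up the empirical data.
    def swap_loop_sketch(joint_t_s1_s2_prob, num_reps=5, max_iters=100):
        alph_t, alph_s1, alph_s2 = joint_t_s1_s2_prob.shape
        prob_inc = 1.0 / joint_t_s1_s2_prob.size  # assumed starting mass
        for rep in range(num_reps):               # outer 'repetition' loop
            for it in range(max_iters):           # inner 'swap' loop
                t = np.random.randint(0, alph_t)
                s1_cand = np.random.randint(0, alph_s1)
                s2_cand = np.random.randint(0, alph_s2)
                # ...pick primed values != the candidates (see the trick below),
                # move prob_inc between the corresponding cells, and keep the
                # move only if it does not increase the quantity being minimised.
            prob_inc /= 2.0                       # halve the swapped probability mass
        return joint_t_s1_s2_prob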

Also could you explain why you limit the case for swapping to ">=" instead of unequal?

With the previous question I think it's my fault for being a bit loose with the terminology. But for this one I know for sure it is my fault because I accidentally deleted a comment which explained what I was doing here :)

I am using a little trick here so as to avoid wasteful random number generation. Originally I had something like

   while (s1_prim == s1_cand):
       s1_prim = np.random.randint(0, alph_s1)

which provides an s1_prim that is not equal to s1_cand. However you are wastefully generating a lot of numbers where s1_prim does equal s1_cand (with a binary alphabet you are doing so half the time in fact).

So you can avoid this wasteful generation by using

   s1_prim = np.random.randint(0, alph_s1-1)
   if (s1_prim >= s1_cand):
       s1_prim += 1

because the domain of the random number generator function is [0, alph_s1-1) (as you want since there are alph_s1-1 possible choices if you exclude the value of s1_cand) but the range is the pair of intervals {[0, s1_cand), [s1_cand, alph_s1)}, i.e. excludes the value s1_cand.

So it's just a trick to try and be efficient, but most importantly it is providing a primed value that is not equal to the candidate, rather than merely ">=" as it appears at first glance.
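As a quick, self-contained check of the trick (hypothetical helper name, just to demonstrate that the shifted draw excludes the candidate and stays uniform over the remaining values):

    import numpy as np

    def draw_excluding(alph, excluded):
        # Draw uniformly from {0, ..., alph-1} without the excluded value.
        val = np.random.randint(0, alph - 1)  # alph-1 possible outcomes
        if val >= excluded:
            val += 1                          # shift past the excluded value
        return val

    alph, excluded = 4, 2
    draws = [draw_excluding(alph, excluded) for _ in range(10000)]
    assert excluded not in draws
    print(np.bincount(draws, minlength=alph) / len(draws))  # roughly [1/3, 1/3, 0, 1/3]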

Let me know if there is anything else you are unsure of / have doubts about.

Conor


finnconor commented 8 years ago

Slight typo: {[0, s1_cand), [s1_cand, alph_s1)} should have been {[0, s1_cand), [s1_cand+1, alph_s1)}.

Conor


mwibral commented 8 years ago

Hi Conor,

got it. Thanks for explaining :-)

Michael


mwibral commented 8 years ago

Hi Conor,

two questions: (1) Is the typo you mentioned still present in the code? I can't spot it - or was it never in the code? (2) Compared to using my estimator, I seem to get large, negative values for synergy and redundancy (analysis file and data mailed separately). Have you seen something similar?

Best, Michael


finnconor commented 8 years ago

Hi Michael,

Just a quick (sleepy) reply: (1) That typo was never in the original code. (2) No, I never encountered that outside of troublesome development code.

That does not sound good, so I will have to take a look at that tomorrow / as soon as I can.

If you want to look at something right away then check the following:

(1) Is the s1 unique information okay? If the s1 unique information is okay, then the error is probably lurking in the classical mutual information calculators (which is possible, as they were the last additions needed to finally remove JIDT completely).

(2) If the s1 unique information is not fine, then I would suggest uncommenting a print line at the end of the inner loop. That will print the unique s1 information as it converges (i.e. as the virtualised replication occurs / the probability increment decreases in size / the outer loop iterates). When I was developing this I encountered an integer overflow which caused this to converge and then suddenly flip sign, which introduced negative numbers all over the place. It is possible I didn't catch all possible eventualities there.

I didn't get the other email yet. I will look at this properly tomorrow.

Conor


mwibral commented 8 years ago

Hi all,

I experimented some more with Conor's code and think the weird numbers I see are a convergence problem, brought about by making the step size too small too fast (see the four test cases and the solution below).

Best, Michael

(A) -------------------------------------------

   alph_x = 2
   alph_y = 2
   alph_z = 2

   n = 5000

   x = np.random.randint(0, alph_x, n)
   y = np.random.randint(0, alph_y, n)
   z = np.logical_and(x, y).astype(int)

   cfg = {'alph_s1': alph_x,
          'alph_s2': alph_y,
          'alph_t': alph_z,
          'max_unsuc_swaps_row_parm': 3,
          'num_reps': 63,
          'max_iters': 10000}

==>

PID evaluation 0.737 seconds

   Uni s1        [ 0.0084285253]
   Uni s2        [ 4.2283885e-18]
   Shared s1_s2  [ 0.30268407]
   Synergy s1_s2 [ 0.49019005]

(B) ----------------------------------------- "just increasing max_iters leads to a worse result (puzzling - this shouldn't happen at all, should it?)"

   n = 5000
   'max_iters': 100000

==>

PID evaluation 19.009 seconds

   Uni s1        [ 0.092863864]
   Uni s2        [ 0.088796402]
   Shared s1_s2  [ 0.22519745]
   Synergy s1_s2 [ 0.40851536]

(C) ----------------------------------------- "increasing data count and max_iters leads to a pretty wrong result"

   n = 50000
   'max_iters': 100000

==>

PID evaluation 15.338 seconds

   Uni s1        [ 0.30261264]
   Uni s2        [ 0.30144382]
   Shared s1_s2  [ 0.01085617]
   Synergy s1_s2 [ 0.20024127]

(D) ---------------------------------------------- "just increasing the data count (compared to case (A)) yields negative values"

   n = 50000
   'max_iters': 10000

==>

PID evaluation 15.186 seconds

   Uni s1        [ 0.44966498]
   Uni s2        [ 0.45117518]
   Shared s1_s2  [-0.14088139]
   Synergy s1_s2 [ 0.049411856]

(E) ------------------------------------------------------

   E.1 Increasing the 'unsuc' swaps to 300 cures case (C) - which seems funny to me.
   E.2 Increasing the 'unsuc' swaps to 30,000 and 'max_iters' to 100,000 cures case (D).

Conclusion:

  1. Obviously, for large(r) datasets the swaps already start with pretty small increments. If one switches to even smaller increments too early, this is dangerous, as we will still be swapping successfully but moving ever more slowly in the convex landscape - we basically 'starve' on the way to our goal. Something similar will happen for large alphabets.
  2. Maybe we should make the unsuccessful-swaps parameter relative to both the sizes of the alphabets of the random variables that go into the function and to the sample count? And check that max_iters is always bigger than 'unsuc_swaps'. (A rough sketch is below.)
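A rough sketch of what that scaling in point 2 could look like (made-up names and a made-up scaling rule, purely to illustrate the idea):

    def suggested_swap_limits(alph_t, alph_s1, alph_s2, n, base_unsuc=3):
        # Made-up scaling rule: grow the unsuccessful-swaps limit with the joint
        # alphabet size and the sample count, and keep max_iters well above it.
        max_unsuc_swaps_row = base_unsuc * alph_t * alph_s1 * alph_s2 * max(1, n // 5000)
        max_iters = 100 * max_unsuc_swaps_row
        assert max_iters > max_unsuc_swaps_row
        return max_unsuc_swaps_row, max_iters

    # Example: binary alphabets and n = 50000 samples, as in cases (C)/(D) above.
    print(suggested_swap_limits(2, 2, 2, 50000))  # -> (240, 24000)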


finnconor commented 8 years ago

Hi Michael,

Okay, this is very good. I was actually in the process of writing an email to you coming to basically the exact same conclusion although from a slightly different angle. You beat me to sending it first though so I had to re-write my message :)

The negative values for the redundant and synergistic information were introduced by bad cfg settings (the unique s1 information was never getting a chance to converge at all with max_iters = 100). Changing them to my settings does lead to convergence in your fast_PID_analysis.py script, but it is both incredibly slow and, in fact, as you said, dangerous / incorrect. This is because the probability increment starts off way smaller than it should (the slowness) and, as you quite correctly point out, the method switches to even smaller probabilities before the virtualised swapping has finished (or, as you said, the dangerous starving on the way down).

So we both agree on the issue here, I believe, which is quite good. So where do we stand now?

The thing is, we did know that the convergence criterion was not exactly implemented well. Recall the email from when I first sent this method to you, in which I said the following:

'max_iters' provides a hard upper bound on the number of times it will attempt to perform virtualised swaps in the inner loop (which is like your original code before you added any convergence criteria). However, this hard limit is (practically) never used as it should always hit the soft limit defined by the next parameter before hitting this hard limit. So this is a bit of a legacy parameter in that regard and I imagine we will be removing it in the near future.

At the time I was tempted to say that you never want it to hit the hard limit, as I knew this would lead to the "dangerous starving" problem, but I left it out to try and avoid complicating the email. I then said:

The way the convergence is handled above isn't exactly elegant (although it does seem to do the job from the looks of things) so I imagine we will be looking to improve this going forward.

So as you can see, I did know this needed to be improved and was potentially dangerous -- I just thought you might be able to use the code before I actually tidied up the convergence. It is apparent now that this is not the case!
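For concreteness, the interplay between the hard limit and the soft limit that I am describing looks roughly like this (stand-in names and a fake acceptance rule, purely to illustrate the loop structure, not the actual estimator code):

    import numpy as np

    def attempt_virtualised_swap(it):
        # Stand-in for the real swap attempt: acceptance gets rarer as we converge.
        return np.random.rand() < np.exp(-it / 500.0)

    max_iters = 10000            # hard upper bound on swap attempts (the legacy safeguard)
    max_unsuc_swaps_row = 100    # soft limit: this many failed swaps in a row counts as converged

    unsuccessful_in_a_row = 0
    for it in range(max_iters):
        if attempt_virtualised_swap(it):
            unsuccessful_in_a_row = 0
        else:
            unsuccessful_in_a_row += 1
        if unsuccessful_in_a_row >= max_unsuc_swaps_row:
            break                # the soft limit should normally trigger before max_iters does
    print('stopped after', it + 1, 'attempts')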

So, where do we go from here? I will tidy up the convergence criterion and make it something far more sensible. I should hopefully get around to this next week.

Does this make sense?

Conor


mwibral commented 8 years ago

Hi all,

a couple more thoughts: (1) The initial swaps could perhaps be bigger than 1/n, rather something like 1/2 * n_min/n, where n is the total number of samples and n_min is the smallest count in any bin of the joint distribution. This would help us to make larger strides towards the bottom initially. (2) Maybe the reduction in prob_inc, the probability mass being swapped, could be driven by a smarter criterion that measures something like the decrement in the CMI of the last successful swap, or something similar, at least?
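Roughly what I have in mind, as a sketch (made-up names, not a proposal for the actual code):

    import numpy as np

    def initial_prob_inc(joint_counts):
        # Suggestion (1): start with a larger increment, 1/2 * n_min/n, where n is
        # the total sample count and n_min the smallest count in any occupied bin.
        n = joint_counts.sum()
        n_min = joint_counts[joint_counts > 0].min()
        return 0.5 * n_min / n

    def maybe_halve(prob_inc, last_cmi_decrement, tol=1e-6):
        # Suggestion (2): only shrink prob_inc once successful swaps stop paying
        # off, e.g. when the CMI decrement of the last accepted swap becomes tiny.
        return prob_inc / 2.0 if last_cmi_decrement < tol else prob_inc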

Michael


finnconor commented 8 years ago

PS: if you are reading this on GitHub, you might want to read it in your email, as GitHub is cutting sections off for whatever reason.

On 5 May 2016 at 21:03, Conor Finn finnconor@gmail.com wrote:

Hi Michael,

Okay, this is very good. I was actually in the process of writing an email to you coming to basically the exact same conclusion although from a slightly different angle. You beat me to sending it first though so I had to re-write my message :)

The negative values for the redundant and synergistic information were introduced by bad cfg settings (the unique s1 information was never getting a chance to converge at all with max_iters = 100). Changing them to my settings does lead to convergence in your fast_PID_analysis.py script but it is both incredibly slow and, in fact as you said dangerous / incorrect. This is because the probability increment starts off way smaller than it should (the slowness) and as you quite correctly point out, the method switches to even smaller probabilities before the virtualised swapping has finished (or as you said, the dangerous starving on the way down).

So we both agree on the issue here I believe, which is quite good. So where do we now stand.

The thing is we did know that the convergence criteria was not exactly implemented well. Recall the email from when I first sent this method to you, in which I said the following:

'max_iters' provides a hard upper bound on the number of times it will

attempt to perform virtualised swaps in the inner loop (which is like your original code before you added any convergence criteria). However, this hard limit is (practically) never used as it should always hit the soft limit defined by the next parameter before hitting this hard limit. So this is a bit of a legacy parameter in that regard and I imagine we will be removing it in the near future.

At the time I was tempted to say that it you never want it to hit the hard limit as I knew this would lead to the "dangerous starving" problem, but I left it out to try and avoid complicating the email. I then said:

The way the convergence is handled above isn't exactly elegant (although

it does seem to do the job from the looks of things) so I imagine we will be looking to improve this going forward.

So as you can see, I did know this needed to be improved and was potentially dangerous -- I just thought you might be able to use the code before I actually tidied up the convergence. It is apparent now that this is not the case!

So, where do we go from here. I will tidy up the convergence criteria and make it something far more sensible. I should hopefully get around to this next week.

Does this make sense?

Conor

On 5 May 2016 at 20:17, mwibral notifications@github.com wrote:

Hi all,

I experimented some more with Conors code and thnik the weird numbers I see are a convergence problem, brought about by making the step-size to small too fast (see the four test cases and the solution below).

Best, Michael

(A)-------------------------------------------

alph_x = 2 alph_y = 2 alph_z = 2

n = 5000

x = np.random.randint(0, alph_x, n) y = np.random.randint(0, alph_y, n) z = np.logical_and(x, y).astype(int)

cfg = { 'alph_s1': alph_x, 'alph_s2': alph_y, 'alph_t': alph_z, 'max_unsuc_swaps_row_parm': 3, 'num_reps': 63, 'max_iters': 10000 }

==>

PID evaluation 0.737 seconds

Uni s1 [ 0.0084285253] Uni s2 [ 4.2283885e-18] Shared s1_s2 [ 0.30268407] Synergy s1_s2 [ 0.49019005]

(B)----------------------------------------- "just increasing max_iters leads to a worse result (puzzling, this shouldn't happen at all, should it)"

n = 5000 'max_iters': 100000

==>

PID evaluation 19.009 seconds

Uni s1 [ 0.092863864] Uni s2 [ 0.088796402] Shared s1_s2 [ 0.22519745] Synergy s1_s2 [ 0.40851536]

(C)----------------------------------------- "increasing data count and max_iters leads to a pretty wrong result"

n = 50000 'max_iters': 100000

==>

PID evaluation 15.338 seconds

Uni s1 [ 0.30261264] Uni s2 [ 0.30144382] Shared s1_s2 [ 0.01085617] Synergy s1_s2 [ 0.20024127]

(D)---------------------------------------------- " just increasing the data count (compared to case (A)) yields negative values"

n = 50000 'max_iters': 10000

==>

PID evaluation 15.186 seconds

Uni s1 [ 0.44966498] Uni s2 [ 0.45117518] Shared s1_s2 [-0.14088139] Synergy s1_s2 [ 0.049411856]

(E)------------------------------------------------------

E.1 increasing the 'unsuc' swaps to 300 cures cases (C) - which seems funny to me E.2 increasing the 'unsuc' swaps to 30.000 and 'max_iters' to 100.000 cures case (D)

Conclusion:

  1. Obviously for large(r) datasets, swaps start already with pretty small increments, if one switches to even smaller increments too early, this is dangerous, as we will be swapping succesfully but move ever slower in the convex landscape - we basically 'starve' on the way to our goal. Something similar will happen for large alphabets.
    • Maybe we should make the unsuccesful swaps parameter relative to both the sizes of the alphabets of the random variables that go in to the function, and to the sample count? And check taht max_iters is always bigger than 'unsuc_swaps'.

On 04.05.2016 18:42, finnconor wrote:

Hi Michael,

Just a quick (sleepily) reply: (1) That typo was never in the original code (2) No I never encountered that outside of troublesome development code.

That does not sound good, so I will have to take a look at that tomorrow / as soon as I can.

If you want to look at something right away then check the following:

(1) Is the s1 unique information okay? If the s1 unique information is okay then the error is probably lurking in the classical mutual information calculators (which is possible, as they were the last additions to finally remove JIDT completely).

(2) If the s1 unique information is not fine then I would suggest uncommenting a print line at the end of the inner loop. That will print the unique s1 information as it converges (i.e. as the virtualised replication occurs / the probability increment decreases in size / the outer loop iterates). When I was developing this I encountered an integer overflow which caused this to converge and then suddenly flip sign, which introduced negative numbers all over the place. It is possible I didn't catch all possible eventualities there.

I didn't get the other email yet. I will look at this properly tomorrow.

Conor

On 5 May 2016 2:03 am, "mwibral" notifications@github.com wrote:

Hi Conor,

Two questions: (1) Is the typo below still current in the code? I can't spot it - or was it never in the code? (2) Compared to using my estimator I seem to get large, negative values for synergy and redundancy (analysis file and data mailed separately). Have you seen something similar?

Best, Michael

On 25.04.2016 18:06, finnconor wrote:

Slight typo {[0, s1_cand), [s1_cand, alph_s1)} should have been {[0, s1_cand), [s1_cand+1, alph_s1)}

Conor

finnconor commented 8 years ago

Hi all,

Yes we are on the same page here. Regarding:

(1) Yes, definitely... we should decide on some kind of larger, smarter starting probability increment.

(2) This is something I have already discussed with Joe. Obviously just halving the probability mass isn't exactly intelligent. I think we should be able to implement some Newton-like method for adjusting the probability increment, as you said, based on the change in the CMI between each reduction of the probability mass increment.

I will first look at implementing some kind of sensible convergence criteria and then I will look at the actual convergence method.

On 5 May 2016 at 21:13, mwibral notifications@github.com wrote:

Hi all,

A couple more thoughts: (1) The initial swaps could perhaps be bigger than 1/n, rather something like 1/2*n_min/n, where n is the total number of samples and n_min the smallest count in any bin of the joint distribution. This would help us make larger strides towards the bottom initially. (2) Maybe the reduction in prob_inc, the probability mass being swapped, could be driven by a smarter criterion that measures something like the decrement in the CMI of the last successful swap, or something similar, at least?

Michael
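
Sketching those two suggestions in code (illustrative only - the function names are invented, and joint_counts stands in for however the estimator stores its joint histogram):

import numpy as np

def initial_prob_inc(joint_counts):
    # (1) start with half the probability mass of the smallest occupied bin,
    # rather than 1/n, so the first swaps take larger steps
    # e.g. joint_counts = np.histogramdd((t, s1, s2), bins=(alph_t, alph_s1, alph_s2))[0]
    n = joint_counts.sum()
    n_min = joint_counts[joint_counts > 0].min()
    return 0.5 * n_min / n

def maybe_shrink_prob_inc(prob_inc, cmi_before, cmi_after, rel_tol=1e-3):
    # (2) halve the increment only once the last round of successful swaps
    # barely reduced the conditional MI, instead of on a fixed schedule
    if cmi_before - cmi_after < rel_tol * max(cmi_before, 1e-12):
        prob_inc /= 2.0
    return prob_inc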

jlizier commented 8 years ago

Hi all,

Just on those thoughts:

  1. I really like this suggestion for computing the initial virtual swap size. Clearly the issue is that using a large number of samples N led us to straight away use 1/N, which was very small, and 1/2*n_min/N is a nice suggestion for scaling this automatically.
  2. This does need to be checked, but my gut feeling is (still) that this isn't the main issue. I may be wrong, but I think addressing point 1 above, in conjunction with setting max_iters etc. appropriately (can we maybe automate these?), will bring the convergence back to a good time. (It could still be done faster perhaps, but I'm not convinced it will be by an order of magnitude.) I'm happy to be proven wrong though :)

And a new one:

  1. We still have the issue of why the values went negative. I see that we don't get them if we put proper values in for max_iters etc., but at face value that should never happen even if the conditional MI has not yet converged. The only way the shared info could be negative is if the conditional MI (and therefore the unique info) went larger than the unconditioned MI. But since the original conditional MI must have been less than or equal to this, and we only make changes that reduce the conditional MI, this shouldn't happen at any stage before convergence. So either we need to check that new all-in-Python MI calculator, or we had some overflow error which you previously suggested, Conor -- can you look a bit further into this?
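
A small diagnostic along those lines (standard PID bookkeeping with made-up names, not code from the estimator) would flag the problem as soon as it appears:

def check_pid(mi_t_s1, mi_t_s2, mi_t_s1s2, unq_s1, unq_s2):
    # shared and synergy follow from the classical MI terms and the two unique
    # informations (the optimised conditional MIs)
    shared = mi_t_s1 - unq_s1    # should equal mi_t_s2 - unq_s2 at convergence
    synergy = mi_t_s1s2 - shared - unq_s1 - unq_s2
    for name, val in [('unique s1', unq_s1), ('unique s2', unq_s2),
                      ('shared', shared), ('synergy', synergy)]:
        if val < -1e-10:
            raise ValueError('%s went negative (%g): check the MI calculators '
                             'or look for an overflow' % (name, val))
    return shared, synergy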

--joe

finnconor commented 8 years ago

(1) Yes, I agree with Joe, it is a nice suggestion Michael (sorry I didn't say that before).

(2) I think there was a bit of confusion here. I am distinguishing between two things: the convergence criteria and the convergence method. I don't think we were (or at least I wasn't) suggesting that the current convergence method (i.e. repeated halving) is causing the slowness here. It is the convergence criteria that are the problem at the moment (i.e. deciding when to decrease the probability mass increment). I agree with you that addressing point (1) and setting the variable max_iters appropriately (intelligently) should restore the speed. That is why I added that I will first look at implementing some kind of sensible convergence criteria and only then look at the actual convergence method.

(3) Yes, I was perhaps being wishful attributing that problem to that possible cause. I see your point that conditional MI shouldn't ever go larger than the unconditioned MI. I will dig into this as soon as I get a chance.

Also, I think we should have a discussion about how the pairs for the probability mass swap are selected. Why are we doing this randomly rather than in some way sequentially?

jlizier commented 8 years ago

Thanks Conor,

On your last point -- "how the pairs for the probability mass swap are selected. Why are we doing this randomly rather than in some way sequentially?" -- I think you and I did briefly discuss this when we calculated how many potential swaps there were as a function of the alphabet sizes? We considered that either trying those potential swaps sequentially, or trying them randomly but tracking which ones we had done (so we know precisely when we've exhausted the swaps at a given increment), would be more efficient. But we decided to stick with the random swaps for the first cut at the code since it was a heap easier to write, and we could roughly be sure that all swaps had been tried by capping the number of unsuccessful attempts at a multiple (3?) of the number of potential swaps.
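
To make the trade-off concrete, a sketch of the two options (hypothetical helper names; the swap count here ignores the target alphabet for simplicity):

import itertools

def sequential_swap_candidates(alph_s1, alph_s2):
    # enumerate every (s1_cand, s1_prim, s2_cand, s2_prim) combination exactly
    # once, so exhaustion at a given increment is detected deterministically
    for s1_cand, s1_prim in itertools.permutations(range(alph_s1), 2):
        for s2_cand, s2_prim in itertools.permutations(range(alph_s2), 2):
            yield s1_cand, s1_prim, s2_cand, s2_prim

def random_swap_budget(alph_s1, alph_s2, multiplier=3):
    # cap on unsuccessful random draws: a multiple of the number of potential
    # swaps, roughly as described for the current first cut
    n_potential = alph_s1 * (alph_s1 - 1) * alph_s2 * (alph_s2 - 1)
    return multiplier * n_potential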

--joe
