wiheto / teneto

Temporal Network Tools
GNU General Public License v3.0
85 stars 26 forks source link

Error in tutorial : inconsistent TemporalNetwork.network type? #59

Open maximelucas opened 4 years ago

maximelucas commented 4 years ago

Copy pasting the first few lines of the tutorial https://teneto.readthedocs.io/en/latest/tutorial/networkrepresentation.html#temporalnetwork-object yields an error at tnet.network.head() because tnet created with tnet.generatenetwork('rand_binomial',size=(5,3), prob=0.5) is a Numpy ndarray and not a Pandas DataFrame.

I haven't tried other ways of creating tnet, so I'm not sure if tnet.network is consistently a numpy array or not. If so, the tutorial should be updated, otherwise it should be made so that tnet.network is always the same type: either numpy array or pandas DataFrame.

wiheto commented 4 years ago

Nice catch! This is just an outdated tutorial. Thanks for flagging this.

The issue here is that modifications were made to only use pandas arrays when the network/connections are sparse (I think if less than 25% of the connections are non-zero). This random network fills that criteria so it makes it a numpy array.

So I am creating a little to do list for myself here:

maximelucas commented 4 years ago

That makes sense! Setting prob=0.2 yields a Pandas DataFrame. Thanks for the explanation.

I'm guessing you made that choice to optimise the code in some way. I'm just thinking, if I write some general code, and then run it on some different datasets, some might be a bit more or a bit less sparse than your threshold. Then, the same code might run or not run, or just yield different results, because of this. Does this not have the potential to create some trouble? Of course I guess I could just check the data and adapt the code every time.

wiheto commented 4 years ago

Does this not have the potential to create some trouble?

It is a good point.

  1. All teneto functions are agnostic to whether TemporalNetwork.network is a dataframe or numpy array (and most functions can take HDF dataframes as well (some parts of this are still being optimized though)).

  2. If the user want's to always have dataframe there is an argument when defining TemporalNetwork: forcesparse, which, if True, will make sure the TemporalNetwork.network be a dataframe. I have planned to make this "force_df" and "force_array" but that has been low down on the priority list (or representation=['auto','df','array'])

This behaviour is the an unfortunate trade-off I've had to make for wanting to be quick on small datasets and be able to cope with very large datasets. Trying to find one representation that is (1) familiar for a user to interact with (np or pd), (2) efficient for quick processing, (3) efficient for large networks, is something I hope to implement at one point. But for now, this was the best solution I could think of..

maximelucas commented 4 years ago

I sensed it would be something like this, it makes sense, thanks. Your point 1. should make it quite safe. In any case I don't have a better solution either, I jut got surprised because I even prob=0.3 yielded sometimes sparse and sometimes dense nets!