Added the ability for XGI to load data collections

xgi-org / xgi

CompleX Group Interactions (XGI) is a Python package for higher-order networks.

https://xgi.readthedocs.io

Other

180 stars 28 forks source link

Added the ability for XGI to load data collections #540

Closed nwlandry closed 3 months ago

nwlandry commented 4 months ago

Refactored the following to handle collections of data:

load_xgi_data
download_xgi_data
write_json
read_json

Also did the following:

Moved the contents of write_json to to_hypergraph_dict
Renamed dict_to_hypergraph to from_hypergraph_dict

codecov[bot] commented 4 months ago

Files with missing lines	Patch %	Lines
xgi/readwrite/xgi_data.py	93.02%	3 Missing :warning:

```diff @@ Coverage Diff @@ ## main #540 +/- ## ========================================== + Coverage 92.35% 92.41% +0.05% ========================================== Files 60 60 Lines 4500 4548 +48 ========================================== + Hits 4156 4203 +47 - Misses 344 345 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

maximelucas commented 4 months ago

Idea sounds nice! Can you elaborate a bit, what do you mean by collection and what's your intended workflow with this?

nwlandry commented 4 months ago

So my original intent was to figure out how to store collections similar to this one. I opened Issue 20 in XGI-DATA for this issue. The choices in my mind were (1) make a huge file containing all the hypergraphs or (2) continue to have a file for each hypergraph, but in addition have a collection file specifying metadata of the collection as well as where to find each dataset in the collection. There might be a more elegant way, but this is the best I've come up with so far. Note that with this PR, you can call xgi.load_xgi_data("hyperbard") and load the hyperbard collection.

maximelucas commented 3 months ago

Just to check, in the end you chose option (2) right? And this is implemented as a single record in Zenodo for the collection containing many .json files (one for each dataset).

xgi.load_xgi_data("hyperbard") loads a dict of datasets, so xgi.load_xgi_data("hyperbard")["coriolanus"] should load a single dataset if I understand correctly.

This is very nice and good for now thanks Nich.

As I mentioned in our call, I'm wondering if don't want to be able to access them directly from something like xgi.load_xgi_data("hyperbard-coriolanus") to be able to iterate over all datasets in XGI-data. They could also appear in the list we get with xgi.load_xgi_data() this way? This could be a future PR too.

Btw, related to #548 I don't think the loading function is in the online docs.

nwlandry commented 3 months ago

Thanks for the review! I completely agree, but this will take a bit longer to figure out how to implement. Let me know if you'd rather I wait to implement accessing/viewing individual datasets to merge or if I should open an issue on this. What do you mean when you say that the loading function isn't in the online docs?

maximelucas commented 3 months ago

Let me know if you'd rather I wait to implement accessing/viewing individual datasets to merge or if I should open an issue on this.

Yes I think we can just open an issue about this!

What do you mean when you say that the loading function isn't in the online docs?

I mean that xgi.load_xgi_data("<dataset_name>") is not documented anywhere (except from its own docstring). As far as I can tell it only appears here, but without all its arguments: https://xgi.readthedocs.io/en/stable/xgi-data.html.

nwlandry commented 3 months ago

Thanks!! I added two issues from your comments.

xgi-org / xgi

Added the ability for XGI to load data collections #540

Codecov Report