xgi-org / xgi

CompleX Group Interactions (XGI) is a Python package for higher-order networks.
https://xgi.readthedocs.io
Other
180 stars 28 forks source link

Added the ability for XGI to load data collections #540

Closed nwlandry closed 3 months ago

nwlandry commented 4 months ago

Refactored the following to handle collections of data:

Also did the following:

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 97.14286% with 3 lines in your changes missing coverage. Please review.

Project coverage is 92.41%. Comparing base (9355c38) to head (65606b3). Report is 48 commits behind head on main.

Files with missing lines Patch % Lines
xgi/readwrite/xgi_data.py 93.02% 3 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #540 +/- ## ========================================== + Coverage 92.35% 92.41% +0.05% ========================================== Files 60 60 Lines 4500 4548 +48 ========================================== + Hits 4156 4203 +47 - Misses 344 345 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

maximelucas commented 4 months ago

Idea sounds nice! Can you elaborate a bit, what do you mean by collection and what's your intended workflow with this?

nwlandry commented 4 months ago

So my original intent was to figure out how to store collections similar to this one. I opened Issue 20 in XGI-DATA for this issue. The choices in my mind were (1) make a huge file containing all the hypergraphs or (2) continue to have a file for each hypergraph, but in addition have a collection file specifying metadata of the collection as well as where to find each dataset in the collection. There might be a more elegant way, but this is the best I've come up with so far. Note that with this PR, you can call xgi.load_xgi_data("hyperbard") and load the hyperbard collection.

maximelucas commented 3 months ago

Just to check, in the end you chose option (2) right? And this is implemented as a single record in Zenodo for the collection containing many .json files (one for each dataset).

xgi.load_xgi_data("hyperbard") loads a dict of datasets, so xgi.load_xgi_data("hyperbard")["coriolanus"] should load a single dataset if I understand correctly.

This is very nice and good for now thanks Nich.

As I mentioned in our call, I'm wondering if don't want to be able to access them directly from something like xgi.load_xgi_data("hyperbard-coriolanus") to be able to iterate over all datasets in XGI-data. They could also appear in the list we get with xgi.load_xgi_data() this way? This could be a future PR too.

Btw, related to #548 I don't think the loading function is in the online docs.

nwlandry commented 3 months ago

Thanks for the review! I completely agree, but this will take a bit longer to figure out how to implement. Let me know if you'd rather I wait to implement accessing/viewing individual datasets to merge or if I should open an issue on this. What do you mean when you say that the loading function isn't in the online docs?

maximelucas commented 3 months ago

Let me know if you'd rather I wait to implement accessing/viewing individual datasets to merge or if I should open an issue on this.

Yes I think we can just open an issue about this!

What do you mean when you say that the loading function isn't in the online docs?

I mean that xgi.load_xgi_data("<dataset_name>") is not documented anywhere (except from its own docstring). As far as I can tell it only appears here, but without all its arguments: https://xgi.readthedocs.io/en/stable/xgi-data.html.

nwlandry commented 3 months ago

Thanks!! I added two issues from your comments.