jay-bhambhani opened this issue 1 year ago
Happy to contribute to this one in any way possible!
Thanks for starting this issue. Relevant slack discussion: https://torchgeometricco.slack.com/archives/C01DN0B3B1N/p1693220997281019?thread_ts=1692902721.218399&cid=C01DN0B3B1N
I see people are using LMDB for this; see https://github.com/Open-Catalyst-Project/ocp/blob/main/ocpmodels/datasets/lmdb_database.py. It would be pretty cool to add such an option to PyG's datasets.
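For reference, LMDB is a key-value store that maps byte keys to byte values, so one common layout for a dataset of small graphs is one serialized graph per index key, plus a reserved key for the length. The sketch below mimics that layout with Python's standard-library `dbm` module as a stand-in for LMDB, so it runs without extra dependencies. `write_graphs` and `KeyValueGraphStore` are hypothetical names, and graphs are plain dicts rather than PyG `Data` objects.

```python
import dbm
import pickle


# Stand-in for an LMDB environment: dbm also maps bytes keys to bytes
# values on disk. Each small graph is pickled under its index as the key.
def write_graphs(path, graphs):
    with dbm.open(path, "n") as db:
        for idx, graph in enumerate(graphs):
            db[str(idx).encode()] = pickle.dumps(graph)
        # Reserved key for the length; it cannot clash with a numeric index.
        db[b"__len__"] = str(len(graphs)).encode()


class KeyValueGraphStore:
    """Hypothetical read-only store: one lookup and one unpickle per graph."""

    def __init__(self, path):
        self.db = dbm.open(path, "r")
        self.length = int(self.db[b"__len__"])

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if not 0 <= idx < self.length:
            raise IndexError(idx)
        # Only this graph's value is fetched and deserialized.
        return pickle.loads(self.db[str(idx).encode()])

    def close(self):
        self.db.close()
```

Swapping `dbm` for the `lmdb` package would keep the same shape (serialize on write, deserialize one key per `__getitem__`), with LMDB adding memory-mapped reads and concurrent readers.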
Hi Matthias! Thank you so much for this. My team and I would love to take this on; however, we've been asked whether we could discuss it with you a bit more so that we can scope out the contribution. Would you have some time next week to discuss? We are more than happy to work around your schedule, so just let us know!
A couple of questions off the bat. We love the idea of a memory-mapped file; is there any interest in potentially adding a database backend as well?
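A memory-mapped file lets the OS page in only the bytes that are actually read. As a minimal sketch, assuming all graphs' node features are concatenated row-wise into one flat float32 file with a separate offsets list (a hypothetical layout, not an existing PyG format), a single graph's features can be sliced out without loading the rest:

```python
import array
import mmap

# array's "f" typecode is a C float (4 bytes on common platforms).
ITEMSIZE = array.array("f").itemsize


def write_features(path, per_graph_features):
    """Append each graph's node-feature rows to one flat float32 file.

    Returns offsets such that graph i owns rows offsets[i]:offsets[i + 1].
    """
    offsets = [0]
    with open(path, "wb") as f:
        for rows in per_graph_features:
            array.array("f", [v for row in rows for v in row]).tofile(f)
            offsets.append(offsets[-1] + len(rows))
    return offsets


class MemoryMappedFeatures:
    """Hypothetical reader that slices one graph's rows out of the mapped file."""

    def __init__(self, path, offsets, num_features):
        self.file = open(path, "rb")
        # Length 0 maps the whole file; pages are loaded only when touched.
        self.mm = mmap.mmap(self.file.fileno(), 0, access=mmap.ACCESS_READ)
        self.offsets = offsets
        self.num_features = num_features

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, idx):
        row_bytes = ITEMSIZE * self.num_features
        start = self.offsets[idx] * row_bytes
        end = self.offsets[idx + 1] * row_bytes
        flat = array.array("f")
        flat.frombytes(self.mm[start:end])  # copies only this graph's bytes
        n = self.num_features
        return [flat[i:i + n].tolist() for i in range(0, len(flat), n)]

    def close(self):
        self.mm.close()
        self.file.close()
```

Note that `mmap` cannot map an empty file, so this assumes at least one graph with at least one node.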
Do we assume that this will tie into the existing FeatureStore and GraphStore abstractions? In theory, we could also store features in the same database if we use something more generic, like a key-value store or an RDBMS.
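To illustrate the RDBMS variant, here is a sketch using the standard-library `sqlite3` module, assuming a hypothetical `graphs` table that keeps graph structure and node features side by side. Both come back from one query, or the features alone via a narrower `SELECT`. JSON is used only to keep the example readable; a real backend would more likely store binary tensor data.

```python
import json
import sqlite3

# Hypothetical schema: structure and node features live in the same row.
SCHEMA = """
CREATE TABLE IF NOT EXISTS graphs (
    id INTEGER PRIMARY KEY,
    edge_index TEXT NOT NULL,
    x TEXT NOT NULL
)
"""


def create_store(path):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_graph(conn, graph_id, edge_index, x):
    with conn:  # commits the transaction on success, rolls back on error
        conn.execute(
            "INSERT INTO graphs (id, edge_index, x) VALUES (?, ?, ?)",
            (graph_id, json.dumps(edge_index), json.dumps(x)),
        )


def load_graph(conn, graph_id):
    row = conn.execute(
        "SELECT edge_index, x FROM graphs WHERE id = ?", (graph_id,)
    ).fetchone()
    if row is None:
        raise KeyError(graph_id)
    return {"edge_index": json.loads(row[0]), "x": json.loads(row[1])}


def load_features(conn, graph_id):
    # Fetch only the feature column, e.g. for a feature-only lookup.
    row = conn.execute("SELECT x FROM graphs WHERE id = ?", (graph_id,)).fetchone()
    if row is None:
        raise KeyError(graph_id)
    return json.loads(row[0])


def count_graphs(conn):
    return conn.execute("SELECT COUNT(*) FROM graphs").fetchone()[0]
```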
Thanks for all of your support and guidance with this! We are extremely excited to contribute to this project!
Sure, we can discuss. What timezone are you in? I am in Europe.
For DB integration: I think I would implement this in a separate interface, and then do the integration in torch_geometric.data.Dataset. There is a follow-up opportunity to actually implement a FeatureStore with it, but I wouldn't necessarily tie them together.
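One way to read this suggestion: define the database as its own small interface, and have a `Dataset` subclass delegate to it. PyG's `torch_geometric.data.Dataset` asks subclasses to implement `len()` and `get(idx)`; the sketch below mirrors that pair in plain Python so it runs without PyG installed. `GraphDatabase`, `InMemoryGraphDatabase`, and `DatabaseBackedDataset` are hypothetical names.

```python
from abc import ABC, abstractmethod


class GraphDatabase(ABC):
    """Hypothetical backend interface, kept independent of any Dataset."""

    @abstractmethod
    def __len__(self) -> int: ...

    @abstractmethod
    def get(self, idx: int) -> dict: ...


class InMemoryGraphDatabase(GraphDatabase):
    """Toy backend for illustration; a real one would wrap LMDB, SQLite, etc."""

    def __init__(self, graphs):
        self._graphs = list(graphs)

    def __len__(self):
        return len(self._graphs)

    def get(self, idx):
        return self._graphs[idx]


class DatabaseBackedDataset:
    """Mirrors the len()/get() pair a torch_geometric.data.Dataset subclass
    implements; here it simply delegates both calls to the backend."""

    def __init__(self, db: GraphDatabase):
        self.db = db

    def len(self):
        return len(self.db)

    def get(self, idx):
        return self.db.get(idx)
```

Keeping the backend behind its own interface means a later FeatureStore implementation could reuse the same backend without the dataset and feature store depending on each other.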
We are in the US Eastern time zone, so I'm sure we can find a time that works for both of us!
Thanks for the suggestions! Looking forward to chatting soon!
(Reply sent by email to Matthias Fey's comment of Fri, Sep 1, 2023, 11:56 AM.)
Does 4PM CEST on Thursday work for you? You can send an invite to matthias@kumo.ai.
🚀 The feature, motivation and pitch
We would like to enhance the data loaders to handle the case of loading large volumes of small graphs. Currently, PyG is primarily geared towards handling large, highly connected graphs.
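For context on the loader side: PyG's `DataLoader` batches small graphs by merging each mini-batch into one larger, disconnected graph, shifting edge indices by the running node count and recording graph membership in a `batch` vector. The plain-Python sketch below shows that collation step with graphs as dicts; `collate` is a hypothetical helper, not PyG's implementation.

```python
def collate(graphs):
    """Merge small graphs into one disconnected graph (hypothetical helper).

    Each graph is a dict with "x" (list of node feature rows) and
    "edge_index" (two lists: source and target node ids).
    """
    x, src, dst, batch = [], [], [], []
    offset = 0  # number of nodes already placed in the merged graph
    for graph_id, g in enumerate(graphs):
        num_nodes = len(g["x"])
        x.extend(g["x"])
        # Shift node ids so edges point into this graph's block of nodes.
        src.extend(s + offset for s in g["edge_index"][0])
        dst.extend(d + offset for d in g["edge_index"][1])
        batch.extend([graph_id] * num_nodes)
        offset += num_nodes
    return {"x": x, "edge_index": [src, dst], "batch": batch}
```

For example, collating a 2-node graph with a 3-node graph shifts the second graph's node ids by 2 and yields `batch == [0, 0, 1, 1, 1]`.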
Alternatives
Currently, we can do this via a Dataset, but much of our data will not fit into memory.
Additional context
No response