pyg-team / pytorch-frame

Tabular Deep Learning Library for PyTorch
https://pytorch-frame.readthedocs.io
MIT License

Integration with TorchData #329

Open nimaous opened 8 months ago

nimaous commented 8 months ago

Hi all,

In my project, I use TorchData to read Parquet files from AWS S3 buckets. Currently, it seems that pytorch-frame cannot be integrated with TorchData. Do you have any plans to support this, or is there a workaround for reading Parquet files from S3 buckets into a torch_frame dataset?

Thanks,

yiweny commented 8 months ago

It seems that TorchData is no longer under active development, so I'm not sure we have plans to integrate with it on our side.

If you can load the data stored in the Parquet files into a pandas DataFrame, you can wrap the DataFrame in a torch_frame.data.Dataset and pass it as the dataset argument of torch_frame.data.DataLoader. However, a pandas DataFrame holds the whole table in memory, so you may run into memory issues with large datasets.

We do welcome community contributions.