marius-team / marius

Large scale graph learning on a single machine.
https://marius-project.org
Apache License 2.0
160 stars 45 forks

Spark preprocessor now works with s3 #118

Open basavaraj29 opened 1 year ago

basavaraj29 commented 1 year ago

The Spark preprocessor expects the environment variables S3_BUCKET, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY to be set.
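A pre-flight check for these variables could look roughly like the sketch below. It assumes the variables are read from the process environment; the helper name `missing_s3_env` is hypothetical and is not part of Marius.

```python
import os

# Variable names taken from the PR description.
REQUIRED_VARS = ("S3_BUCKET", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_s3_env(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_s3_env()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Failing early this way avoids starting a Spark job that would only discover the missing credentials partway through.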

marius_preprocess --edges s3a://fb15k237/train.txt s3a://fb15k237/valid.txt s3a://fb15k237/test.txt --output_directory /home/data/datasets/fb15k_237/ --spark
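The same invocation can be assembled from a script, which may help when the bucket or split file names change. This is only a sketch: `build_preprocess_cmd` is a hypothetical helper, it uses only the flags shown in the command above, and whether the bucket in the URIs has to match S3_BUCKET is not stated in this PR.

```python
import shlex


def build_preprocess_cmd(bucket, output_dir,
                         splits=("train.txt", "valid.txt", "test.txt")):
    """Build the marius_preprocess argument list for edge files stored in S3.

    Hypothetical helper; only flags that appear in the PR description are used.
    """
    edges = [f"s3a://{bucket}/{name}" for name in splits]
    return ["marius_preprocess", "--edges", *edges,
            "--output_directory", output_dir, "--spark"]


cmd = build_preprocess_cmd("fb15k237", "/home/data/datasets/fb15k_237/")
print(shlex.join(cmd))  # shell-ready form of the command
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the required environment variables are exported.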

Writes the preprocessed output to the local directory as well as to S3. The local copies could be deleted after the upload, but they are kept for now.