marius-team / marius

Large scale graph learning on a single machine.
https://marius-project.org
Apache License 2.0
160 stars, 45 forks

Change argument variable output_directory to data_directory #66

Closed. thodrek closed this issue 2 years ago.

thodrek commented 3 years ago

What is the documentation lacking? Please describe.
The output_directory input argument is overloaded: the directory holds both the input and the output data for a dataset. This imprecise naming leads users to misuse the system.

Describe the improvement you'd like.
Rename the argument output_directory to data_directory.
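A minimal sketch of the rename, assuming a Python argparse-based command line. This is not the actual Marius CLI. Declaring both option strings on a single argument with an explicit dest lets the new name data_directory take over, while the old output_directory flag keeps working as an alias. The positional dataset argument and the default value are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical preprocessing parser; only the two directory flag names
    # come from the issue, everything else is illustrative.
    parser = argparse.ArgumentParser(description="Preprocess a graph dataset")
    parser.add_argument("dataset", help="name of the dataset to preprocess")
    # The new, accurate name is listed first and set as dest; the old name is
    # kept as an alias so existing scripts that pass it still work.
    parser.add_argument(
        "--data_directory",
        "--output_directory",
        dest="data_directory",
        default="datasets/",
        help="directory holding both downloaded input and preprocessed output",
    )
    return parser


# Usage: the legacy flag still populates the renamed attribute.
args = build_parser().parse_args(["my_dataset", "--output_directory", "old_dir/"])
print(args.data_directory)  # → old_dir/
```

Keeping the old flag as an alias avoids breaking existing commands while the documentation moves to the clearer name.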

AnzeXie commented 3 years ago

For marius_preprocess, when preprocessing a supported dataset, output_directory contains both the input (downloaded data) and the output (preprocessed data). When preprocessing a custom dataset, output_directory contains only the output data.

shivaram commented 3 years ago

Maybe we should have a separate directory called download_directory to store the downloaded files. I think it is confusing to put both the input and output files of preprocessing in a directory called output_directory.

thodrek commented 3 years ago

@AnzeXie, so for custom datasets, how is the files input of general_parser set? The behavior above is confusing and should be cleaned up. Specifically, either unify everything under a single data_dir or split it into input_dir and output_dir; the current choice is inconsistent and confusing.

AnzeXie commented 3 years ago

For custom datasets, the paths passed as the files input of general_parser are set by the user. For supported datasets, the input files are first downloaded and then preprocessed, and these downloaded files were put into output_dir. OK, I will add a separate directory called download_directory specifically for the files downloaded for supported datasets.
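A minimal sketch of the split described above, using a hypothetical helper plan_directories that is not part of Marius. Custom datasets take user-supplied files and write only output. Supported datasets read from a dedicated download_directory that must not coincide with output_directory. The custom_files parameter is an assumed stand-in for the files input of general_parser.

```python
from pathlib import Path
from typing import Optional, Sequence, Tuple


def plan_directories(
    output_directory: str,
    download_directory: Optional[str] = None,
    custom_files: Optional[Sequence[str]] = None,
) -> Tuple[Tuple[Path, ...], Path]:
    """Return (input locations, output directory) for one preprocessing run.

    Hypothetical helper for illustration only.
    """
    out = Path(output_directory)
    if custom_files:
        # Custom dataset: the user supplies the input files directly, so the
        # output directory holds only preprocessed output.
        return tuple(Path(f) for f in custom_files), out
    if download_directory is None:
        raise ValueError("supported datasets need a download_directory")
    dl = Path(download_directory)
    # resolve() works on paths that do not exist yet (non-strict mode).
    if dl.resolve() == out.resolve():
        raise ValueError("download_directory must differ from output_directory")
    # Supported dataset: raw files are fetched into download_directory and the
    # preprocessed result is written to output_directory.
    return (dl,), out


inputs, out = plan_directories("out/", download_directory="downloads/")
print(inputs, out)
```

Rejecting identical directories enforces the separation the thread asks for, so downloaded inputs can no longer end up mixed with preprocessed output.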