marius-team / marius

Large scale graph learning on a single machine.
https://marius-project.org
Apache License 2.0

Could I run C++ code for Marius? #108

Closed by qhtjrmin 2 years ago

qhtjrmin commented 2 years ago

Hi, I built the executables (marius_train and marius_eval) using CMakeLists.txt. However, when I run the executable as shown in the example on GitHub, an error occurs.

$ ./marius_train examples/configuration/fb15k_237.yaml

Result: Aborted (core dumped)

Are the executables built through CMake not working at the moment? Or should the input be different from what is used when running Marius through Python?

Thanks

rogerwaleffe commented 2 years ago

Thanks for the question!

The executables (e.g., marius_train) should be working. I'm not sure exactly what the problem is, given that the error message is very general. Were there any problems when compiling Marius? Were you able to run the Python example for fb15k_237? Based on your last sentence, I'm assuming you may have had success with the Python API. If that is the case, my first guess would be that the dataset is somehow not configured correctly, or is not in the directory that the config interface (marius_train) expects. Can you check that the dataset_dir in the fb15k_237.yaml file is pointing to the correct directory? You could also try preprocessing the dataset as described here before running the example.
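The dataset_dir check suggested above can be scripted. The following is a minimal sketch, not part of Marius: the helper names find_dataset_dirs and check_config are hypothetical, and it assumes only that the config contains one or more plain `dataset_dir: <path>` lines, without assuming where in the YAML hierarchy they sit. Relative paths are resolved against the current working directory, which is often the real cause of a mismatch when the binary is launched from a different folder.

```python
import re
from pathlib import Path

# Matches lines such as "dataset_dir: /path/to/data" at any indentation.
# Commented-out lines (starting with "#") do not match.
_DATASET_DIR = re.compile(r"^\s*dataset_dir\s*:\s*(?P<value>.+?)\s*$")


def find_dataset_dirs(config_text):
    """Return every value assigned to a dataset_dir key in the config text."""
    dirs = []
    for line in config_text.splitlines():
        line = line.split(" #", 1)[0]  # drop a trailing comment, if any
        match = _DATASET_DIR.match(line)
        if match:
            dirs.append(match.group("value").strip("'\""))
    return dirs


def check_config(config_path):
    """Return (dataset_dir, exists) pairs for a config file.

    Hypothetical helper: 'exists' is True when the path is a directory
    as seen from the current working directory.
    """
    text = Path(config_path).read_text()
    return [(d, Path(d).expanduser().is_dir()) for d in find_dataset_dirs(text)]
```

For example, calling check_config("examples/configuration/fb15k_237.yaml") from the directory where marius_train is run shows whether each configured dataset directory is visible from there.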

qhtjrmin commented 2 years ago

Hi, thank you for your reply. My dataset setting was wrong, as you suggested. Additionally, is there any difference between the Python installation and the C++ make build? The results are similar, but there seems to be a difference in running time: the C++ make build takes nearly twice as long. Is there any difference?

rogerwaleffe commented 2 years ago

The Python version of Marius is just a set of Python bindings for the C++ functions (i.e., the Python functions are just wrappers over the C++ functions), so in general the same underlying code is executed in both versions. I'm surprised the C++ version is that much slower. In theory that shouldn't be the case. My guess is that some hyperparameters differ between the two runs (e.g., batch size, number of negative samples, etc.). It's also possible that a slightly different set of C++ functions is being used in each version because of differences between the two settings.
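One way to test the hyperparameter guess is to put the settings of both runs side by side and list only the ones that differ. The sketch below is not part of Marius. The helper names flatten and diff_settings are hypothetical, and the key names and values in the example are made up for illustration, so substitute whatever settings each run actually used.

```python
def flatten(config, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} form."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def diff_settings(left, right):
    """Return {key: (left_value, right_value)} for every setting that differs.

    A key present on only one side is reported with "<unset>" on the other.
    """
    a, b = flatten(left), flatten(right)
    diff = {}
    for key in sorted(a.keys() | b.keys()):
        va, vb = a.get(key, "<unset>"), b.get(key, "<unset>")
        if va != vb:
            diff[key] = (va, vb)
    return diff


# Hypothetical settings for the two runs; the names and numbers are illustrative only.
python_run = {"training": {"batch_size": 1000, "negatives": 500}}
cpp_run = {"training": {"batch_size": 10000, "negatives": 500}}

for key, (py_value, cpp_value) in diff_settings(python_run, cpp_run).items():
    print(f"{key}: python={py_value} cpp={cpp_value}")
```

If the diff comes back empty, the settings match and the slowdown is more likely to come from the build itself or from a different code path.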