marius-team / marius

Large scale graph learning on a single machine.
https://marius-project.org
Apache License 2.0

Performance Improvements to Main #131

Closed rogerwaleffe closed 1 year ago

rogerwaleffe commented 1 year ago

The main branch is missing several performance improvements, some of which were added to the EuroSys 2023 MariusGNN artifact after the two code bases diverged (main was extended with additional functionality and open-sourced). This pull request adds these improvements to the main branch, bringing its performance equal to or better than that of the version that generated the numbers reported in the MariusGNN paper.

The main performance improvements are: 1) better pipeline performance through the use of CUDA streams, 2) faster data loading and transfer through the use of pinned memory, and 3) faster sampling through an improved implementation for constructing DENSE.
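To illustrate why overlapping stages helps pipeline throughput, here is a minimal CPU-only sketch. It is not Marius code and uses no GPU: a Python thread and a bounded queue stand in for CUDA streams and pinned staging buffers, and every name in it (`load_batches`, `run_pipelined`, `run_sequential`, `depth`) is hypothetical.

```python
import queue
import threading
import time


def load_batches(num_batches, out_queue, load_time):
    """Producer stage: stands in for data loading and host-to-device copies."""
    for i in range(num_batches):
        time.sleep(load_time)   # simulated load/transfer cost
        out_queue.put(i)        # blocks while the queue is full
    out_queue.put(None)         # sentinel: no more batches


def run_pipelined(num_batches, load_time, compute_time, depth=2):
    """Load batch i+1 in the background while batch i is being computed."""
    batches = queue.Queue(maxsize=depth)  # bounded, like a fixed pool of staging buffers
    loader = threading.Thread(
        target=load_batches, args=(num_batches, batches, load_time)
    )
    loader.start()
    results = []
    while True:
        batch = batches.get()
        if batch is None:
            break
        time.sleep(compute_time)          # simulated compute cost
        results.append(batch * batch)
    loader.join()
    return results


def run_sequential(num_batches, load_time, compute_time):
    """Baseline: each batch is loaded and then computed, with no overlap."""
    results = []
    for i in range(num_batches):
        time.sleep(load_time)
        time.sleep(compute_time)
        results.append(i * i)
    return results
```

Both functions return the same results. In steady state the pipelined version spends roughly the longer of the two stage times per batch, while the sequential version spends their sum.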

Local testing shows a roughly 6x improvement over the existing main branch on OGBN-Papers100M for node classification, bringing the runtime in line with the MariusGNN paper. On Freebase86M for link prediction, the improvement is nearly 3x over the existing main branch and almost 2x over the MariusGNN paper.
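The ratios above also imply how far the existing main branch trailed the paper version. A small sketch of that arithmetic follows. The inputs are the approximate figures quoted above ("roughly", "nearly", "almost"), so the outputs are rough as well, and the function name is hypothetical.

```python
def implied_paper_speedup_over_main(pr_over_main, pr_over_paper):
    """If this PR is `pr_over_main` times faster than main and `pr_over_paper`
    times faster than the paper version, then the paper version was
    pr_over_main / pr_over_paper times faster than main."""
    return pr_over_main / pr_over_paper


# OGBN-Papers100M: ~6x over main, about equal to the paper (ratio ~1x)
papers100m = implied_paper_speedup_over_main(6.0, 1.0)

# Freebase86M: ~3x over main, ~2x over the paper
freebase86m = implied_paper_speedup_over_main(3.0, 2.0)
```

Under these rounded inputs, the paper version was about 6x faster than the old main branch on OGBN-Papers100M and about 1.5x faster on Freebase86M.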

This pull request initially contains only modifications to the C++ code. Remaining TODOs before merging are as follows:

  1. update the config parsing to ensure backwards compatibility? (this PR changes the config file format slightly)
  2. update the Python bindings as needed to match the API updates (the API updates are generally minor and isolated to function arguments)
  3. update the docs, examples, and tests based on the API/config changes, if needed
  4. run the auto-formatting and fix any issues in the GitHub Actions (e.g., build issues due to torch versioning)

thodrek commented 1 year ago

Impressive results!


On 20 Jan 2023, at 00:59, Roger Waleffe wrote:

[Quoted copy of the pull request description above]