rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library
https://docs.rapids.ai/api/cugraph/stable/
Apache License 2.0

[DEBT] Input Batch IDs Don't Line Up With Output Batch IDs #3794

Open alexbarghi-nv opened 1 year ago

alexbarghi-nv commented 1 year ago

#3789 resolved an issue where empty minibatches were dropped from the bulk sampler. As a result of that fix, the output batch ids may no longer match the batch ids provided as input. This does not affect cuGraph-DGL or cuGraph-PyG, since both packages only expect the number of batches to match what the filename specifies, and renumbering the remaining non-empty minibatches satisfies that.
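To make the mismatch concrete, here is a minimal pure-Python sketch. It does not use the cuGraph API; the dictionary layout and the `renumber_nonempty` helper are hypothetical and only illustrate what happens when empty minibatches are skipped and the rest are renumbered consecutively.

```python
def renumber_nonempty(edges_by_batch):
    """Drop empty batches and renumber the rest consecutively from 0.

    edges_by_batch: dict mapping input batch id -> list of sampled edges
    (a hypothetical layout, not the bulk sampler's real output format).
    Returns the renumbered batches and an output-id -> input-id map.
    """
    renumbered = {}
    output_to_input = {}
    for input_id in sorted(edges_by_batch):
        edges = edges_by_batch[input_id]
        if not edges:
            continue  # empty minibatch: nothing was sampled for it
        output_id = len(renumbered)
        renumbered[output_id] = edges
        output_to_input[output_id] = input_id
    return renumbered, output_to_input


# Input batch 2 produced no samples, so input batch 3 becomes output batch 2.
sampled = {0: [(0, 1)], 1: [(2, 3)], 2: [], 3: [(4, 5)]}
renumbered, output_to_input = renumber_nonempty(sampled)
print(renumbered)       # {0: [(0, 1)], 1: [(2, 3)], 2: [(4, 5)]}
print(output_to_input)  # {0: 0, 1: 1, 2: 3}
```

The batch count (3) is still consistent with a contiguous id range, which is all the downstream loaders check, but without something like `output_to_input` the original ids cannot be recovered.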

However, this could cause problems when debugging future bulk sampling issues, or if batch ids become important to end users. Ultimately, there should be a better way to handle empty batches, possibly by just returning the input seeds (which may line up better with end-user expectations), or some other solution.
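One way to picture the "return the input seeds" option is the sketch below. It is plain Python under assumed data structures, not the bulk sampler's actual behavior: `keep_empty_batches`, `seeds_by_batch`, and `edges_by_batch` are hypothetical names chosen for illustration.

```python
def keep_empty_batches(seeds_by_batch, edges_by_batch):
    """Emit one output batch per input batch, preserving the input batch id.

    A batch with no sampled edges is still emitted; it carries only its
    input seeds and an empty edge list (hypothetical output layout).
    """
    output = {}
    for batch_id, seeds in seeds_by_batch.items():
        edges = edges_by_batch.get(batch_id, [])
        output[batch_id] = {"seeds": list(seeds), "edges": list(edges)}
    return output


seeds_by_batch = {0: [10, 11], 1: [12], 2: [13, 14]}
edges_by_batch = {0: [(10, 20)], 2: [(13, 21), (14, 22)]}  # batch 1 is empty

output = keep_empty_batches(seeds_by_batch, edges_by_batch)
print(sorted(output))  # [0, 1, 2]
print(output[1])       # {'seeds': [12], 'edges': []}
```

With this approach the output ids always equal the input ids, at the cost of consumers having to tolerate batches that contain seeds but no edges.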

alexbarghi-nv commented 7 months ago

Related to #4201