Changyuan0825 closed this issue 7 months ago
Hi @Changyuan0825, thanks for your interest in our work! The main purpose is to save disk storage. The time spent accessing or caching a node's features does not depend on the feature values themselves, so we can measure the training time of DUCATI with random node features and skip storing and loading the real ones. As you can see from the code here, you only need to store and load the adjacency data; you avoid writing and reading the large nfeat data to and from disk, which saves a lot of time. However, to verify accuracy and convergence, you need to load the real node features instead.
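To illustrate the idea in the reply above, here is a minimal sketch, not the DUCATI code itself. It builds a random feature table with the same shape the real nfeat would have and gathers rows for a mini-batch of node IDs. The names `make_fake_features`, `gather`, `NUM_NODES`, and `FEAT_DIM` are hypothetical and chosen only for this example; it uses plain Python lists so it runs without extra dependencies.

```python
import random


def make_fake_features(num_nodes, feat_dim, seed=0):
    """Return a num_nodes x feat_dim table of random floats.

    This stands in for real node features when only timing matters,
    so nothing has to be written to or read from disk.
    """
    rng = random.Random(seed)
    return [[rng.random() for _ in range(feat_dim)] for _ in range(num_nodes)]


def gather(features, node_ids):
    """Fetch the feature rows for a mini-batch of node IDs."""
    return [features[i] for i in node_ids]


# Hypothetical sizes for illustration; a real dataset would be far larger.
NUM_NODES, FEAT_DIM = 1000, 16

fake_nfeat = make_fake_features(NUM_NODES, FEAT_DIM)
batch = gather(fake_nfeat, [3, 42, 999])
print(len(batch), len(batch[0]))  # prints: 3 16
```

The gather step touches the same number of rows and bytes whatever values the table holds, which is why random features are enough for timing runs. In a PyTorch pipeline the same idea would typically be a single call such as `torch.rand(num_nodes, feat_dim)`. For accuracy or convergence experiments the real features still have to be loaded.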
I get it. Thank you for your response!
I would like to ask: why do we need to use randomly generated feature vectors (i.e., fake input)? If I have misunderstood, could you please explain what the function DUCATI.CacheConstructor.separate_features_idx does? The following is your raw code: