haseebs / OWE

Pytorch code for An Open-World Extension to Knowledge Graph Completion Models (AAAI 2019)
https://aaai.org/ojs/index.php/AAAI/article/view/4162

Process killed during dataload #11

Closed mdabedr closed 3 years ago

mdabedr commented 3 years ago

During data loading for a large KG (900k+ triples) in data.py:

def init_labels(self) -> None:
    # ... other code ...
    self.head_labels = {k: to_one_hot(t, self.train.num_entities) for k, t in all_heads.items()}
    self.tail_labels = {k: to_one_hot(t, self.train.num_entities) for k, t in all_tails.items()}

I have around 1M head and tail items combined. My guess is that the process is getting killed by the OOM killer. Any ideas why?
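
(For reference, a minimal sketch of what a to_one_hot helper like the one above typically does, assuming it maps a collection of entity indices to a dense binary vector over all entities; the actual implementation in data.py may differ in dtype or details:)

    import torch

    def to_one_hot(indices, num_entities):
        # Sketch: a dense vector of length num_entities with ones at the given
        # entity indices. Every key in head_labels / tail_labels would then own
        # one such dense vector, which is what drives the memory usage.
        vec = torch.zeros(num_entities)
        vec[list(indices)] = 1.0
        return vec

    print(to_one_hot([0, 2], 5))   # tensor([1., 0., 1., 0., 0.])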

haseebs commented 3 years ago

It should be because you don't have enough memory, if this only happens on a large KG. Try with a smaller KG.

mdabedr commented 3 years ago

I looked at the one-hot function; each one-hot encoding should be around 64 bytes. With 1M entities, that still should not overflow the memory. Is there any possibility of a memory leak?
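
(A quick sanity check on that estimate: if to_one_hot returns a dense vector of length num_entities, a single encoding is far larger than 64 bytes. The float32 element type below is an assumption; the repo may use a different dtype:)

    # Back-of-the-envelope size of one dense one-hot vector over ~1M entities,
    # assuming 4 bytes (float32) per element.
    num_entities = 1_000_000
    bytes_per_vector = num_entities * 4
    print(bytes_per_vector / 2**20, "MiB per key")   # ~3.8 MiB, not 64 bytes

    # With hundreds of thousands of keys in head_labels / tail_labels, the two
    # dictionaries together can reach hundreds of GB.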

mdabedr commented 3 years ago

Just to clarify, I tried running it with a smaller KG with 242k triples and pickle-dumped the head_labels and tail_labels. They take 43 GB and 90 GB respectively, and together they tend to take hundreds of GB of RAM. Is it supposed to take this much, or is there a memory leak happening? I tried using summarywriter in multiple places to find a possible memory leak. Am I missing something, or is the dictionary supposed to be this big?

haseebs commented 3 years ago

I am not sure what the issue is, but I don't think there were any memory leaks when I experimented with the code. You could try running some kind of memory profiler, and also check the dimensions of the models at runtime.

It is not supposed to take this much space.
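
(One quick way to bound how much of that space is tensor storage, assuming the dictionary values are torch tensors as produced by to_one_hot; the dataset variable name is hypothetical:)

    import torch

    def dict_tensor_bytes(label_dict):
        # Sum the raw storage of every tensor value; Python object overhead for
        # the dict and its keys is ignored, so this is a lower bound.
        return sum(v.element_size() * v.nelement() for v in label_dict.values())

    # Toy example; in the repo you would pass dataset.head_labels / dataset.tail_labels.
    toy = {("rel", "ent"): torch.zeros(1_000_000)}
    print(dict_tensor_bytes(toy) / 2**20, "MiB per key at float32")   # ~3.8 MiB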

mdabedr commented 3 years ago

I tried using pympler and it does not show anything substantial. The code stalls on a dataset with 1M triples, and for a smaller dataset with ~300k triples, the two dictionaries mentioned above take too much space. Am I missing something? I can share the KG with you if you want to take a look.

Also, the model stalls while loading the dataset; the peak memory overhead occurs when the dictionaries are being computed.
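
(Not part of the repo, just an illustration of a possible workaround: keep only the target entity indices per key and materialize the dense multi-hot rows per batch instead of up front; all names below are hypothetical toy stand-ins:)

    import torch

    # Toy stand-in for the all_heads mapping built in data.py.
    all_heads = {("rel_0", "e_1"): [0, 2], ("rel_1", "e_3"): [1]}
    num_entities = 4

    # Keep only the indices in memory...
    label_indices = {k: torch.tensor(v) for k, v in all_heads.items()}

    def batch_labels(keys, label_indices, num_entities):
        # ...and build a (batch_size, num_entities) multi-hot matrix only for
        # the keys in the current batch, rather than holding every dense
        # vector in RAM at once.
        out = torch.zeros(len(keys), num_entities)
        for row, key in enumerate(keys):
            out[row, label_indices[key]] = 1.0
        return out

    print(batch_labels(list(label_indices), label_indices, num_entities))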

haseebs commented 3 years ago

Sorry for the late reply.

Yes, the loading part at the beginning of training is slow. It could be optimized further.

Did you try the datasets that are used in the paper? There shouldn't be any problems with them. You should try to see what exactly the difference is between those datasets and yours, and you might be able to figure out the issue.
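
(A small sketch for comparing dataset statistics, assuming the triple files are tab-separated head / relation / tail lines; the file names are placeholders:)

    def triple_stats(path):
        # Count triples, distinct entities and distinct relations in a
        # "head<TAB>relation<TAB>tail" file.
        entities, relations, n_triples = set(), set(), 0
        with open(path) as f:
            for line in f:
                head, rel, tail = line.rstrip("\n").split("\t")
                entities.update((head, tail))
                relations.add(rel)
                n_triples += 1
        return n_triples, len(entities), len(relations)

    # Compare one of the paper's datasets against your own KG, e.g.:
    # print(triple_stats("paper_dataset/train.txt"))
    # print(triple_stats("my_kg/train.txt"))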