matt-gardner / pra


RAM issues when extracting features #27

Open arthurcgusmao opened 6 years ago

arthurcgusmao commented 6 years ago

Hi,

I am running into RAM issues when running the CreateMatrices operation on larger graphs and with more expressive features (i.e., going beyond PRA-like features) using SFE.

I observed that RAM usage only increases from the moment I start the run until it ends, regardless of which relation is being processed. Interestingly, if I stop the run and restart it from the relation the first run got stuck on, the code gets through a number of new relations before it starts using swap space again. At that point everything slows down so much that I have to stop and restart a third time, and so on. I can process all relations this way, but it is not ideal.
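One way to tell whether memory is retained across relations, rather than spiking inside a single large relation, is to record used heap after each relation. Below is a minimal Scala sketch, assuming the per-relation work can be wrapped in a function. `HeapLog` and `profile` are hypothetical names, not part of PRA. Only `Runtime.totalMemory`, `Runtime.freeMemory` and `System.gc` are standard JVM calls.

```scala
// Hypothetical helper, not part of the PRA code base.
object HeapLog {
  // Used heap in MiB: memory the JVM has allocated minus the part that is free.
  def usedHeapMiB(): Long = {
    val rt = Runtime.getRuntime
    (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L)
  }

  // Runs `work` once per relation and records used heap after each one.
  // System.gc() is only a request to the JVM, so the readings are approximate.
  def profile(relations: Seq[String])(work: String => Unit): Seq[(String, Long)] =
    relations.map { relation =>
      work(relation)
      System.gc()
      relation -> usedHeapMiB()
    }
}
```

If the post-GC figure keeps climbing from one relation to the next, objects are outliving the relation that created them. If it falls back to roughly the same baseline, the growth is confined to individual relations.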

So I am trying to understand what causes this. In my (very humble) understanding, it could be due to one of the following:

  1. The Operation instance: each time the code processes a relation, it creates a new FeatureGenerator instance in which the features are stored. Maybe this generator is not released after its relation is finished, so RAM usage grows as more and more relations are processed.
  2. Another object (maybe split or graph?) grows as more and more features are extracted. For example, the subgraph created for each node pair might never be released, or something along these lines.
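Both hypotheses come down to a long-lived reference that keeps per-relation state reachable after that relation is done. This would also explain why restarting helps, since a fresh JVM drops every such reference. Here is a minimal Scala sketch of that failure mode. `RelationState` and `Driver` are hypothetical names, not PRA classes. The `keep` flag stands in for whatever field, cache, or collection might be holding on to the state.

```scala
import scala.collection.mutable

// Hypothetical stand-in for per-relation state (e.g. a feature generator and
// the features it has accumulated). Not an actual PRA/SFE class.
final class RelationState(val relation: String) {
  val features: Array[Long] = new Array[Long](1 << 16) // 512 KiB placeholder payload
}

// Hypothetical driver loop. `keep` models a long-lived reference that
// outlives the relation it was created for.
final class Driver(keep: Boolean) {
  private val retained = mutable.ArrayBuffer.empty[RelationState]

  def run(relations: Seq[String]): Unit =
    relations.foreach { relation =>
      val state = new RelationState(relation) // created fresh for each relation
      // ... extract features and write the matrix for `relation` here ...
      if (keep) retained += state // leaky: state stays strongly reachable
      // otherwise `state` is unreachable after this iteration and can be collected
    }

  // Number of per-relation states the garbage collector is not allowed to reclaim.
  def retainedCount: Int = retained.size
}
```

With `keep = true`, every relation's payload stays on the heap until the run ends, which matches the steadily growing usage described above. To find the real culprit, you could take a heap dump once usage has grown, for example with `jmap -dump:live,format=b,file=heap.hprof <pid>`. Opening it in a heap analyzer shows which objects dominate the heap and what still references them.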

Note that these are only speculations so far, since I am still getting familiar with the code, but they seem consistent with its behavior.

Any ideas on this will help.

matt-gardner commented 6 years ago

Yeah, this sure sounds like something is not getting garbage collected, but your guess is as good as mine here, unfortunately.