h3lio5 / episodic-lifelong-learning

Implementation of "Episodic Memory in Lifelong Language Learning" (NeurIPS 2019) in PyTorch
MIT License

Sampling issue while running main file #1

Closed: kumar-shridhar closed this issue 3 years ago

kumar-shridhar commented 4 years ago

Hi, I am running this code and found a couple of minor mistakes (which I have taken care of) and a major issue with the sampling process. When sampling from the dictionary data structure, what should the output be?

Line 93 of main.py:

content, attn_masks, labels = memory.sample(sample_size=64)

I sampled from the dictionary items and I am getting 4-tuples as output, while the expected output should be content, attn_mask, label. Let me know if my sampling procedure matches the desired one. I am ignoring the fourth element altogether.
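For reference, this is what I expect the call to return (the exact shapes and dtypes are my guess, not from the repo):

content, attn_masks, labels = memory.sample(sample_size=64)
# content    -> torch.LongTensor of shape (64, max_seq_len): token ids
# attn_masks -> torch.LongTensor of shape (64, max_seq_len): 0/1 padding mask
# labels     -> torch.LongTensor of shape (64,): class ids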

Thanks!

JingYannn commented 4 years ago

Hi, I tried to add a sample function to the ReplayMemory class of the MbPAplusplus model, but I have some problems with this new function. Did you also add a sample function? Could you please share your code? Thanks!

kumar-shridhar commented 4 years ago

I used this sample function:

def sample(self, sample_size=100):
    """
    Parameters:
    sample_size : number of examples to sample from the replay buffer
    Returns:
    tuple of sample_size text contents and their corresponding attention masks and labels
    """
    contents = []
    attn_masks = []
    labels = []

    # draw sample_size stored examples uniformly at random
    samples = random.sample(self.memory, sample_size)

    for content, attn_mask, label in samples:
        contents.append(content)
        attn_masks.append(attn_mask)
        labels.append(label)

    # stack into LongTensors so the batch can be fed to the model directly
    return (torch.LongTensor(np.asarray(contents)),
            torch.LongTensor(np.asarray(attn_masks)),
            torch.LongTensor(np.asarray(labels)))
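For completeness, this version assumes self.memory is a flat sequence of (content, attn_mask, label) tuples, populated roughly like this (a sketch; the push method name is my assumption):

import random

import numpy as np   # used by sample() above
import torch         # used by sample() above

class ReplayMemory:
    def __init__(self):
        # a flat list of (content, attn_mask, label) tuples,
        # which is what random.sample() in sample() expects
        self.memory = []

    def push(self, content, attn_mask, label):
        self.memory.append((content, attn_mask, label))
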
luotuoqingshan commented 4 years ago

It seems we need to change the line

samples = random.sample(self.memory, sample_size)

to

samples = random.sample(list(self.memory.values()), sample_size)

Otherwise it will report the following error:

Traceback (most recent call last):
  File "main.py", line 268, in <module>
    train(args.order, model, memory)
  File "main.py", line 93, in train
    content, attn_masks, labels = memory.sample(sample_size=64)
  File "/home/yhuang704/episodic-lifelong-learning/models/MbPAplusplus.py", line 91, in sample
    samples = random.sample(self.memory, sample_size)
  File "/nethome/yhuang704/anaconda3/envs/continuallearning/lib/python3.8/random.py", line 359, in sample
    raise TypeError("Population must be a sequence or set.  For dicts, use list(d).")
TypeError: Population must be a sequence or set.  For dicts, use list(d).
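A minimal repro of the error and the fix, independent of the repo (Python 3.8):

import random

d = {"key1": ("content", "mask", "label"), "key2": ("content", "mask", "label")}

# random.sample(d, 1)  # raises TypeError: Population must be a sequence or set.
print(random.sample(list(d.values()), 1))  # OK: sample the stored tuples
print(random.sample(list(d), 1))           # OK: sample the keys instead
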
constant5 commented 3 years ago

I implemented it this way:

def sample(self, sample_size):
    # sample dictionary keys, then gather the stored
    # (content, attn_mask, label) entries for each key
    keys = random.sample(list(self.memory), sample_size)
    contents = np.array([self.memory[k][0] for k in keys])
    attn_masks = np.array([self.memory[k][1] for k in keys])
    labels = np.array([self.memory[k][2] for k in keys])
    return (torch.LongTensor(contents),
            torch.LongTensor(attn_masks),
            torch.LongTensor(labels))

But then I got an out-of-memory error when going into the REPLAY block. I fixed it by decreasing the sample size to 32 (the same as batch_size).
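For reference, my replay call then looks like this (the device handling is from my own setup, not the repo):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# smaller replay sample (32, same as batch_size) to avoid the OOM
content, attn_masks, labels = memory.sample(sample_size=32)
content, attn_masks, labels = content.to(device), attn_masks.to(device), labels.to(device)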

h3lio5 commented 3 years ago

I think this issue has been solved. Thanks for the help @constant5 .