microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

Quick question about "masked_block_start" #163

Closed Derekkk closed 4 years ago

Derekkk commented 4 years ago

Hi,

Thanks for sharing the code! I have a quick question: in "MASS-summarization/masked_dataset.py", it seems you choose multiple spans from src_item as targets:

masked_pos = []
for i in range(1, len(src_item), self.block_size):  # starts at 1, so position 0 is never masked
    block = positions[i : i + self.block_size]  # positions covered by this block
    masked_len = int(len(block) * self.mask_prob)  # length of the span masked in this block
    # pick a random start so the masked span fits inside the block
    masked_block_start = np.random.choice(block[:len(block) - int(masked_len) + 1], 1)[0]
    masked_pos.extend(positions[masked_block_start : masked_block_start + masked_len])
masked_pos = np.array(masked_pos)

and the target is the concatenation of all chosen spans, e.g., src = [1,2,3,4,mask,mask,7,8,mask,mask,11,12] and tgt = [5,6,9,10].
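To make the selection concrete, here is a minimal standalone sketch of the same loop applied to this toy input. The function name choose_masked_positions, the values block_size=4 and mask_prob=0.5, and the fixed seed are illustrative assumptions, not names or defaults from the repo (which keeps block_size and mask_prob on self):

import numpy as np

def choose_masked_positions(src_item, block_size=4, mask_prob=0.5, seed=0):
    """Pick one masked span per block and return all masked positions."""
    rng = np.random.RandomState(seed)
    positions = np.arange(len(src_item))
    masked_pos = []
    for i in range(1, len(src_item), block_size):  # position 0 is never masked
        block = positions[i : i + block_size]
        masked_len = int(len(block) * mask_prob)  # span length within this block
        # random start so the span fits inside the block
        start = rng.choice(block[: len(block) - masked_len + 1], 1)[0]
        masked_pos.extend(range(start, start + masked_len))
    return np.array(masked_pos)

src_item = list(range(1, 13))                 # [1, 2, ..., 12]
masked = choose_masked_positions(src_item)
print(masked)                                 # one short span per block
print([src_item[p] for p in masked])          # the tokens that form the target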

I want to confirm whether my understanding is correct, since in the original paper you only chose one segment for each input. Thanks a lot!

StillKeepTry commented 4 years ago

Yes, your understanding is correct. Our original paper focused only on sentence-level tasks, while some summarization tasks (like CNN/DM) are at the document level. To handle the longer context (e.g., 512 tokens or more), we adopt a multi-span masking strategy.
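For contrast, here is a hedged sketch of the sentence-level setup the paper describes (one contiguous fragment per input, covering roughly 50% of the tokens); the name single_span_positions and the parameter fragment_ratio are illustrative assumptions, not repo code:

import numpy as np

def single_span_positions(seq_len, fragment_ratio=0.5, seed=0):
    """Sentence-level MASS: mask one contiguous fragment of the input."""
    rng = np.random.RandomState(seed)
    masked_len = int(seq_len * fragment_ratio)
    # start at 1 so position 0 is never masked, matching the loop above
    start = rng.randint(1, seq_len - masked_len + 1)
    return np.arange(start, start + masked_len)

print(single_span_positions(512))  # one long 256-token run

On a 512-token document this masks a single long run, whereas the per-block loop above scatters many short spans across the document.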