shrimai / Focused-Attention-Improves-Document-Grounded-Generation

MIT License

about CODR #6

Open lalisaa opened 2 years ago

lalisaa commented 2 years ago

Sorry, I can't understand the following code in CODR. Why is `source_ids_` (the source's own ids) appended to `source_ids` first, with each document's `doc_ids_` appended after it? In the paper, `hc = encoder(ci)`, `hd = encoder([ci; di])` and `h = [hc; hd]`, but I couldn't find any piece of code that implements this format.

        source_ids.append(source_ids_)
        source_mask.append(source_mask_)
        source_len = max(source_len, source_len_)
        ......
        for document in documents:
            doc_tokens = self.tokenizer.tokenize(document)[:self.args.source_max_len-2]
            doc_ids_ = [config.bos_token_id] + self.tokenizer.convert_tokens_to_ids(doc_tokens) + [config.eos_token_id] # <s> ... </s>
            doc_len_ = len(doc_ids_)
            doc_mask_ = [1] * doc_len_
            .......

            source_ids.append(doc_ids_)
            source_mask.append(doc_mask_)
            source_len = max(source_len, doc_len_)
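To make the comparison concrete, here is a minimal sketch that is not taken from the repository and contrasts the two layouts. `toy_encoder`, `build_source_list`, `codr_representation` and the padding step are all hypothetical. The token ids only mimic BART-style special tokens (`<s>` = 0, `<pad>` = 1, `</s>` = 2). `build_source_list` reproduces what the quoted lines do: it puts the source sequence first, then each document as a separate entry, padded to `source_len`. `codr_representation` follows the paper's formula instead, where the encoder sees the context prepended to the document.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def toy_encoder(token_ids: List[int]) -> List[Vector]:
    # Hypothetical stand-in for a Transformer encoder: returns one
    # 2-dimensional vector (token id, position) per input token.
    return [[float(tok), float(pos)] for pos, tok in enumerate(token_ids)]


def build_source_list(
    context_ids: List[int], docs_ids: List[List[int]], pad_id: int = 1
) -> Tuple[List[List[int]], List[List[int]]]:
    # Mirrors the quoted loop: the source sequence is appended first, then
    # each document sequence as its own entry; source_len tracks the longest.
    source_ids = [context_ids]
    source_len = len(context_ids)
    for doc_ids in docs_ids:
        source_ids.append(doc_ids)
        source_len = max(source_len, len(doc_ids))
    # Assumed padding step: right-pad every sequence to source_len and build
    # a mask with 1 for real tokens and 0 for padding.
    padded, masks = [], []
    for seq in source_ids:
        n_pad = source_len - len(seq)
        padded.append(seq + [pad_id] * n_pad)
        masks.append([1] * len(seq) + [0] * n_pad)
    return padded, masks


def codr_representation(
    context_ids: List[int],
    doc_ids: List[int],
    encoder: Callable[[List[int]], List[Vector]] = toy_encoder,
) -> List[Vector]:
    h_c = encoder(context_ids)            # hc = encoder(ci)
    h_d = encoder(context_ids + doc_ids)  # hd = encoder([ci; di]), context prepended
    return h_c + h_d                      # h = [hc; hd], joined along the sequence axis (assumed)


context = [0, 10, 11, 2]  # <s> c1 c2 </s>
document = [0, 20, 2]     # <s> d1 </s>

ids, mask = build_source_list(context, [document])
print(ids)   # [[0, 10, 11, 2], [0, 20, 2, 1]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]

h = codr_representation(context, document)
print(len(h))  # 11 (4 context vectors + 7 context+document vectors)
```

In the quoted lines each `doc_ids_` holds only the document tokens, so if the repository implements `hd = encoder([ci; di])`, the concatenation presumably happens somewhere not shown here (for example in the elided code or in the model's forward pass).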