Some questions about the Entity Extraction

xiaoman-zhang / KAD

MIT License


Some questions about the Entity Extraction #6

Closed Liqq1 closed 1 year ago

Liqq1 commented 1 year ago

Hi, thanks for your contribution. I have a question regarding Entity Extraction that I would like to confirm.

Entity Extraction processes raw reports into $t = \{e_1^1, s_1^1, [\text{SEP}], \ldots, e_i^k, s_i^k, [\text{SEP}], \ldots\}$, where each entity is treated independently. Does this approach lose the relationship between lesions and their locations? For example, given "nodule in the left lower lung" in the raw report, we can clearly see where the nodule is, but after processing, the relationship between "nodule" and its location is lost.
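To make the question concrete, here is a minimal sketch of the flattening described above. It is not the KAD implementation: the function name `flatten_entities`, the `"exist"` state label, and the hand-written extraction result are all assumptions for illustration only.

```python
SEP = "[SEP]"


def flatten_entities(sentences):
    """Flatten per-sentence (entity, state) pairs into one token list.

    `sentences` is a list of sentences, each a list of (entity, state)
    tuples. This input shape is a hypothetical stand-in for the output
    of an entity extraction step.
    """
    tokens = []
    for pairs in sentences:
        for entity, state in pairs:
            # Each entity is emitted on its own, followed by its state
            # and a separator; no link between entities is recorded.
            tokens.extend([entity, state, SEP])
    return tokens


# Hypothetical extraction result for "nodule in the left lower lung".
extracted = [[("nodule", "exist"), ("left lower lung", "exist")]]
tokens = flatten_entities(extracted)
print(" ".join(tokens))
# nodule exist [SEP] left lower lung exist [SEP]
```

In this sketch, "nodule" and "left lower lung" end up as two separate entries, and nothing in the flattened sequence says that one is located in the other.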

Thank you in advance. I look forward to hearing from you!

Eldo-rado commented 1 year ago

Hi, I think it can be viewed this way: during the training phase of the knowledge encoder, the positional (anatomy) information related to each lesion has already been embedded, and connections between lesions and anatomy have been established. Therefore, it is reasonable to treat them as independent entities here. I'm not sure whether I understand this correctly, and I hope you can share your perspective😊 @xiaoman-zhang

xiaoman-zhang commented 1 year ago

Thank you for sharing your perspective. Since the text encoder has been pre-trained specifically on UMLS and may be biased toward medical terminology (potentially less proficient with non-medical terms), we perform entity extraction to ensure that the input to the text encoder consists of the most relevant and informative terms. We also find that entity extraction benefits the downstream tasks, so we keep it. Hope that is helpful.

Liqq1 commented 1 year ago

Thank you for your answers, this is very helpful.