stanfordmlgroup / chexpert-labeler

CheXpert NLP tool to extract observations from radiology reports.
MIT License

How to mine more labels #28

Closed: chinmay5 closed this issue 3 years ago

chinmay5 commented 3 years ago

Hi, I am trying to implement the paper Multi-Label Learning With Visual-Semantic Embedded Knowledge Graph for Diagnosis of Radiology Imaging. The paper mines extra labels from the CheXpert reports. I want to do the same, but I am not sure exactly how.

For instance, lungs and ribs are among the extra labels they mine. If I want to replicate their results, can someone please guide me on how to do it?

jirvin16 commented 3 years ago

Hi, I'm not sure exactly what they did in their paper, but you can do this in this repository as follows:

  1. Add mention and unmention txt files to the corresponding folders here. The former tells the labeler which strings in the report to match, and the latter tells it which strings that contain a mention string should not be matched (I would look at some of the existing examples first, and if you have any questions please let me know!)
  2. If you have any custom negation/uncertainty patterns for the new labels, you can add them to the corresponding file here.
  3. Add the label to the categories list here. The label should match the name of the mention/unmention files, with underscores replaced by spaces and the result converted to title case (logic here).

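To make steps 1 and 3 concrete, here is a minimal, self-contained sketch. It is a simplified illustration under stated assumptions, not the labeler's actual code: it assumes one phrase per line in each txt file and uses plain case-insensitive substring matching, whereas the real labeler also applies the negation/uncertainty stage. The function names (`label_from_filename`, `load_phrase_dir`, `find_mentions`), the `ribs` label, and the example phrases are all hypothetical, not phrases chosen by radiologists.

```python
import re
import tempfile
from pathlib import Path


def label_from_filename(path):
    """Turn a phrase-file name into a category label, following step 3:
    underscores become spaces and the result is title-cased."""
    return Path(path).stem.replace("_", " ").title()


def load_phrase_dir(directory):
    """Map each *.txt phrase file in `directory` to (label -> phrases).

    Assumption (hypothetical format): one phrase per line, blank lines skipped.
    """
    phrases = {}
    for path in sorted(Path(directory).glob("*.txt")):
        lines = path.read_text().splitlines()
        phrases[label_from_filename(path)] = [ln.strip().lower() for ln in lines if ln.strip()]
    return phrases


def _spans(text, phrases):
    """All non-overlapping (start, end, phrase) occurrences of each phrase in `text`."""
    found = []
    for phrase in phrases:
        for m in re.finditer(re.escape(phrase.lower()), text):
            found.append((m.start(), m.end(), phrase))
    return found


def find_mentions(report, mention_phrases, unmention_phrases):
    """Return mention spans that do not lie inside any unmention span.

    Simplified substring matching for illustration only; the real
    labeler's matching logic may differ.
    """
    text = report.lower()
    blocked = _spans(text, unmention_phrases)
    kept = [
        (start, end, phrase)
        for start, end, phrase in _spans(text, mention_phrases)
        if not any(b_start <= start and end <= b_end for b_start, b_end, _ in blocked)
    ]
    return sorted(kept)


if __name__ == "__main__":
    report = "Mild distribution of opacities. Old healed left rib fracture."
    # "rib" also occurs inside "distribution"; the unmention phrase filters that hit out.
    print(find_mentions(report, ["rib"], ["distribution"]))  # [(48, 51, 'rib')]
    print(label_from_filename("enlarged_cardiomediastinum.txt"))  # Enlarged Cardiomediastinum

    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "ribs.txt").write_text("rib\nribs\n")
        print(load_phrase_dir(tmp))  # {'Ribs': ['rib', 'ribs']}
```

The example shows why unmention files are useful: a short mention phrase such as "rib" also matches inside unrelated words, and listing the containing string as an unmention removes those false hits without dropping the genuine one.
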
chinmay5 commented 3 years ago

Please forgive my ignorance, but to know which tokens belong in the above-mentioned files, do we need domain knowledge? Are these tokens chosen by a clinical expert, or are there tools, based on ideas such as topic modeling, that could do this for us?

jirvin16 commented 3 years ago

Yes, in our work we collaborated with radiologists to determine appropriate phrases for each category.