zhjohnchan / R2GenCMN

[ACL-2021] The official implementation of Cross-modal Memory Networks for Radiology Report Generation.
Apache License 2.0
77 stars 7 forks source link

supplementary code/data #3

Closed kevkid closed 1 year ago

kevkid commented 2 years ago

Hi, thank you for this interesting work. I would like to know if there are the models weights available and if there is a script to convert the xml files from the datasets (IU dataset) to an annotation.json file?

Are these the weights?: https://github.com/cuhksz-nlp/R2GenCMN/blob/main/data/r2gencmn.md

here is some code I wrote:

import pandas as pd
from os import listdir
from os.path import isfile, join
import xmltodict
from sklearn.model_selection import train_test_split
SEED = 0
path = 'path/to/ecgen-radiology/'
annotationFiles = [path+f for f in listdir(path) if isfile(join(path, f))]
train, tmp = train_test_split(annotationFiles, test_size=0.3, random_state=SEED)
val, test = train_test_split(tmp, test_size=0.66, random_state=SEED)
train_lst = []
val_lst = []
test_lst = []

def get_ann_(lst, subset, subset_lst):
    for file in lst:
        with open(file) as xml_file:
            data_dict = xmltodict.parse(xml_file.read())
            if 'parentImage' not in data_dict['eCitation']:
                continue
            #print(data_dict['eCitation']['parentImage'])#[0]['@id'])
            if type(data_dict['eCitation']['parentImage']) == list:
                img_paths = [x['@id']+'.png' for x in data_dict['eCitation']['parentImage']]
            else:
                img_paths = [data_dict['eCitation']['parentImage']['@id']]
            ann_id = img_paths[0].replace('.png', '')
            report = ' '.join([x['@Label'] + " " + x['#text'] if '#text' in x else '' for x in data_dict['eCitation']['MedlineCitation']['Article']['Abstract']['AbstractText']])
            annotation = {'id': ann_id, 'report':report, 'image_path': img_paths, 'split':subset}
            subset_lst.append(annotation)
get_ann_(train, 'train', train_lst)
get_ann_(val, 'val', val_lst)
get_ann_(test, 'test', test_lst)
res = {'train': train_lst, 'val': val_lst, 'test':test_lst}
Otiss-pang commented 1 year ago

Excuse me, do you know how to convert the mimic-reports in mimic-cxr into the form of annotation.json? If possible, can you provide the code?

zhjohnchan commented 1 year ago

Hi,

Thanks for your attention!

You can apply the dataset here with your license of PhysioNet.

Best, Zhihong