Overrides the train_supervised() and train() functions of CAT (-> MiADE_CAT) and MetaCAT (-> MiADE_MetaCAT) so that they directly take MiADE's custom synthetic data CSVs in the format [text, cui, name, start, end, <meta_category_name...>].
This cuts out the middleman: there is no need to convert to MedCATtrainer JSON first. That format carries a lot of overhead because it is organised by document, whereas our synthetic data are already individual annotations, so we simply convert them directly to MedCAT/PyTorch training data inputs.
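As a rough illustration of the per-annotation conversion described above, here is a minimal sketch that reads rows in the CSV format into plain Python records. It is not the MiADE or MedCAT implementation: the function name load_annotation_rows, the "meta" key, the presence of a header row, and the sample values (including the placeholder CUI and the "presence" meta category) are all assumptions made for the example.

```python
import csv
import io
from typing import Dict, List

# Fixed columns from the format [text, cui, name, start, end, <meta_category_name...>].
CORE_COLUMNS = ["text", "cui", "name", "start", "end"]


def load_annotation_rows(csv_file) -> List[Dict]:
    """Read one annotation per CSV row (hypothetical helper, assumes a header row)."""
    reader = csv.DictReader(csv_file)
    # Any column that is not a core column is treated as a meta category.
    meta_columns = [c for c in (reader.fieldnames or []) if c not in CORE_COLUMNS]
    rows = []
    for row in reader:
        rows.append({
            "text": row["text"],
            "cui": row["cui"],
            "name": row["name"],
            "start": int(row["start"]),  # character offsets into "text"
            "end": int(row["end"]),
            "meta": {c: row[c] for c in meta_columns},
        })
    return rows


# Illustrative data only: the CUI is a placeholder and "presence" is an
# assumed meta category name.
sample = io.StringIO(
    "text,cui,name,start,end,presence\n"
    "no history of asthma,CUI_0001,asthma,14,20,negated\n"
)
annotations = load_annotation_rows(sample)
```

Each record already corresponds to a single annotation, so no document-level grouping is needed before handing the data to whatever training input builder is used.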
Potentially a PR for MedCAT - for now, it gets the job done.