uclh-criu / miade

A set of tools for extracting formattable data from clinical notes stored in electronic health record systems.
https://uclh-criu.github.io/miade/

Enable training with custom synthetic data CSVs #80

Closed jenniferjiangkells closed 1 year ago

jenniferjiangkells commented 1 year ago

This overrides the train_supervised() and train() functions of CAT (-> MiADE_CAT) and MetaCAT (-> MiADE_MetaCAT) so that they directly take MiADE's custom synthetic data CSVs in the format [text, cui, name, start, end, <meta_category_name...>].

This cuts out the middleman and bypasses the need to convert to MedCATtrainer JSON, which has a lot of overhead because it's organised by documents. Our synthetic data are already individual annotations, so we simply need to convert them directly to MedCAT/PyTorch training data inputs.
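To illustrate the idea, here is a minimal sketch of reading annotation-level rows straight from a CSV in the format above, without going through a document-organised JSON. It is not the actual MiADE_CAT / MiADE_MetaCAT code: the function name `load_synthetic_annotations`, the output dict layout, the `presence` meta category column, and the sample row (including its CUI) are all made up for illustration, and the step that feeds these records into MedCAT/PyTorch is left out.

```python
import csv
import io

# Fixed columns from the CSV format described above; any further
# columns are treated as meta category names.
BASE_COLUMNS = ["text", "cui", "name", "start", "end"]


def load_synthetic_annotations(csv_file):
    """Hypothetical helper: read one annotation per CSV row.

    Returns a list of dicts, one per annotation, with the meta
    category values collected under "meta_anns".
    """
    reader = csv.DictReader(csv_file)
    fieldnames = reader.fieldnames or []
    missing = [c for c in BASE_COLUMNS if c not in fieldnames]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    meta_columns = [c for c in fieldnames if c not in BASE_COLUMNS]

    annotations = []
    for row in reader:
        annotations.append({
            "text": row["text"],
            "cui": row["cui"],
            "name": row["name"],
            "start": int(row["start"]),  # character offsets into text
            "end": int(row["end"]),
            # Skip empty cells so rows may omit some meta categories.
            "meta_anns": {c: row[c] for c in meta_columns if row[c]},
        })
    return annotations


# Made-up sample row; "CUI_ASTHMA" is a placeholder, not a real CUI.
sample = io.StringIO(
    "text,cui,name,start,end,presence\n"
    "no evidence of asthma,CUI_ASTHMA,asthma,15,21,negated\n"
)
annotations = load_synthetic_annotations(sample)
```

Because each row is already a single annotation, there is no document grouping step; each record carries its own text span and meta category labels.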

This could potentially become a PR for MedCAT, but for now it gets the job done.

jenniferjiangkells commented 1 year ago

FYI, I've yet to test whether this works with our data on the gae. I might just do a quick copy and paste to check that it works before merging...