kathrinse / be_great

A novel approach for synthesizing tabular data using pretrained large language models
MIT License

Reg. Fine tuning for Time Series data generation #43

Open Divjyot opened 10 months ago

Divjyot commented 10 months ago

With time series data, the main challenge I found is getting the model to understand that a change in the (binary) label is an important event. In healthcare use cases such as disease diagnosis from data with a timeline, the label change from 0 -> 1 is not reversible (typically there are no records of a patient's vitals after the patient has tested positive).
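To make the label constraint concrete, here is a minimal sketch of a check that could be run on each generated series. It assumes the strict version of the rule (labels are 0 until at most one final 1, with no records after it); the function name `label_sequence_is_valid` is hypothetical and not part of be_great.

```python
def label_sequence_is_valid(labels):
    """Return True if labels are all 0 except for at most one final 1.

    Assumption: once a record is labelled 1, no further records
    should exist for that patient.
    """
    seen_positive = False
    for label in labels:
        if seen_positive:
            # A record appears after the positive label.
            return False
        if label == 1:
            seen_positive = True
        elif label != 0:
            # Only binary labels are expected.
            return False
    return True
```

Under this assumption a series such as `[0, 0, 1]` passes, while `[0, 1, 0]` is rejected because a record follows the positive label.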

One question I have: is there a way to make an LLM understand a time series (a collection of records) and then sample a time-series collection of records? I have tried conditioning it on fixed demographic values such as an identifier, age, and multiple timestamps. However, I am not convinced that I am getting a synthetic collection for those fixed variables at the different timestamps (sampling via `great_sample`'s `starting_prompts`).

Any ideas?
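For reference, the conditioning described above can be sketched roughly as follows: one starting prompt per timestamp, each repeating the same fixed values. This assumes the "column is value" text encoding that GReaT uses for rows. The column names (`patient_id`, `age`, `timestamp`), the helper `build_starting_prompts`, and the trailing comma are illustrative assumptions, not a confirmed prompt format.

```python
def build_starting_prompts(fixed, timestamps, time_col="timestamp"):
    """Build one text prompt per timestamp, sharing the same fixed values.

    fixed:      dict of column name -> value held constant for the series
    timestamps: iterable of timestamp values, one prompt is built for each
    time_col:   hypothetical name of the timestamp column
    """
    # Encode the fixed columns once, e.g. "patient_id is 17, age is 54".
    prefix = ", ".join(f"{col} is {val}" for col, val in fixed.items())
    # Append the timestamp so the model completes the remaining columns.
    return [f"{prefix}, {time_col} is {ts}," for ts in timestamps]


prompts = build_starting_prompts(
    {"patient_id": 17, "age": 54},
    ["2021-01-01", "2021-01-02"],
)
```

The resulting list is what I would pass as the starting prompts to `great_sample`. Each prompt is completed independently, which may be why the generated rows do not look like one coherent series.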