gminorcoles closed this issue 10 months ago
This is an interesting use case. Did you get any thread?
I have found some tutorials and examples showing how to add conditional fine-tuning to pretrained GPT-2, so I think I have a handle on how to do it. There are a lot of layers of code in this project, however, so it's a bit of work. I am also looking at breaking out the Trainer so that it persists across fit() calls and I can train in batches on lots of data, but the trainer is also embedded in a lot of nested code. Before I found this project I was working on a different approach to generating data using diffusion models, and I need to test that more before moving to this.
Okay, cool. I was thinking of a different use case.
Hi, I found your work today, I think from googling about overfitting and data copying in these kinds of models. The ideas around DCR and the Q metric are very interesting. I need to generate data conditionally: I have some labels for medical imaging data, and I need to sample using the timestamp and a label as the conditioning information.
Could I add the conditional data to the text of the input? Is this a use case which anyone has explored?
thanks
Hello @gminorcoles, thanks for looking into this project! Conditional generation is possible, provided that your conditioning variables are the first columns of your table.
There is a `seed_input` argument that can then be used to sample conditionally, given the values provided for the first N columns.
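As a rough illustration of the column-ordering requirement, here is a minimal pandas sketch. The table, its column names, and the `put_conditions_first` helper are made up for this example, and the model calls in the trailing comments are assumptions about how `seed_input` might be passed, not a verified signature.

```python
import pandas as pd


def put_conditions_first(df, condition_cols):
    """Return a copy of df with the conditioning columns moved to the front."""
    missing = [c for c in condition_cols if c not in df.columns]
    if missing:
        raise KeyError(f"columns not in table: {missing}")
    rest = [c for c in df.columns if c not in condition_cols]
    return df[list(condition_cols) + rest]


# Toy training table; the column names are hypothetical.
train = pd.DataFrame({
    "intensity": [0.12, 0.80, 0.45],
    "timestamp": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "label": ["benign", "malignant", "benign"],
})

condition_cols = ["timestamp", "label"]
train_ordered = put_conditions_first(train, condition_cols)

# Seed values for the first N columns, one row per sample to generate,
# in the same column order as the training table.
seed = pd.DataFrame({
    "timestamp": ["2021-02-01", "2021-02-01"],
    "label": ["benign", "malignant"],
})[condition_cols]

# Assumed usage (check the project's documentation for the real signature):
# model.fit(train_ordered)
# samples = model.sample(n_samples=len(seed), seed_input=seed)
```

The point of the reordering is that the model generates columns left to right, so only values in the leading columns can be fixed in advance and used as a prefix.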
Indeed, the code is a bit convoluted currently. I need to find more time to refactor this and make things simpler! 😅
But any contributions are welcome! And let me know if you have any questions. 😊