harvard-edge / cs249r_book

Introduction to Machine Learning Systems
https://harvard-edge.github.io/cs249r_book/

CTGan Synthetic Data generation #297

Open emmanuel2406 opened 5 months ago

emmanuel2406 commented 5 months ago

Ex. 5.3

This concerns the external Colab notebook linked from that exercise.

Suggestion: making a copy of this notebook and updating a few things would give users a smoother experience.

I have found two things:

profvjreddi commented 5 months ago

@emmanuel2406 Thanks for reporting this issue. Would you be able to share your Colab so that we can take a look at it? @shanzehbatool if you have some time to take a look at this it would be great!

emmanuel2406 commented 5 months ago


Sure, it turned out not to be serious. To run the cells, you just need to replace CTGANSynthesizer with CTGAN. Here is a copy of the notebook: https://colab.research.google.com/drive/1jUBiyhW4_PaWt0RSGk2ewnBUDrnF2d6J#scrollTo=cpV1FWHevaWO

shanzehbatool commented 4 months ago

Just saw this. I'm not able to access the Colab shared here, but yes, CTGAN should work instead of CTGANSynthesizer. The dataset would also need to be obtained directly from Synthea. If the shared Colab works, great; otherwise, I can look into an alternative.