
Evaluating dataset #824


kpagels commented 4 years ago

I know that CycleGAN, and GANs in general, lack a standard evaluation metric, so I came up with the following idea.

I have two datasets, simulated and real, with 9000 and 3000 images respectively. Splitting each into a training set (70%) and a test set (30%) gives dataset A (simulated): 6300 train / 2700 test, and dataset B (real): 2100 train / 900 test.

1) I trained my CycleGAN on dataset A's training set (6300 images) and dataset B's training set (2100 images).

2) I then translated dataset A's test set (2700 images) with the trained generator.

3) Finally, I trained an AlexNet on the 2700 translated images and evaluated it on dataset B's test set (900 real images). If the translations are good, the accuracy should in theory be high (a sketch of this step is given after this list).
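
A minimal sketch of step 3, assuming the translated images and the real test images are laid out as `ImageFolder`-style class directories; the paths, batch size, learning rate, and epoch count below are placeholders, not the exact settings I used:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tfm = transforms.Compose([
    transforms.Resize((224, 224)),  # AlexNet expects 224x224 inputs
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# 2700 CycleGAN-translated images from dataset A's test split (hypothetical path)
train_set = datasets.ImageFolder("results/fakeB_from_testA", transform=tfm)
# 900 real images from dataset B's test split (hypothetical path)
test_set = datasets.ImageFolder("data/testB_real", transform=tfm)

train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)
test_loader = DataLoader(test_set, batch_size=64, shuffle=False, num_workers=4)

model = models.alexnet()  # trained from scratch in this variant
model.classifier[6] = nn.Linear(4096, len(train_set.classes))
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for epoch in range(20):  # number of epochs is arbitrary here
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Accuracy on the real test set of dataset B
model.eval()
correct = total = 0
with torch.no_grad():
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(1) == y).sum().item()
        total += y.size(0)
print(f"accuracy on real test set B: {correct / total:.3f}")
```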

The CycleGAN seems to produce good translations, but I am not sure whether (1) 2700 images are enough to train an AlexNet and (2) this is a good evaluation metric.

It is hard to tell whether my AlexNet is overfitting, because one CycleGAN run takes 2 days, so 5-fold cross-validation would take 10 days.

When I try this and load a generator checkpoint for each epoch (1-200), the accuracy does not increase but fluctuates, e.g. 42%, 62%, 34%, 72%, 45%. So it does not look like the generator produces better translations as training progresses. Is the learning rate too high? Is the problem in the AlexNet? Why is this happening?

Do you have any comments on this method? Does it make sense, or should I try other evaluation metrics?

junyanz commented 4 years ago

You can fine-tune a classifier using pre-trained weights. See our CyCADA paper for more sim2real experimental details.
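
A minimal sketch of that suggestion, assuming torchvision >= 0.13 for the `weights` argument; the number of classes and learning rate are placeholders, and the actual sim2real protocol is described in the CyCADA paper:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained AlexNet instead of training from scratch
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Optionally freeze the convolutional features and fine-tune only the classifier
for p in model.features.parameters():
    p.requires_grad = False

# Replace the final layer to match the number of classes in your task
num_classes = 10  # placeholder
model.classifier[6] = nn.Linear(4096, num_classes)

# Fine-tune with a small learning rate; only unfrozen parameters are updated
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4, momentum=0.9
)
```

With pretrained features, a few thousand translated images are usually far more workable than training an AlexNet from scratch.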