kr-colab / diploSHIC

feature-based deep learning for the identification of selective sweeps
MIT License

WIP: Adding Domain adaptation #51

Open bruce-edelman opened 1 year ago

bruce-edelman commented 1 year ago

!!This PR is still a WIP!!

Adding Domain Adaptation following what was done for SIA and ReLERNN in https://www.biorxiv.org/content/10.1101/2023.03.01.529396v1 (their code lives at https://github.com/ziyimo/popgen-dom-adapt).

This requires two major changes to diploshic:

That should be it for the major implementation changes. The rest of this PR is small changes to the interface script, which uses the original model by default and switches to the domain-adaptive model when the CLI argument --domain-adaptation is passed.

Currently, if you turn on domain adaptation, the code assumes that you have created feature vectors from your target-domain data and stored them in your training directory as empirical.fvec.

Current steps left undone:

bruce-edelman commented 1 year ago

Just added small fixes for the bugs you found, @andrewkern. One of them arose because train_test_split requires all of its input arrays to have the same length, so the number of observations in empirical.fvec must equal the number of observations in your training sets.

For the current hack of using the neut.fvec as our fake target domain data I just copied these 2000 data points 5 times to give 10000 obs to match the simulations.
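For anyone reproducing the hack above, here is a minimal sketch of the row replication with NumPy. The helper name `tile_to_match` and the trailing array shape are made up for illustration; this is not the code in the PR, just one way to get 2000 observations up to 10000.

```python
import numpy as np


def tile_to_match(features: np.ndarray, n_target: int) -> np.ndarray:
    """Repeat the rows of `features` until there are exactly `n_target` rows.

    Hypothetical helper, not part of diploSHIC.
    """
    n_rows = features.shape[0]
    if n_rows == 0:
        raise ValueError("features has no rows to repeat")
    reps = -(-n_target // n_rows)  # ceiling division
    # Repeat along the first (observation) axis only.
    tiled = np.tile(features, (reps,) + (1,) * (features.ndim - 1))
    return tiled[:n_target]


# Stand-in for the 2000 neutral feature vectors; the trailing shape is arbitrary.
neut = np.random.default_rng(0).random((2000, 12, 11))
fake_target = tile_to_match(neut, 10000)
print(fake_target.shape)
```

Note that the copies are exact duplicates, so this only satisfies the length requirement; it adds no new information to the target domain.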

With this change and a few array shape fixes the model begins training with --domain-adaptation on just fine for me now

andrewkern commented 1 year ago

running this now! one warning I'm getting is

WARNING:tensorflow:Early stopping conditioned on metric `val_accuracy` which is not available. Available metrics are: loss,predictor_loss,discriminator_loss,predictor_accuracy,discriminator_accuracy

this has to do with the metrics on the early stopping criterion.

bruce-edelman commented 1 year ago

Fixed the callback issue: the code now monitors 'val_predictor_accuracy' instead of 'val_accuracy' for checkpointing and early stopping when domain adaptation is on.
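A rough sketch of that switch, for reference. The function name `monitor_metric` is hypothetical and the actual change in the PR may be structured differently; only the two metric names come from this thread.

```python
def monitor_metric(domain_adaptation: bool) -> str:
    """Pick the metric name that the training callbacks should watch.

    With domain adaptation the model has two outputs (predictor and
    discriminator), so the accuracy metric carries the output name.
    Hypothetical helper, not the PR's code.
    """
    return "val_predictor_accuracy" if domain_adaptation else "val_accuracy"


print(monitor_metric(domain_adaptation=True))
print(monitor_metric(domain_adaptation=False))
```

The returned string would then be passed as the `monitor` argument of the Keras `EarlyStopping` and `ModelCheckpoint` callbacks.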