kr-colab / diploSHIC

feature-based deep learning for the identification of selective sweeps
MIT License

What could be compared against for prediction result? #22

Closed jtxtina closed 4 years ago

jtxtina commented 4 years ago

Since the prediction step offers no comparison, I am wondering whether there is anything we can compare the predictions against to get an accuracy figure or something similar (as in the training step), to better evaluate the model.

andrewkern commented 4 years ago

Hi there-- if you know the true labels from a set of observations, then you can use the prediction mode with a trained classifier to assess accuracy on that set.

Does that help?
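The comparison described above can be sketched as follows: take the true labels of a set of windows (for example from simulations) and the classes a trained classifier predicted for them, then compute the overall accuracy and a table of (true, predicted) counts. This is a minimal sketch, not diploSHIC's own code. The class names and the two example lists are assumptions made for illustration, and reading the predicted classes out of the prediction output file is left out because its exact layout is not given in this thread.

```python
from collections import Counter

# Assumed class names for illustration; use whatever labels your classifier emits.
CLASSES = ["neutral", "linkedSoft", "linkedHard", "soft", "hard"]


def accuracy(true_labels, predicted_labels):
    """Fraction of observations whose predicted class equals the true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted label lists differ in length")
    if not true_labels:
        raise ValueError("no observations to score")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true class, predicted class) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical example data: true classes of six simulated windows
# and the classes a trained classifier assigned to them.
true_labels = ["hard", "hard", "soft", "neutral", "neutral", "linkedHard"]
predicted = ["hard", "soft", "soft", "neutral", "linkedSoft", "linkedHard"]

acc = accuracy(true_labels, predicted)
confusion = confusion_counts(true_labels, predicted)

print(f"accuracy: {acc:.3f}")
for true_class in CLASSES:
    row = [confusion[(true_class, pred_class)] for pred_class in CLASSES]
    print(true_class, row)
```

Each printed row corresponds to one true class and each column to one predicted class, so off-diagonal counts show which classes get confused with each other.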

jtxtina commented 4 years ago

Yeah. But, do you have any such observations or test datasets?

jtxtina commented 4 years ago

Btw, have you ever trained with human gene datasets? Like what you did in S/HIC.

jtxtina commented 4 years ago

I am currently using the Tennessen Euro training set from S/HIC to train here. When calculating feature vectors from the simulations, I am having trouble with the masking file. Could you help me with this? Thank you!

jtxtina commented 4 years ago

I have finished a successful training run, but the val_accuracy is really low, around 0.38. Is there anything that could help?

andrewkern commented 4 years ago

> Yeah. But, do you have any such observations or test datasets?

You should use simulated data for this.

andrewkern commented 4 years ago

> Btw, have you ever trained with human gene datasets? Like what you did in S/HIC.

I'm not sure what you mean.

andrewkern commented 4 years ago

> I am currently using the Tennessen Euro training set from S/HIC to train here. When calculating feature vectors from the simulations, I am having trouble with the masking file. Could you help me with this? Thank you!

The masking file is a FASTA-formatted file with N's in place of the bases you wish to mask out.
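To make the description of the masking file concrete, here is a minimal sketch that writes a FASTA record in which chosen positions are replaced by N. The sequence, the record name `chr_example`, the intervals, and the 0-based half-open coordinate convention are all assumptions for illustration. How diploSHIC expects the mask to line up with your data is not stated in this thread, so check the project README before relying on it.

```python
def mask_sequence(sequence, intervals):
    """Replace the bases in each 0-based, half-open [start, end) interval with 'N'."""
    bases = list(sequence)
    for start, end in intervals:
        if start < 0 or end > len(bases) or start > end:
            raise ValueError(f"interval ({start}, {end}) is outside the sequence")
        bases[start:end] = "N" * (end - start)
    return "".join(bases)


def format_fasta(name, sequence, width=60):
    """Format one FASTA record, wrapping the sequence at `width` characters."""
    lines = [f">{name}"]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


# Hypothetical reference sequence and regions to mask out.
reference = "ACGTACGTACGTACGTACGT"
masked = mask_sequence(reference, [(2, 5), (10, 12)])
fasta_text = format_fasta("chr_example", masked, width=10)

print(fasta_text, end="")
```

The masked record keeps the same length as the input, so unmasked bases stay at their original coordinates.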

jtxtina commented 4 years ago

> > Yeah. But, do you have any such observations or test datasets?
>
> you should use simulated data for this

Well, I mean, is there any real dataset for this?