Closed: lvclark closed this 2 years ago
Update: I tested this on my dataset. Using the version of the script in the docker image, 100% of the sites were predicted to be modified. Using the docker image with a local copy of my modified script, 14% of the sites were predicted to be modified. So this seems like a major bug: the variables from the prediction set were not being correctly matched to the variables from the model.
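A minimal sketch of what matching variables by name, rather than by position, could look like. `align_features` is a hypothetical helper, not part of EpiNano, and the feature order and values below are made up for illustration; only the feature names `mis`, `del` and `q_median` come from the output file name.

```python
def align_features(row, model_features):
    """Return the values of `row` in the order the model expects.

    Hypothetical helper for illustration; not part of EpiNano.
    """
    missing = [name for name in model_features if name not in row]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [row[name] for name in model_features]


# Assumed order in which the model was trained (an assumption, not from the source).
model_features = ["q_median", "mis", "del"]

# One prediction row keyed by feature name (made-up values).
row = {"mis": 0.12, "del": 0.03, "q_median": 14.0}

aligned = align_features(row, model_features)
print(aligned)
```

Keying the values by name means the order in which the columns were parsed no longer affects which value is passed to which model variable.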
Possibly related to #83?
I assumed it would not make a difference, as the combination of features stays the same. But your results showed the opposite, so I will test it again. Thanks a lot for bringing that to my attention.
I noticed that although I gave `--columns 7,9,11` to `Epinano_Predict.py`, the output file name contained the string `mis.del.q_median.MODEL`, which has the column names in a different order. I looked at the source code and noticed that the column numbers were converted to a set after sorting, which discards the sorted order. I don't know if this would cause a problem with prediction (I hope not, but it makes me very nervous!), but I thought I would at least submit a pull request for your review.
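For illustration, a minimal sketch of the ordering issue and one way to avoid it. `parse_columns` is a hypothetical helper, not the actual function in `Epinano_Predict.py`, and it assumes the `--columns` value arrives as a comma-separated string of integers.

```python
def parse_columns(arg):
    """Parse a comma-separated column list into a sorted list without duplicates.

    Hypothetical helper for illustration; not the code in Epinano_Predict.py.
    """
    cols = {int(tok) for tok in arg.split(",")}  # deduplicate first
    return sorted(cols)  # sort last, so the result is a list with a defined order


# Problematic pattern: sorting and then converting to a set throws away the
# ordering guarantee, because a set has no defined iteration order.
unordered = set(sorted([7, 9, 11]))

# Safer pattern: the set is only an intermediate step and the final
# result is a sorted list.
ordered = parse_columns("7,9,11")
print(ordered)
```

Sorting after deduplicating keeps the columns in a predictable order regardless of how the Python implementation happens to iterate over a set.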