Don't jumble column order in Epinano_Predict

novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)

GNU General Public License v2.0


Don't jumble column order in Epinano_Predict #119

Status: Closed (lvclark closed this 2 years ago)

lvclark commented 2 years ago

I noticed that although I passed --columns 7,9,11 to Epinano_Predict.py, the output file name contained the string mis.del.q_median.MODEL, which lists the column names in a different order. I looked at the source code and noticed that the column numbers were converted to a set after sorting, which discards the sorted order because Python sets are unordered. I don't know if this would cause a problem with prediction (I hope not, but it makes me very nervous!), but I thought I would at least submit a pull request for your review.
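To illustrate the pattern described above, here is a minimal sketch. It is not the actual code from Epinano_Predict.py: the function names `parse_columns_unordered` and `parse_columns_ordered` are hypothetical, and the second one shows only one possible way to drop duplicates while keeping the order given on the command line.

```python
def parse_columns_unordered(spec):
    """Hypothetical stand-in for the pattern described above:
    sort the column numbers, then pass them through a set.
    The set does not guarantee any iteration order, so the
    sorting is effectively lost."""
    return list(set(sorted(int(c) for c in spec.split(","))))


def parse_columns_ordered(spec):
    """One possible alternative: remove duplicates with dict.fromkeys,
    which preserves insertion order (guaranteed since Python 3.7)."""
    return list(dict.fromkeys(int(c) for c in spec.split(",")))


requested = "7,9,11"
unordered = parse_columns_unordered(requested)  # same elements, order not guaranteed
ordered = parse_columns_ordered(requested)      # order as given by the user

print("via set:          ", unordered)
print("via dict.fromkeys:", ordered)
```

The set-based version always returns the same column numbers, but the order in which they come back is an implementation detail of the interpreter, so it should not be relied on when the order has to match the model.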

lvclark commented 2 years ago

Update: I tested this on my dataset. Using the version of the script in the docker image, 100% of the sites were predicted to be modified. Using the docker image with a local copy of my modified script, 14% of the sites were predicted to be modified. So this seems like a major bug, because the variables from the prediction set were not being correctly matched to the variables from the model.
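A toy sketch of why a mismatch in variable order can change predictions even when the same set of features is used. This is not EpiNano's model: the linear classifier, its weights, and the feature values below are all made up for illustration, and the shuffled order [9, 11, 7] is just one example of a permutation.

```python
# Hypothetical weights of a linear classifier trained with features
# in the order of columns 7, 9, 11 (values invented for illustration).
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.5


def predict(features):
    """Return 1 (modified) if the linear score is positive, else 0."""
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1 if score > 0 else 0


# One made-up site, keyed by column number.
site = {7: 0.1, 9: 0.9, 11: 0.2}

in_model_order = [site[c] for c in (7, 9, 11)]  # matches training order
in_shuffled_order = [site[c] for c in (9, 11, 7)]  # same values, permuted

label_correct = predict(in_model_order)
label_shuffled = predict(in_shuffled_order)

print("model order:   ", label_correct)
print("shuffled order:", label_shuffled)
```

Both calls see exactly the same three values, but each weight is applied to a different column in the shuffled case, so the two labels differ for this site.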

lvclark commented 2 years ago

Possibly related to #83?

Huanle commented 2 years ago

I assumed it would not make a difference, as the combination of features stays the same. But your results show the opposite, so I will test it again. Thanks a lot for bringing that to my attention.