xenobiolab / xenomorph

Xenomorph v1.2
MIT License

Generating model output similar to ONT kmer template #1

Closed mauriciolp closed 11 months ago

mauriciolp commented 12 months ago

Hello, I have run your script to generate the kmer model as instructed at "Generate XNA model from raw reads and a reference FASTA", which outputs a file with the following content:

,KXmer,Coverage,Mean level,Std level,Median level,Max level,Min level,KDE Mean level,KDE Std level,KDE Coverage
0,AAXC,75,-1.8290247784,0.39788658491271633,-1.86675979,0.75099738,-2.71753957,-1.8800000000000665,0.1962625756002052,71
1,AAXG,592,-0.9300495948547296,0.45546793303750077,-0.86466145,1.291167,-2.41357673,-0.7950000000000896,0.20799617944102763,473
2,ACXG,285,0.26334021782382455,0.5042956353221391,0.22222409,2.33561191,-2.35294917,0.21499999999988884,0.18765714584683457,227
3,ATAX,76,1.9251670202631574,0.7307697807221099,2.1392452750000004,2.73351133,-1.04737657,2.274999999999845,0.21959337597069623,57
4,ATCX,174,1.416326860862069,1.2674715388459408,1.98565775,3.13434778,-1.5720009,2.359999999999843,0.18200027106840422,90

However, I was wondering whether it would be possible to extract a model with the same format and signal values as the template provided by ONT, for example the file "kmers/9.4_6mers_450bps.txt":

kmer    level_mean  level_stdv  sd_mean sd_stdv weight
AAAAAA  86.486336   1.517846    0.941478    0.609357    4739.559092
AAAAAC  83.948838   1.517846    1.077051    0.745608    3403.762207
AAAAAG  85.475368   1.517846    0.953434    0.621001    3553.658043
AAAAAT  84.423907   1.517846    1.106077    0.775951    3909.663587

In other words, is it possible to convert the "Mean level" and "Std level" values in the first file to the range of values expected for "level_mean" and "level_stdv" in the second file?

jamarchand commented 11 months ago

Hi mauriciolp. Apologies for the slow reply here, tail end of the quarter and only recently had time. Easiest way to do this is just calculating new scale and shift parameters using something like linear regression. I performed quick calculations comparing our model to the ONT one you attached.

Shift = -9.17455 Scale = 0.1003
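For reference, a fit of this kind could be sketched as below. This is only an illustration, not the exact calculation used for the values above: it assumes you already have paired levels for matching kmers from the two models (how a Xenomorph 4-mer is paired with ONT 6-mers is not covered here), and it assumes the convention xenomorph_level = ont_level * scale + shift. The function name `fit_scale_shift` and the sample arrays are hypothetical.

```python
import numpy as np


def fit_scale_shift(ont_levels, xeno_levels):
    """Least-squares fit of xeno_level ~= ont_level * scale + shift.

    Hypothetical helper: both inputs are equal-length sequences of
    paired signal levels for the same kmers in the two models.
    """
    # np.polyfit with deg=1 returns [slope, intercept].
    scale, shift = np.polyfit(ont_levels, xeno_levels, deg=1)
    return scale, shift


# Synthetic, illustrative values only (not real model data): ONT-like
# levels and the Xenomorph-range values a perfect linear map would give.
ont = np.array([70.0, 80.0, 90.0, 100.0, 110.0])
xeno = ont * 0.1003 - 9.17455

scale, shift = fit_scale_shift(ont, xeno)
print(f"Scale = {scale:.4f}, Shift = {shift:.5f}")
```

With real models the points will not fall exactly on a line, so it is worth plotting the fit before trusting the parameters.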

To convert Xenomorph to ONT range: new_mean = (old_mean - shift) / scale

To convert ONT to Xenomorph range: new_mean = old_mean*scale + shift
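A minimal sketch of applying these parameters in Python, using the Shift and Scale values quoted above. The function names are hypothetical. The standard-deviation conversion is an assumption that follows from the mapping being linear (a shift does not change spread, so only the scale applies); it is not something confirmed for the ONT "level_stdv" column specifically.

```python
SHIFT = -9.17455
SCALE = 0.1003


def xeno_to_ont_mean(level):
    """Xenomorph range -> ONT range: (old_mean - shift) / scale."""
    return (level - SHIFT) / SCALE


def ont_to_xeno_mean(level):
    """ONT range -> Xenomorph range: the inverse of the map above."""
    return level * SCALE + SHIFT


def xeno_to_ont_std(std):
    """Assumption: under a linear map, spread rescales by 1/|scale| only."""
    return std / abs(SCALE)


# Example with the AAXC row from the Xenomorph output in this thread.
mean_ont = xeno_to_ont_mean(-1.8290247784)   # about 73.2 on the ONT scale
std_ont = xeno_to_ont_std(0.39788658491271633)
print(mean_ont, std_ont)
```

Converting a value to the ONT range and back should return the original number, which is a quick way to check that the two formulas are consistent.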

As a good gut check, this is the result of scaling our 9.4.1 4-mer ATGC model and comparing it to ONT's 9.4.1 6-mer ATGC model:

[Image: screenshot comparing the scaled Xenomorph 9.4.1 4-mer ATGC model with ONT's 9.4.1 6-mer ATGC model]