ococrook / hdxstats

hdxstats: An R-package for statistical analysis of hydrogen deuterium exchange mass-spectrometry data.
Apache License 2.0
5 stars 2 forks source link

Residue start and end in ComputeAverageMap function #42

Closed nlgittens closed 1 year ago

nlgittens commented 1 year ago

Whilst trying to troubleshoot an issue with the signed p-values in the hdxdifference function, I noticed an issue with two peptides (one protected and one deprotected) that had a 2 residue overlap, and the overlapping residues had distinct p-values. Actually, these two peptides should be treated as non-overlapping regions, if we assume the first two residues of a peptide cannot retain deuterium due to back-exchange. I hadn't noticed this before because I had quite discrete regions of protection and deprotection.

I might be able to translate this issue to the HOIP dataset to explain. The first peptide in HOIP covers residues 1-7. image

If I open up the R file and write the computeAverageMap function manually, I get the following peptide map:-

start [1] 0 5 5 5 6 6 6 7 7 7 7 7 10 10 10 10 10 18 25 25 26 32 33 33 40 40 [27] 40 41 41 41 48 48 49 51 51 73 75 76 76 76 76 76 84 85 87 88 97 97 97 112 113 116 [53] 120 120 131 131 131 149 150 150 150 150 153 154 168 169 169 169 170 170 183 183 186 186 186 187 193 195 [79] 212 223 223 246 246 246 247 247 247 249 249 252 260 260 260 260 304 304 307 313 315 316 332 333 333 335 [105] 335 335 337 337 338 360 end [1] 7 18 21 25 18 21 25 18 20 21 24 25 18 20 21 23 25 25 32 33 33 40 40 41 48 49 [27] 51 49 51 65 61 65 65 61 65 84 84 84 85 87 88 97 97 97 97 97 104 106 108 120 120 131 [53] 131 132 149 150 153 169 162 167 169 170 169 169 183 183 185 186 183 186 195 208 195 208 209 208 208 208 [79] 223 238 239 257 259 260 257 259 260 257 260 260 275 276 278 281 313 315 315 327 327 327 356 356 357 356 [105] 357 360 357 360 360 379

So peptide 1 is actually 0-7.

In fact, we want peptide 1 to be 3-7 in the residue map, if we assume that the first two residues of a peptide cannot retain deuterium, as is known in the literature.

I can force the desired sequence like this (though I don't know if this is the most robust for your script).

start <- sapply(allPatterns, function(x) x@start) + 2 end <- start + sapply(allPatterns, function(x) x@width) - 3 peptideMap <- matrix(0, ncol = length(AAString), nrow = ncov + 3) start [1] 3 8 8 8 9 9 9 10 10 10 10 10 13 13 13 13 13 21 28 28 29 35 36 36 43 43 [27] 43 44 44 44 51 51 52 54 54 76 78 79 79 79 79 79 87 88 90 91 100 100 100 115 116 119 [53] 123 123 134 134 134 152 153 153 153 153 156 157 171 172 172 172 173 173 186 186 189 189 189 190 196 198 [79] 215 226 226 249 249 249 250 250 250 252 252 255 263 263 263 263 307 307 310 316 318 319 335 336 336 338 [105] 338 338 340 340 341 363 end [1] 7 18 21 25 18 21 25 18 20 21 24 25 18 20 21 23 25 25 32 33 33 40 40 41 48 49 [27] 51 49 51 65 61 65 65 61 65 84 84 84 85 87 88 97 97 97 97 97 104 106 108 120 120 131 [53] 131 132 149 150 153 169 162 167 169 170 169 169 183 183 185 186 183 186 195 208 195 208 209 208 208 208 [79] 223 238 239 257 259 260 257 259 260 257 260 260 275 276 278 281 313 315 315 327 327 327 356 356 357 356 [105] 357 360 357 360 360 379

Likewise, this will be the same for the plotEpitopeMapResidue and hdxdifference functions. For plotEpitopeMap with the redundancy plot, I would preserve the native sequence.

ococrook commented 1 year ago

You're right I kinda of skipped this for visualisations purposes but I can fix this because it could causes confusion. Should we also skip over prolines?