When using FUSION weight files, ctwas reads the "top1" model differently than FUSION

shugamoe commented 1 year ago

In read_weight_fusion(), as called by impute_expr_z(), if method="top1" or method="best" is used and the "top1" model is read, read_weight_fusion() simply selects all non-zero SNP weights. This is not what happens in FUSION's z score imputation when reading the top1 model.

From ctwas_read_data() (L271-283) we see that the "top1" column of wgt.matrix would be selected in its entirety, removing only zero weights.

      g.method = method
      if (g.method == "best") {
        g.method = names(which.max(cv.performance["rsq",]))
      }
      if (exists("cv.performance")){
        if (!(g.method %in% names(cv.performance[1,]))){
          next
        }
      }
      wgt.matrix <- wgt.matrix[abs(wgt.matrix[, g.method]) > 
                                 0, , drop = F]
      wgt.matrix <- wgt.matrix[complete.cases(wgt.matrix), 
                               , drop = F]

In FUSION's equivalent of ctwas::impute_expr_z(), when it selects the top1 model, it zeros out all other weights but the one with the max squared value, resulting in a model using only a single SNP.

    # if this is a top1 model, clear out all the other weights
    if ( substr( (colnames(cv.performance))[ mod.best ],1,4) == "top1" ) wgt.matrix[ -which.max(wgt.matrix[,mod.best]^2)  , mod.best] = 0
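
To make the difference concrete, here is a minimal sketch that applies both of the quoted snippets to a toy wgt.matrix. The SNP IDs and weight values are made up for illustration, and the sketch assumes (as in the weight files described in this issue) that the "top1" column carries a non-zero weight for every SNP.

```r
# Toy stand-in for a FUSION wgt.matrix; SNP IDs and values are invented.
wgt.matrix <- matrix(
  c(0.8, -2.5, 1.1, 0.3,   # "top1" column: every SNP has a non-zero weight
    0.0,  0.4, 0.0, 0.0),  # "lasso" column: sparse
  ncol = 2,
  dimnames = list(paste0("rs", 1:4), c("top1", "lasso"))
)

# ctwas-style selection (as quoted above): keep every row with a non-zero weight.
ctwas.wgt <- wgt.matrix[abs(wgt.matrix[, "top1"]) > 0, , drop = FALSE]

# FUSION-style handling (as quoted above): zero out everything except the
# weight with the largest squared value.
fusion.wgt <- wgt.matrix
fusion.wgt[-which.max(fusion.wgt[, "top1"]^2), "top1"] <- 0

n.ctwas  <- nrow(ctwas.wgt)                 # SNPs kept by the ctwas-style filter
n.fusion <- sum(fusion.wgt[, "top1"] != 0)  # SNPs left non-zero by FUSION
```

With this toy input, the ctwas-style filter keeps all four SNPs, while the FUSION-style step leaves only rs2 with a non-zero weight.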

There are FUSION weights available online where the wgt.matrix variable in the *.wgt.RDat files looks like the example below:

When FUSION reads the top1 model it results in a single SNP model, but with ctwas, selecting the top1 model results in the inclusion of weights not used in FUSION for the zscore calculation. To avoid this, ctwas should adopt a similar measure to FUSION when reading the "top1" model from wgt.matrix, and keep only the highest squared value weight.
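
One way the suggested change could look is sketched below. The helper name keep_top1_snp() is hypothetical and is not part of ctwas or FUSION; it simply combines the FUSION-style "clear all but the largest squared weight" step with the existing non-zero and complete.cases filtering, under the assumption that g.method has already been resolved to a column name of wgt.matrix.

```r
# Hypothetical helper (not a ctwas or FUSION function): mimic FUSION's
# handling of the top1 model before applying the existing ctwas filters.
keep_top1_snp <- function(wgt.matrix, g.method) {
  if (substr(g.method, 1, 4) == "top1") {
    # Keep only the weight with the largest squared value, as FUSION does.
    top <- which.max(wgt.matrix[, g.method]^2)
    wgt.matrix[-top, g.method] <- 0
  }
  # Existing ctwas filtering: drop zero weights and incomplete rows.
  wgt.matrix <- wgt.matrix[abs(wgt.matrix[, g.method]) > 0, , drop = FALSE]
  wgt.matrix[complete.cases(wgt.matrix), , drop = FALSE]
}

# Toy input; SNP IDs and values are invented for illustration.
toy <- matrix(
  c(0.8, -2.5, 1.1, 0.3,   # "top1" column
    0.0,  0.4, 0.2, 0.0),  # "lasso" column
  ncol = 2,
  dimnames = list(paste0("rs", 1:4), c("top1", "lasso"))
)

top1.model  <- keep_top1_snp(toy, "top1")   # single-SNP model
lasso.model <- keep_top1_snp(toy, "lasso")  # other models are left unchanged
```

In this sketch only the "top1" branch alters the weights; any other method falls through to the same filtering ctwas already performs.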

wesleycrouse commented 1 year ago

Hi @shugamoe,

The method option that you described was actually just meant to select among alternative pre-computed prediction models that can be supplied in the FUSION format. One of the FUSION GTEx v7 weights was named "top1", and that is the reason for this naming in the options. Using the "top1" method with FUSION-formatted weights in cTWAS doesn't make any changes to the prediction model; it just looks for a method named "top1" and uses that one. This option is ignored when using PredictDB-format weights because these can only supply a single prediction model.

I'm in the process of updating all of the documentation, and I will make this behavior more clear. I'm sorry for the confusion.

shugamoe commented 1 year ago

@wesleycrouse. I have posted code where FUSION does make changes to the prediction model as it reads it when computing z scores. It only applies when using the top1 model. Please look closer at my initial post, newly edited as of yesterday, and address the difference in the FUSION zscore calculation when "selecting" the top1 model as compared to ctwas.

wesleycrouse commented 1 year ago

@shugamoe I did not mean that the FUSION code does not make changes to the prediction model, but rather that we did not intend to make changes to the prediction model when processing FUSION weights in our code, only select from what was already made available. I've modified my previous response to be more clear.

I'm not sure if we want to introduce options for modifying prediction models internally within our software. As I mentioned, this option was intended to just select between models that were already supplied, one of which happened to be named "top1" in FUSION v7. I will discuss with my other co-author if this is something that we want to implement.

Thank you for bringing all of this to our attention.

shugamoe commented 1 year ago

@wesleycrouse if you intend for the zscore calculation ctwas makes when selecting the top1 model to be correct, you will make the changes. If you do not, either ctwas or FUSION is calculating zscores from the top1 model in FUSION weight files incorrectly, and my money is on ctwas.

Happy to bring it to your attention, and thank you for getting back to me!

wesleycrouse commented 1 year ago

@shugamoe almost all of my work has been developing and using the PredictDB formatted weights, and my memory of the FUSION weights was not correct. I just double checked the FUSION weights, and you are correct, they are not formatted as I thought. I thought that it was a vector with a single non-zero weight. Sorry for the pushback. We will make the correction.

Do you happen to know why the top1 model is formatted in this way? It seems inconsistent with the behavior for the other models.

shugamoe commented 1 year ago

@wesleycrouse, thanks!

Unfortunately I don't have any insight as to why the top1 is formatted this way.

I agree with you that it's inconsistent behavior with the other models, as they don't even report cross validation $R^2$ or pvals corresponding to the individual weights (single SNP models) for the top1 model.

(Would $R^2$ make a better selection criterion than just the weight magnitude itself if the highest $R^2$ from an individual weight model in the "top1" column wasn't from the highest magnitude weight? It seems weird that cross validation $R^2$ is used to define the "best" model, but when selecting a single SNP model the criterion appears to be the weight magnitude.)
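
As a rough illustration of that question, the sketch below uses simulated data and rests on an assumption that is not confirmed anywhere in this thread: that the "top1" weights are marginal effect estimates computed on standardized genotypes and expression. Under that assumption, each squared weight is proportional to the in-sample single-SNP $R^2$, so the two criteria pick the same SNP in-sample. It says nothing about cross-validated $R^2$, which could still rank SNPs differently.

```r
# Simulated data; sample size, SNP count, and effect size are arbitrary.
set.seed(1)
n <- 200
p <- 5
X <- scale(matrix(rnorm(n * p), n, p))          # standardized "genotypes"
y <- as.numeric(scale(0.5 * X[, 3] + rnorm(n))) # standardized "expression"

# Assumed form of the marginal weights (an assumption, not taken from this thread).
w <- as.numeric(crossprod(X, y)) / sqrt(n - 1)

# In-sample R^2 of each single-SNP regression equals the squared correlation.
r2 <- as.numeric(cor(X, y))^2

pick.by.weight <- which.max(w^2)
pick.by.r2     <- which.max(r2)
```

Here w^2 equals (n - 1) * r2 up to floating point error, which is why both selections agree on this simulated input.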

wesleycrouse commented 1 year ago

@shugamoe I initially misunderstood what you meant, which was part of the confusion. I thought that you were expecting to specify an arbitrary FUSION format prediction model and the "top1" method, and then have it select only the variant with the largest effect from that prediction model. That behavior could be useful but wasn't something that we wanted to support. What I didn't realize was that this was actually how the "top1" model needs to be processed... Again, thank you for bringing this to our attention.

wesleycrouse commented 1 year ago

@shugamoe I am going to be making some changes to the workflow on the project site. We pushed some changes that we'd been using internally and there are inconsistencies on the project site. I'll ping you once this is done, and please let us know if you encounter any more issues.

xinhe-lab / ctwas

When using FUSION weight files, ctwas reads the "top1" model differently than FUSION #4