Results differ unexpectedly between `pipeline.predict` and `query_bed` with `ref` score

an1lam commented 4 years ago

I'm using the DeepSEA/variantEffects model with MutationMap to try and find the most impactful mutations for a set of sequences. However, I've noticed a discrepancy between the predictions I get for the wild-type sequences when using pipeline.predict vs. query_bed with the ref score.

The pipeline.predict scores are generally high probabilities for CTCF in cell type A549, which is what I'd expect given that my bed file consists of ChIP-seq peaks for that TF/cell line pair. The ref scored predictions, on the other hand, tend to be really small. A few examples of the difference are:

Predictions from pipeline.predict: [0.96582216 0.03907571 0.09172686 0.19638198 0.13311383 0.64790106
 0.67463433 0.94372396 0.64945625 0.98188872]

        vs.

Predictions from `ref`-scored MutationMap.query_bed: [-9.167706593871117e-09, -3.85800376534462e-07, 2.0929292077198625e-08, 3.2794196158647537e-07, 3.2247044146060944e-08, 2.3366883397102356e-06, 0.0, 1.3096723705530167e-09, 1.1496013030409813e-09, 0.0]

I'm trying to understand whether this is expected, a result of something I'm doing wrong, or a bug. For context, my code is essentially the following (pared down to make it easier to see the essentials):

dl_kwargs = {'fasta_file': '../dat/hg19.fa'}
predict_dl_kwargs = {'fasta_file': '../dat/hg19.fa', 'intervals_file': random_seqs_fpath}
random_seqs_fpath = "../dat/ChIPseq.A549.CTCF.100.random.narrowPeak.gz"
deepsea = kipoi.get_model("DeepSEA/variantEffects", source="kipoi")

# Predictions from MutationMap
dl_kwargs = {'fasta_file': '../dat/hg19.fa'}
mm = MutationMap(deepsea, deepsea.default_dataloader, dataloader_args=dl_kwargs)
mmp = mm.query_bed(random_seqs_fpath, scores=['ref', 'diff'])
mutation_map_predictions = [
    mmp.mutation_map[i]['seq']['ref']['A549_CTCF_None_720']['mutation_map']
    for i in range(len(mmp.mutation_map)
]
# I also do some post-processing to just get one of the non-zero values from the 4 x 1000 mutation map.

# Predictions from pipeline.predict
pipeline_predictions = deepsea_predict.pipeline.predict(predict_dl_kwargs, batch_size=100)

Note that both of these use the exact same bed file and therefore should be looking at the same sequences.

Am I missing some key reason why I should expect these two prediction arrays to differ dramatically? I am using the default rc merge settings for both (and that wouldn't account for the order-of-magnitude differences anyway). The two best ideas I have for why I might be getting such different results are:

Contrary to my understanding of the docs, MutationMap.query_bed is re-centering on each currently-being-tested variant.
ref doesn't mean what I think it does.

Avsecz commented 4 years ago

@krrome Do you have an idea why the predictions are different?

Avsecz commented 4 years ago

MutationMap.query_bed is re-centering on each currently-being-tested variant.

I think that could be the potential hunch. @krrome do you know if that's true? ref should mean prediction for the reference allele as expected.

an1lam commented 4 years ago

That was my suspicion, but the docs made it sound like they don't do that for bed files. I tried tracing through the code but frankly ran out of steam before I could figure it out.

Avsecz commented 4 years ago

@krrome wrote the code so he'll be better able to help here.

an1lam commented 4 years ago

Thanks! As an FYI: yesterday, I ended up working around this by using kipoi-interpret's Mutation class instead (which gives correct-looking results), so I'm happy to keep this thread open or close it. Up to you!

Avsecz commented 4 years ago

Glad to hear the Mutation class worked well. Let's still keep this thread open.

kipoi / kipoi-veff

Results differ unexpectedly between `pipeline.predict` and `query_bed` with `ref` score #38