kipoi / models

Model zoo for genomics
http://kipoi.org
MIT License

APARENT - number of predictions seems off #342

Open robertzeibich opened 2 years ago

robertzeibich commented 2 years ago

Hi, I am currently running Kipoi APARENT.

The provided example VCF file contains 15,077 variants, but the output only has 416 predictions for delta_logit_distal_prop and delta_logit_proximal_prop. I was expecting 15,077 predictions. Do you know why I only get 416 predictions?

Hoeze commented 2 years ago

Hi @robertzeibich, we currently only evaluate variants that are close to a polyA-site: https://github.com/kipoi/models/blob/9dea5659d095c49ff8039beda7be9bcfaa168f2a/APARENT/veff/dataloader.py#L84-L108
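To illustrate why the number of predictions can be much smaller than the number of variants in the VCF, here is a minimal sketch of a proximity filter. It is not the code from the linked dataloader: the function names, the polyA-site positions, the variants, and the 100 bp window are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def near_polya_site(pos, sorted_sites, max_dist):
    """Return True if `pos` lies within `max_dist` bp of any site in `sorted_sites`."""
    i = bisect_left(sorted_sites, pos)
    # Only the nearest site on each side of `pos` can be within range.
    candidates = sorted_sites[max(i - 1, 0): i + 1]
    return any(abs(pos - site) <= max_dist for site in candidates)


def filter_variants(variants, sites_by_chrom, max_dist=100):
    """Keep only (chrom, pos) variants that are close to an annotated site."""
    kept = []
    for chrom, pos in variants:
        sites = sites_by_chrom.get(chrom, [])
        if near_polya_site(pos, sites, max_dist):
            kept.append((chrom, pos))
    return kept


# Hypothetical annotation and variants (made up for this example).
polya_sites = {"chr1": [1_000, 50_000], "chr2": [7_500]}
variants = [("chr1", 990), ("chr1", 20_000), ("chr2", 7_600), ("chr3", 10)]

kept = filter_variants(variants, polya_sites, max_dist=100)
print(f"{len(kept)} of {len(variants)} variants would be scored")
```

In this toy example only two of the four variants fall inside a window, so only two would receive predictions. The same kind of filtering would explain 416 predictions out of 15,077 variants.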

Some notes:

What would you like to use APARENT for?

robertzeibich commented 2 years ago

Thank you for getting back to me so quickly. I used the keep_metadata parameter and then concatenated the output with a pandas DataFrame.
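For readers following along, the concatenation step might look roughly like the sketch below. It assumes the metadata and the predictions are already available as two row-aligned tables. The variant IDs and score values are placeholders invented for the example, not real APARENT output, and the exact structure Kipoi returns with keep_metadata may differ.

```python
import pandas as pd

# Placeholder metadata for the variants that were actually scored (hypothetical).
metadata = pd.DataFrame({"variant_id": ["chr1:990:A>G", "chr2:7600:C>T"]})

# Placeholder predictions, row-aligned with `metadata` (values are made up).
preds = pd.DataFrame({
    "delta_logit_distal_prop": [0.12, -0.30],
    "delta_logit_proximal_prop": [-0.12, 0.30],
})

# Column-wise concatenation: one row per scored variant.
scored = pd.concat(
    [metadata.reset_index(drop=True), preds.reset_index(drop=True)],
    axis=1,
)

# Left-join onto the full variant list so unscored variants remain visible as NaN.
all_variants = pd.DataFrame(
    {"variant_id": ["chr1:990:A>G", "chr1:20000:G>A", "chr2:7600:C>T"]}
)
merged = all_variants.merge(scored, on="variant_id", how="left")
print(merged)
```

The left join makes it explicit which variants received no prediction, which helps when the number of scores is smaller than the number of variants in the VCF.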


Could you let me know once the bug is fixed?

I want to integrate the poly(A) scores into my whole-genome sequencing analysis. One idea is to compare the scores against healthy controls (1000 Genomes Project) and see whether patients with epilepsy and individuals from the 1000 Genomes Project form separate clusters. If you have other ideas for what I could do with the scores, I am all ears. The data I am currently analyzing are whole-genome sequencing data from people with epilepsy.

Hoeze commented 2 years ago

Hi @robertzeibich, I have now merged the PR that fixes the reverse-complement issue. Still, I'm not 100% sure the implementation is fully correct, so please treat the predictions with a bit of caution for now :)