iobio / gene.iobio


Streamline annotation / rework VEP #446

Open AlistairNWard opened 4 years ago

AlistairNWard commented 4 years ago

We haven't looked at the annotation pipeline going into gene for a long time, and it would be worth revisiting. For example, we are using vt to normalize and decompose, but we can likely achieve the same results purely with bcftools now, so we could limit the number of tools required by gene.
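As a rough illustration of the bcftools-only idea, the sketch below builds a single `bcftools norm` command that both splits multiallelic sites and left-aligns indels. The file names are placeholders, and whether its output matches the current vt steps closely enough for gene is an assumption that would still need testing.

```python
import shlex


def bcftools_norm_command(in_vcf, ref_fasta, out_vcf):
    """Build one bcftools call covering both decompose and normalize."""
    return [
        "bcftools", "norm",
        "-m", "-any",      # split multiallelic sites into biallelic records
        "-f", ref_fasta,   # left-align and normalize indels against the reference
        "-O", "z",         # write bgzip-compressed VCF
        "-o", out_vcf,
        in_vcf,
    ]


# Placeholder file names, for illustration only.
cmd = bcftools_norm_command("input.vcf.gz", "reference.fa", "normalized.vcf.gz")
print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to compare against the existing vt output during testing before anything is swapped in.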

Needs to be looked at and tested first before deciding on a plan.

mvelinder commented 4 years ago

Definitely worth reconsidering. bcftools is much faster than vt for example. We could also think about new annotations such as CCR region score, spliceAI and others.

AlistairNWard commented 4 years ago

I was thinking primarily for maintenance, it would be better to have the minimum number of different tools. If we can speed up as well, that would be huge. I would say we should open separate issues if we want to add other annotations.

mvelinder commented 4 years ago

> I was thinking primarily for maintenance, it would be better to have the minimum number of different tools. If we can speed up as well, that would be huge. I would say we should open separate issues if we want to add other annotations.

Agreed, they would warrant new issues if we build new annotations in.

anderspitman commented 4 years ago

I think this might also be a good opportunity to simplify some other things. The big one that sticks out in my mind is that we're currently using reference sequences that are split by chromosome, rather than one big indexed reference. This requires maintaining a mapping to where those references live on the filesystem. I think we can change it to just use the full indexed references.
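To make the difference concrete, here is a minimal sketch contrasting the two lookups. The paths, build names, and the `SPLIT_REFERENCES` / `FULL_REFERENCES` tables are hypothetical and do not reflect the actual backend layout; `samtools faidx` with a region string is used only as one common way to read from a single indexed FASTA.

```python
# Hypothetical per-chromosome mapping: one entry per (build, chromosome).
SPLIT_REFERENCES = {
    ("GRCh37", "1"): "/data/references/GRCh37/chr1.fa",
    ("GRCh37", "2"): "/data/references/GRCh37/chr2.fa",
    # ... one line for every chromosome of every build
}

# Hypothetical single-file layout: one indexed FASTA per build.
FULL_REFERENCES = {
    "GRCh37": "/data/references/GRCh37/genome.fa",
    "GRCh38": "/data/references/GRCh38/genome.fa",
}


def split_reference_path(build, chrom):
    """Current style: the chromosome decides which file is opened."""
    return SPLIT_REFERENCES[(build, chrom)]


def faidx_region_command(build, chrom, start, end):
    """Proposed style: always the same file, the region selects the chromosome."""
    ref_fasta = FULL_REFERENCES[build]
    return ["samtools", "faidx", ref_fasta, f"{chrom}:{start}-{end}"]


print(faidx_region_command("GRCh38", "chr17", 43044295, 43125483))
```

With the full reference, the only thing to maintain is one path per build; the `.fai` index takes over the job of the per-chromosome mapping.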

stefinfection commented 3 years ago

Moving back to the planning board b/c the dev backend is currently in flux, and we need to wait until it has stabilized before modifying the script. Will check in with @anderspitman in a couple of weeks for an update.

anderspitman commented 3 years ago

@stefinfection I didn't realize that this is what you were wanting to work on. These are exactly the types of changes I'm making leading up to 1.0, so it might just be a matter of coordinating who does what.

anderspitman commented 3 years ago

**gru 1.0**

mvelinder commented 3 years ago

Given our Slack discussions and my testing, I'd propose the following for the complete rework, removing VEP entirely:

  1. add gnomAD freqs using slivar
  2. add consequence annotations using bcftools csq
  3. add ClinVar significances, REVEL scores, dbSNP IDs, any other custom annotations using vcfanno
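The three steps above could be chained roughly as follows. This is only a sketch: every file name, the `clinvar_sig` INFO name, and the choice of `--local-csq` are assumptions, and the GFF3 is assumed to be in a format `bcftools csq` accepts. Nothing is executed here; the function just returns the commands for inspection.

```python
import shlex

# Assumed vcfanno config pulling ClinVar significance; REVEL, dbSNP, and other
# sources would each get their own [[annotation]] block.
VCFANNO_CONF = """\
[[annotation]]
file = "clinvar.vcf.gz"
fields = ["CLNSIG"]
names = ["clinvar_sig"]
ops = ["self"]
"""


def build_pipeline(in_vcf, ref_fasta, gff3, gnomad_zip, conf_path, out_vcf):
    """Return (argv, stdout_path) pairs; stdout_path is None when the tool writes its own output."""
    with_freqs = "step1.gnomad.vcf"    # hypothetical intermediate file names
    with_csq = "step2.csq.vcf.gz"
    return [
        # 1. gnomAD frequencies from a slivar gnotate archive
        (["slivar", "expr", "--vcf", in_vcf,
          "--gnotate", gnomad_zip, "--out-vcf", with_freqs], None),
        # 2. consequence annotations; --local-csq predicts per record, which
        #    avoids requiring phased genotypes (an assumption about the input)
        (["bcftools", "csq", "--local-csq", "-f", ref_fasta, "-g", gff3,
          "-O", "z", "-o", with_csq, with_freqs], None),
        # 3. custom annotations; vcfanno writes the annotated VCF to stdout
        (["vcfanno", conf_path, with_csq], out_vcf),
    ]


steps = build_pipeline("input.vcf.gz", "reference.fa", "genes.gff3.gz",
                       "gnomad.zip", "annotations.toml", "annotated.vcf")
for argv, stdout_path in steps:
    redirect = f" > {stdout_path}" if stdout_path else ""
    print(shlex.join(argv) + redirect)
```

Keeping each step as a separate command with its own intermediate file should make it easier to test one annotation source at a time during the iterative cycles.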

Annotations required checklist:

Please let me know if I'm missing any annotations required by gene @tonydisera and @anderspitman !

I can build the workflow from the command-line tools, and then we can deploy and test it. I imagine it will take a few iterative testing cycles.

anderspitman commented 3 years ago

I like it. I think the key thing will be figuring out the handoff point between me and @tonydisera. Obviously I'll be doing all the backend stuff, and Tony will be doing the visualization. But if you can tell me what format you need the data in I can also write the JavaScript parsing code.

stefinfection commented 3 years ago

I'm also happy to toss my hat in the ring to help with this. annotateSomaticVariants will need to be updated if we 86 VEP, so I'm happy to do both if that makes sense.

anderspitman commented 3 years ago

I think the most likely path forward is running VEP and bcftools side by side for quite a while, and slowly transitioning the frontend over before actually getting rid of VEP.

AlistairNWard commented 3 years ago

@anderspitman Should we close this, keep open, or create a new issue based on where we currently sit with VEP?

anderspitman commented 3 years ago

That's @mvelinder's call.

mvelinder commented 3 years ago

I'll keep it in the back of my mind, but it's likely not feasible as long as we are still supporting RefSeq transcripts. Closing for now.

anderspitman commented 3 months ago

@tonydisera @AlistairNWard any opposition to me re-opening this while I'm investigating (again) the possibility of replacing VEP? My current focus is on implementing a custom tool to provide the missing HGVS annotations. However, @mvelinder's very useful comment above (https://github.com/iobio/gene.iobio/issues/446#issuecomment-744541956) indicates that we might have a few more gaps as well.

Sounds like maybe RefSeq is the most critical?

AlistairNWard commented 3 months ago

No objections. We can get consequences using bcftools by feeding it a RefSeq GFF3 file, so I'm not sure what problem he's referring to.

tonydisera commented 3 months ago

@anderspitman, I have no objections to continued investigation of annotating with bcftools rather than VEP.