AlistairNWard opened this issue 4 years ago
Definitely worth reconsidering. bcftools, for example, is much faster than vt. We could also think about new annotations such as CCR (constrained coding region) scores, SpliceAI, and others.
I was thinking primarily of maintenance: it would be better to use as few different tools as possible. If we can speed things up as well, that would be huge. I'd say we should open separate issues if we want to add other annotations.
Agreed, they would warrant new issues if we build new annotations in.
I think this might also be a good opportunity to simplify some other things. The big one that sticks out in my mind is that we're currently using reference sequences that are split by chromosome, rather than one big indexed reference. This requires maintaining a mapping to where those references live on the filesystem. I think we can change it to just use the full indexed references.
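To illustrate why one indexed reference removes the need for a chromosome-to-path mapping, here is a minimal sketch assuming a samtools-style `.fai` index (the five columns `samtools faidx` writes: name, length, byte offset of the first base, bases per line, bytes per line). Any region can then be read from the single FASTA by seeking to a computed byte offset. The helper names `build_fai` and `fetch` are hypothetical; in practice the tools would read the index themselves.

```python
def build_fai(fasta_path):
    """Compute .fai-style records: name -> (length, offset, linebases, linewidth).

    Assumes each sequence is wrapped at a fixed line length, which .fai indexing requires.
    """
    index = {}
    name = None
    offset = 0  # running byte position in the file
    with open(fasta_path, "rb") as fh:
        for raw in fh:
            offset += len(raw)
            if raw.startswith(b">"):
                # Sequence bytes start right after the header line.
                name = raw[1:].split()[0].decode()
                index[name] = [0, offset, 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(raw.rstrip(b"\r\n"))
                if entry[2] == 0:
                    # The first sequence line defines the wrap width.
                    entry[2] = bases
                    entry[3] = len(raw)
                entry[0] += bases
    return {k: tuple(v) for k, v in index.items()}


def fetch(fasta_path, fai, chrom, start, end):
    """Return bases [start, end) (0-based, half-open) of chrom from one indexed FASTA."""
    length, offset, linebases, linewidth = fai[chrom]
    end = min(end, length)
    if start >= end:
        return ""

    def byte_pos(pos):
        # Full lines hold `linebases` bases plus their line terminator(s).
        return offset + (pos // linebases) * linewidth + pos % linebases

    with open(fasta_path, "rb") as fh:
        fh.seek(byte_pos(start))
        raw = fh.read(byte_pos(end) - byte_pos(start))
    # Drop line terminators that fall inside the requested span.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()
```

The caller only needs the chromosome name and coordinates; no per-chromosome file lookup is involved.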
Moving this back to the planning board because the dev backend is currently in flux, and we need to wait until it stabilizes before modifying the script. Will check in with @anderspitman in a couple of weeks for an update.
@stefinfection I didn't realize this is what you wanted to work on. These are exactly the kinds of changes I'm making in the lead-up to 1.0, so it might just be a matter of coordinating who does what.
**gru 1.0**
Given our Slack discussions and my testing, I'd propose a complete rework that removes VEP entirely...
Annotations required checklist:
@tonydisera and @anderspitman, please let me know if I'm missing any annotations that gene requires!
I can build the workflow around the command-line tools, and then we can build, deploy, and test it. I imagine it will take a few iterative testing cycles.
I like it. I think the key thing will be figuring out the handoff point between me and @tonydisera. Obviously I'll be doing all the backend stuff, and Tony will be doing the visualization. But if you can tell me what format you need the data in I can also write the JavaScript parsing code.
I'm also happy to toss my hat in the ring to help with this. annotateSomaticVariants will need updating if we 86 (drop) VEP, so I'm happy to do both if that makes sense.
I think the most likely path forward is running VEP and bcftools side by side for quite a while and slowly transitioning the frontend over before actually getting rid of VEP.
@anderspitman, should we close this, keep it open, or create a new issue based on where we currently stand with VEP?
That's @mvelinder's call.
I'll keep it in the back of my mind, but it's likely not feasible as long as we're still supporting RefSeq transcripts. Closing for now.
@tonydisera @AlistairNWard any opposition to me re-opening this while I'm investigating (again) the possibility of replacing VEP? My current focus is on implementing a custom tool to provide the missing HGVS annotations. However, @mvelinder's very useful comment above (https://github.com/iobio/gene.iobio/issues/446#issuecomment-744541956) indicates that we might have a few more gaps as well.
Sounds like maybe RefSeq is the most critical?
No objections. We can get consequences from bcftools by feeding it RefSeq GFF3 files, so I'm not sure what problem he's referring to.
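A minimal sketch of the consequence-calling step described above, assuming `bcftools csq` is on the PATH and using only its documented `-f` (reference FASTA), `-g` (GFF3 annotation), `-O` and `-o` options. The helper name and file names are hypothetical, and options such as how unphased genotypes are handled are left out. The csq documentation describes the GFF3 layout it expects, so a RefSeq GFF3 may need reformatting first; that is exactly what would need testing.

```python
import shlex
import subprocess


def bcftools_csq_cmd(vcf_in, ref_fasta, gff3, vcf_out):
    """Build a `bcftools csq` argv that annotates consequences from a GFF3 (hypothetical helper)."""
    return [
        "bcftools", "csq",
        "-f", ref_fasta,  # reference FASTA, same build as the VCF
        "-g", gff3,       # gene models, e.g. a (possibly converted) RefSeq GFF3
        "-O", "z",        # write bgzipped VCF
        "-o", vcf_out,
        vcf_in,
    ]


def run(cmd):
    """Echo and run a command, raising CalledProcessError on a non-zero exit."""
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Usage would be something like `run(bcftools_csq_cmd("sample.vcf.gz", "ref.fa", "refseq.gff3", "sample.csq.vcf.gz"))`; the consequences land in the `BCSQ` INFO field rather than VEP's `CSQ`, so the frontend parsing would need to change too.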
@anderspitman, I have no objections to continued investigation of annotating with bcftools rather than VEP.
We haven't looked at the annotation pipeline feeding gene for a long time, and it would be worth revisiting. For example, we use vt to normalize and decompose, but we can likely achieve the same results purely with bcftools now, which would reduce the number of tools gene requires.
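A sketch of what replacing the vt decompose/normalize steps could look like, assuming a single `bcftools norm` call: `-m -any` splits multiallelic records into biallelic ones, and `-f` left-aligns and normalizes indels against the reference. Whether the output matches vt's record for record is an assumption that still needs testing; the helper and file names are hypothetical.

```python
def bcftools_norm_cmd(vcf_in, ref_fasta, vcf_out):
    """Build one `bcftools norm` argv intended to cover vt decompose + vt normalize (hypothetical helper)."""
    return [
        "bcftools", "norm",
        "-m", "-any",     # split multiallelic sites into biallelic records (decompose)
        "-f", ref_fasta,  # left-align/normalize indels against the reference (normalize)
        "-O", "z",        # bgzipped VCF output
        "-o", vcf_out,
        vcf_in,
    ]
```

Running it would just be `subprocess.run(bcftools_norm_cmd("sample.vcf.gz", "ref.fa", "sample.norm.vcf.gz"), check=True)`. Collapsing two vt invocations into one bcftools call is the main maintenance win, since the same binary could then also handle consequence annotation.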
This needs to be looked at and tested before we decide on a plan.