varfish-org / mehari

VEP-like tool for sequence ontology and HGVS annotation of VCF files
MIT License

Annotation plugin support #426

Open xiamaz opened 6 months ago

xiamaz commented 6 months ago

Is your feature request related to a problem? Please describe.

Both VEP and open-cravat support plugins, which can extend annotation capabilities without requiring these to be directly integrated into the core software.

Describe the solution you'd like

mehari should offer a plugin interface with at least the features given by VEP. In the best case these should be VEP compatible.

Describe alternatives you've considered

Most software supports annotating custom tsv, but this might be too limited for most use-cases.

Additional context

First we will need to investigate the approach taken by both VEP and open-cravat for plugin support. Something like wasmer might help: several Rust projects use a wasm intermediate step to allow easy plugin integration without putting strong constraints on either the programming language or the environment.

xiamaz commented 6 months ago

VEP Plugins

Supported language: perl

Approach

Plugins are run for each line of input, before anything is printed to the output file. Each plugin is given an object holding the variant allele and the overlapping genomic features.

Plugins need to implement new, get_header_info and run. On calling run, the return value is the additional info to be added to the entry.
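To make that contract concrete, here is a rough Python sketch of the same three-method shape. VEP plugins themselves are Perl, so this only illustrates the flow. The names `ExamplePlugin`, `annotate`, `ALLELE_LEN` and the dict-based record layout are made up for this sketch, and the `run` signature is simplified compared to what VEP actually passes.

```python
from typing import Optional


class ExamplePlugin:
    """Hypothetical plugin following the new / get_header_info / run shape."""

    def __init__(self, params: Optional[dict] = None):
        # Counterpart of `new`: store plugin parameters once at start-up.
        self.params = params or {}

    def get_header_info(self) -> dict:
        # Maps each output field name to its description for the header.
        return {"ALLELE_LEN": "Length of the variant allele"}

    def run(self, variant_allele: str, features: list) -> dict:
        # Called once per input line; the returned mapping is the
        # additional info to be added to the entry.
        return {"ALLELE_LEN": len(variant_allele)}


def annotate(records, plugin):
    """Run the plugin on every record before anything is written out."""
    header = plugin.get_header_info()
    out = []
    for record in records:
        extra = plugin.run(record["alt"], record.get("features", []))
        # Merge the plugin's fields into a copy of the record.
        out.append({**record, **extra})
    return header, out
```

Calling `annotate([{"alt": "AT"}], ExamplePlugin())` would return the header mapping plus one record extended with an `ALLELE_LEN` field.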

Implementation concerns

Directly supporting Perl-based plugins would require either integrating a Perl FFI interface into Rust (complicated) or looking into Perl-to-wasm compilation, which might work (https://perlwasm.github.io/).

xiamaz commented 6 months ago

open-cravat plugins

Supported language: python

Approach

open-cravat supports modular annotators for a large number of annotation scores. Otherwise, plugin functionality is pretty similar to VEP.

Implementation concerns

pyo3 can be used. Compiling Python plugins to wasm is also not well supported.

xiamaz commented 6 months ago

Design

Look into https://perldoc.perl.org/perlembed and https://github.com/PyO3/pyo3. We might be able to get some very simple VEP and open-cravat extensions running.

Afterwards we should compare the performance of these against, e.g., wasm-based extensions and potentially offer that as the main plugin approach.

holtgrewe commented 6 months ago

I looked a bit and now wonder how many plugins we can get to run. E.g. the VEP plugins often rely on the "tva" argument which is a complex data structure. See NMD for a simple VEP plugin.

It might be easier to provide some infrastructure for tabix lookup, then implement some plugins and crowdsource from then on (after publication).

Overall, our native interface could pass the current VCF record serialized as JSON, plus, say, transcript information as JSON (serde is really cool) and the VCF header as JSON, and return a changed record as JSON.
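A minimal sketch of what such a JSON round trip could look like, written in Python for brevity (on the mehari side this would be Rust with serde). All names here (`plugin_entry`, `call_plugin`, the `record` / `transcripts` / `header` keys, `N_TRANSCRIPTS`) are hypothetical and only illustrate the data flow, not an agreed interface.

```python
import json


def plugin_entry(payload: str) -> str:
    """Hypothetical plugin entry point: JSON string in, JSON string out."""
    data = json.loads(payload)
    record = data["record"]
    # Toy annotation: count the transcripts passed alongside the record.
    record["info"]["N_TRANSCRIPTS"] = len(data["transcripts"])
    return json.dumps(record)


def call_plugin(record: dict, transcripts: list, header: dict) -> dict:
    """Host side: serialize inputs, call the plugin, parse the changed record."""
    payload = json.dumps(
        {"record": record, "transcripts": transcripts, "header": header}
    )
    return json.loads(plugin_entry(payload))
```

Because only strings cross the boundary, the plugin never holds a reference to the host's data; the host decides what to do with the changed record it gets back.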

holtgrewe commented 6 months ago

What about the following.

We create a native plugin system based on extism. This allows writing plugins in wasm. We pass data through interfaces as JSON for simplicity. We can model interfaces inspired by VEP and cravat.

We implement some core plugins such as annotation based on annonars/dbsnp in Rust. We provide a reference implementation of the VEP plugin NMD in Rust and Python compiled to WASM. We then explore how we can make a wrapper in the wasm layer that allows running the VEP plugin NMD and some basic cravat plugin in Python.

We will be able to create the native interface and the NMD demo in Python and Rust. The exploration can be time-boxed to one day, and we can postpone it. I don't know whether we will be able to expose all of VEP's data structures needed for the plugins.

The strategy above allows for implementing something that should work easily enough with 98% confidence, and the wrapper layer can be postponed/terminated.

xiamaz commented 6 months ago

Sounds good. This keeps the overhead to a minimum and allows us to create a clean plugin interface.

tedil commented 6 months ago

I have implemented a dummy plugin + calling the plugin from mehari in the plugin-system branch, just to get a feeling for extism.