mbhall88 / drprg

Drug Resistance Prediction with Reference Graphs
https://mbh.sh/drprg/
MIT License

Add grammar for specifying variant "expert" rules #14

Closed by mbhall88 1 year ago

mbhall88 commented 1 year ago

Instead of listing every indel in katG in the panel, for instance, allow passing "expert" rules into the index build step. This will also help clean up the VCFs and make inferring variant consequences much easier.

I have been thinking a lot about the best way to implement/handle this. Do we require the rules in the panel, or in a separate file? I think a separate file would be best, to avoid confusing the panel format with the expert rule format.

The other question is about the nomenclature for the expert rules. Phil Fowler created a nomenclature (GARC), but it clashes with existing conventions, such as * meaning a stop codon (Phil uses !). Additionally, HGVS does not support these types of rules. I don't think it needs to be too fancy, so we could use something straightforward like:

type,gene,start,end,drugs
missense,rpoB,426,452,Rifampicin
nonsense,rpoB,426,452,Rifampicin
frameshift,rpoB,426,452,Rifampicin
nonsense,katG,,,Isoniazid
frameshift,katG,,,Isoniazid
nonsense,ethA,,,Ethionamide
frameshift,ethA,,,Ethionamide
nonsense,gid,,,Streptomycin
frameshift,gid,,,Streptomycin
nonsense,pncA,,,Pyrazinamide
frameshift,pncA,,,Pyrazinamide

where, if start and end are empty, the rule is taken to apply to the entire gene.
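To make the proposal concrete, here is a minimal Python sketch of how a rules file in this format might be parsed and checked against a variant. It is only an illustration and not drprg's implementation: the names `ExpertRule`, `parse_rules`, and `applies_to` are hypothetical, and it assumes that start and end are inclusive positions within the gene and that each empty bound can be treated independently as "unbounded". The drugs column is kept as a raw string, since no separator for multiple drugs has been specified.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical in-memory form of one rule; fields mirror the CSV header.
@dataclass(frozen=True)
class ExpertRule:
    variant_type: str
    gene: str
    start: Optional[int]  # None means the rule has no lower bound
    end: Optional[int]    # None means the rule has no upper bound
    drugs: str            # kept verbatim; multi-drug syntax is undecided

    def applies_to(self, variant_type: str, gene: str, pos: int) -> bool:
        """Return True if a variant of this type at `pos` in `gene` matches.

        Assumption: start and end are inclusive bounds.
        """
        if variant_type != self.variant_type or gene != self.gene:
            return False
        if self.start is not None and pos < self.start:
            return False
        if self.end is not None and pos > self.end:
            return False
        return True


def parse_rules(text: str) -> List[ExpertRule]:
    """Parse rules from CSV text with header type,gene,start,end,drugs."""
    rules = []
    for row in csv.DictReader(io.StringIO(text)):
        rules.append(
            ExpertRule(
                variant_type=row["type"],
                gene=row["gene"],
                # Empty start/end fields mean the whole gene is covered.
                start=int(row["start"]) if row["start"] else None,
                end=int(row["end"]) if row["end"] else None,
                drugs=row["drugs"],
            )
        )
    return rules


# Two rows taken from the example rules in this issue.
RULES_CSV = """type,gene,start,end,drugs
missense,rpoB,426,452,Rifampicin
frameshift,katG,,,Isoniazid
"""

rules = parse_rules(RULES_CSV)
```

With these two rules, a missense variant at position 430 in rpoB matches the first rule, while a frameshift anywhere in katG matches the second, because its empty start and end leave the range unbounded.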