How to get the correct idx for mutation inference

pagnani / ArDCA.jl

Autoregressive networks for protein

MIT License


How to get the correct idx for mutation inference #26

Closed AthenACHY closed 7 months ago

AthenACHY commented 1 year ago

Hi,

I am trying to use the package to infer mutational effects, using the alignment file (.a2m) generated by EVcouplings as the input to arDCA. From what I understand, arnet reorders the columns during the run, and I can use idxperm to locate the correct sites of my mutations. I also found that arnet drops some columns from the alignment. Is there a way to find out which columns were dropped, and which sites of the original alignment are preserved in idxperm, so that I can map my mutation annotations onto xori for mutational effect inference?

Thanks in advance for your help!

pagnani commented 1 year ago

The dropped columns are those for which the reference sequence has a gap. I am not at a computer right now, but I seem to remember that the function ArDCA.dms_single_site returns the indices of the dropped residues.

I will inspect more closely tomorrow.

pagnani commented 1 year ago

Indeed, as explained here (see 3. Predicting mutational effects),

D,idxgap=dms_single_site(arnet,arvar,target_sequence)

returns D, a q × L matrix containing the mutational effects, and idxgap, the indices of the gaps in the reference sequence.
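For the mapping asked about in the original question, a minimal sketch of the index bookkeeping could look like the following. It assumes that idxgap is a vector of 1-based column indices expressed in the coordinates of the alignment as read by ArDCA, and that L is the number of those columns; neither assumption is confirmed in this thread, so check both against your own data. The helper `nongap_sites` and the toy values are hypothetical and not part of ArDCA.jl.

```julia
# Hypothetical helper (not part of ArDCA.jl): given the alignment length L
# and the gap indices returned as `idxgap`, list the columns that are NOT
# gaps in the reference sequence, in increasing order.
function nongap_sites(L::Integer, idxgap::AbstractVector{<:Integer})
    all(i -> 1 <= i <= L, idxgap) ||
        throw(ArgumentError("idxgap entries must lie in 1:L"))
    return setdiff(1:L, idxgap)
end

# Toy values for illustration only; in practice idxgap would come from
# dms_single_site and L from the number of alignment columns.
L = 8
idxgap = [2, 5, 6]

kept = nongap_sites(L, idxgap)

# Lookup from an alignment column to its rank among the non-gap columns.
# Columns listed in idxgap are absent from the dictionary.
rank_of = Dict(col => k for (k, col) in enumerate(kept))
```

A mutation annotated at a column that is missing from `rank_of` falls on a gap of the reference sequence, so under the assumptions above it would have no prediction to map to.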

pagnani commented 7 months ago

closing