pagnani / ArDCA.jl

Autoregressive networks for protein
MIT License
31 stars 8 forks source link

`ardca` silently modifies input matrix #27

Closed PierreBarrat closed 1 year ago

PierreBarrat commented 1 year ago
julia> X = [1 2; 4 4; 5 6] # two sequences of length 3
3×2 Matrix{Int64}:
 1  2
 4  4
 5  6

julia> w = [1/2, 1/2];

julia> out = ardca(X, w);
site = 1    pl = 0.8326 time = 0.0003   status = SUCCESS
site = 2    pl = 0.2272 time = 0.0013   status = FTOL_REACHED

julia> X # sequence sites have been shuffled! 
3×2 Matrix{Int64}:
 4  4
 1  2
 5  6

julia> sample(out[1], 2) # a sample taken from the inferred model has the right order
3×2 Matrix{Int64}:
 1  2
 4  4
 5  6

I just realized this, after it caused a bug in some code I was running. I imagine it is due to re-ordering the sites using the entropic order. Possible fixes:

I can make a PR if you think this is useful.

pagnani commented 1 year ago

The problem is that the logic is that for ordering other than the natural I permute the idx of the alignment. The safe thing should be to me make a copy of the alignment in the constructor of arnet.

I am in Parma for a conference, but PR are welcome.

On Tue, Jun 20, 2023, 3:16 PM Pierre Barrat @.***> wrote:

julia> X = [1 2; 4 4; 5 6] # two sequences of length 3 3×2 Matrix{Int64}: 1 2 4 4 5 6

julia> w = [1/2, 1/2];

julia> out = ardca(X, w); site = 1 pl = 0.8326 time = 0.0003 status = SUCCESS site = 2 pl = 0.2272 time = 0.0013 status = FTOL_REACHED

julia> X # sequence sites have been shuffled! 3×2 Matrix{Int64}: 4 4 1 2 5 6

julia> sample(out[1], 2) # a sample taken from the inferred model has the right order 3×2 Matrix{Int64}: 1 2 4 4 5 6

I just realized this, after it caused a bug in some code I was running. I imagine it is due to re-ordering the sites using the entropic order. Possible fixes:

  • rename the function to ardca! so that users know what to expect;
  • work on a copy the input matrix Z; I believe it should not be any issue even for large alignments.

I can make a PR if you think this is useful.

— Reply to this email directly, view it on GitHub https://github.com/pagnani/ArDCA.jl/issues/27, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJYTHUZYP2FDG2Q5MWLQRLXMGPCRANCNFSM6AAAAAAZNIZ6HU . You are receiving this because you are subscribed to this thread.Message ID: @.***>