Convert sampled sequences back to alignment file

benjamin-lieser commented 2 years ago

I have read the tutorial and have successfully computed the sample matrix with the sample method. Now I want these sequences in a FASTA file (or similar). Is there an easy way to do this here, or at least some documentation on the coding from amino acids to the numbers in this matrix?

pagnani commented 2 years ago

Hi @Unlikus

Indeed, we do not provide this utility. However, you can define this function:

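function my_write_fasta(filedest::String, Z)
    # Index-to-letter table: 1..20 are the amino acids, 21 is the gap symbol
    num2let = ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y', '-']
    N, M = size(Z)  # N = alignment length, M = number of sequences
    open(filedest, "w") do fp
        for s in 1:M
            println(fp, "> Seq $s")  # FASTA header line
            for a in 1:N
                print(fp, num2let[Int(Z[a, s])])
            end
            println(fp)  # terminate the sequence line
        end
    end
end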

where:

filedest is the name of the file where you want to write the results
Z is a matrix containing the generated alignment (a matrix of size N x M where N is the MSA length and M is the number of samples generated).
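To make the encoding concrete, here is a minimal sketch that decodes a small hand-built matrix using the same table as in my_write_fasta. The names NUM2LET, decode_column and Ztoy are made up for this illustration and are not part of ArDCA.

```julia
# Alphabet copied from my_write_fasta: index 1 is 'A', index 21 is the gap '-'.
const NUM2LET = ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M',
                 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y', '-']

# Hypothetical helper: decode one column (one sequence) of Z into a String.
decode_column(col) = String([NUM2LET[Int(x)] for x in col])

# Hand-built 3 x 2 matrix: 3 alignment positions (rows), 2 sequences (columns).
Ztoy = [1 21;
        2 20;
        3 10]

# One decoded string per column of the matrix.
seqs = [decode_column(view(Ztoy, :, s)) for s in 1:size(Ztoy, 2)]
```

Each column of the matrix is one sequence, so the first column (1, 2, 3) decodes to "ACD" and the second (21, 20, 10) to "-YL".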

So, as an example, you could run the following pipeline:

julia> arnet,arvar=ardca("data/PF14/PF00014_mgap6.fasta.gz");
julia> Zgen=sample(arnet,10000);
julia> my_write_fasta("foo.fasta", Zgen)
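If you want to sanity-check a file written this way, one option is to read it back and re-encode it with the inverse of the table above. The following is only a sketch: fasta_to_matrix, ALPHABET and LET2NUM are hypothetical names, all sequences are assumed to have the same length, and mapping any unlisted character to 21 (the gap index) is an assumption made for this example.

```julia
# Same alphabet as in my_write_fasta; position in this vector is the integer code.
const ALPHABET = ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M',
                  'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y', '-']
const LET2NUM = Dict(c => i for (i, c) in enumerate(ALPHABET))

# Hypothetical helper: parse FASTA text into an N x M integer matrix
# (N = alignment length, M = number of sequences).
function fasta_to_matrix(io::IO)
    seqs = String[]
    for line in eachline(io)
        isempty(line) && continue
        if startswith(line, ">")
            push!(seqs, "")              # a header starts a new sequence
        elseif !isempty(seqs)
            seqs[end] *= strip(line)     # sequence text may span several lines
        end
    end
    N = length(first(seqs))              # assumes equal-length sequences
    Z = zeros(Int, N, length(seqs))
    for (s, seq) in enumerate(seqs), (a, c) in enumerate(seq)
        Z[a, s] = get(LET2NUM, c, 21)    # assumption: unknown letters become gaps
    end
    return Z
end

# In-memory stand-in for a file produced by my_write_fasta.
toy = IOBuffer("> Seq 1\nACD\n> Seq 2\n-YL\n")
Zback = fasta_to_matrix(toy)
```

For a real file you would pass an open file handle instead of the IOBuffer, for example open(fasta_to_matrix, "foo.fasta"), and compare the result with Zgen.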

One last thing: to define the function, it is enough to copy and paste the definition above, either into the REPL or into a Jupyter cell.

Let me know if you have any other problems.

benjamin-lieser commented 2 years ago

Thanks a lot :)

pagnani / ArDCA.jl

Convert sampled sequences back to alignment file #19