marcom / ViennaRNA.jl

Julia interface to ViennaRNA for RNA structure prediction and analysis
Other
21 stars 1 forks source link

Feature request: Alifold #7

Closed timoleistner closed 2 years ago

timoleistner commented 2 years ago

Hi again,

Are there any plans to implement a wrapper for alifold? As I'm not familiar with C, I dont really know where to start but I would be thankful if you could lead me to the right direction. My current try looks like this

function alifold(sequences::Vector{AbstractString}, structure::AbstractString)
    alifold_output = LibRNA.vrna_alifold(sequences, structure)
    return alifold_output
end

Is the following needed somehow to

cstr_structure = Ptr{Cchar}(LibRNA.vrna_alloc(length(structure) + 1))
cstr_sequences = Ptr{Ptr{Cchar}}(LibRNA.vrna_alloc(sizeof(Ptr{Cchar})))

I'm almost certain this is not correct though. Do I also need to write the wrap_alifold function present in alifold.c? Do I need a specific struct to return the alifold information? Would be nice to have this implemented and I would like to help if I can.

Greetings, Timo

marcom commented 2 years ago

On v0.6.2 the following should now work:

fc = FoldCompound(["GGGAAAACCC", "GGCA-AAGUC"])
mfe(fc)
partfn(fc)
bpp(fc)
marcom commented 2 years ago

Closing for now, please reopen if there are any issues.

timoleistner commented 2 years ago

I've tested it and it seems like this line also returns true (or !false), if all strings in the msa are of equal length. https://github.com/marcom/ViennaRNA.jl/blob/ade41726aa07fda820d58666cab64f04845c11a2/src/ViennaRNA.jl#L63 This is rather interesting, and the reason is because the reduce function chains each evaluation after another. ((10 == 10) = true == 10) = false I have two suggestions for this line, and I've tested both:

using BenchmarkTools
test = [randstring(10) for _ in 1:100]

With the following results:

julia> @btime length(Set(length.(test))) != 1
  1.406 μs (11 allocations: 3.65 KiB)
false

julia> @btime !all(length.(test) .== length(first(test)))
  983.154 ns (6 allocations: 208 bytes)
false

Can you think of another prettier solution (they both feel kind of clunky)?

PS: Thanks alot for implementing it (and so fast too!)

marcom commented 2 years ago

Ups, you are right, that looks like a bug. I'll add a test for this and a fix

marcom commented 2 years ago

Thanks for testing this and finding the bug!

I implemented the check this way so it can exit early:

https://github.com/marcom/ViennaRNA.jl/blob/6a8fee28ac38285b6e29c187b1ca780565e3709c/src/ViennaRNA.jl#L63-L64

timoleistner commented 2 years ago

Another feature request, which I did not want to open another issue for, is to have the FoldCompound option for the sliding window size. Am I guessing correct that it does a global structure prediction with this default value (-1)? The options is called VRNA_MODEL_DEFAULT_WINDOW_SIZE. For longer sequences I think it would be cool to have this option for secondary structure prediction. Btw. do you have any idea how an appropriate value for this option would then be chosen?

marcom commented 2 years ago

As a quick possible alternative for folding longer sequences i would recommend to try LinearFold.jl. Maybe that covers your use case.

At the moment the structure prediction is always global. I agree that windowed predictions are still missing from ViennaRNA.jl (as are a few other algorithms).

I have opened a new issue #9 to discuss this further.

As for recommended window sizes, i think that depends on your use case, i.e. what kinds of structures do you want to be able to see?