mwdavis2 / ApE

ApE- A plasmid Editor

Make consensus sequence of alignment exportable as sequence #1

Open MarkusPiotrowski opened 5 months ago

MarkusPiotrowski commented 5 months ago

Dear Wayne,

Every now and then I look for freely available molecular biology software, e.g. to give to our beginner students or to use myself. For my own little problems I'm actually using a very old version of DNAMAN, a commercial package that is nevertheless quite cheap and has a lot of functionality. SerialCloner and ApE are two programs that I regularly check (although SerialCloner doesn't seem to have been in active development for some years now). ApE is a little weak on the protein side and very strong in virtual cloning (which I don't need). But one feature that I'm really missing is assembly (to assemble overlapping sequences, preferably more than two). While in most cases I can (mis)use the alignment function of ApE to check whether two sequences overlap, it would be nice if it were also possible to export the alignment as a consensus sequence. As far as I can see, this is not possible, is it?

mwdavis2 commented 5 months ago

Thanks for dropping a note. You are correct: it's designed more for cloning than for protein work. As such, the alignment function is centered on verifying that an expected sequence matches the sequencing reads from a clone. Thus, there is no function to assemble a consensus from the reads, as the reference sequence is assumed to be the correct assembly. De novo assembly is a different problem, with different math. As you point out, the math is similar enough that you can misuse one for the other, but if you have more than a couple of simply aligning long reads, it gets more and more likely to go in the wrong direction.

What protein functions would you like to see? One obvious one is alignment. Like de novo assembly, multiple alignment and phylogenetic analysis are very different problems with complex math, already solved by several other packages: T-Coffee, MUSCLE, Clustal, MAFFT and others. I've got an unreleased project to act as a front end to these algorithms and just format the display of the alignments for better presentation.

-- Wayne Davis

School of Biological Sciences HHMI, University of Utah 257 South 1400 East Salt Lake City, UT 84112-0840 (801) 585-3692

MarkusPiotrowski commented 5 months ago

> As such, the alignment function is centered on verifying that an expected sequence matches the sequencing reads from a clone.

This is also mostly my application, but I often have forward and reverse Sanger reads that overlap, and I want to combine them into one sequence. Thus it would be nice if I could just export the consensus of the ApE alignment, maybe with an option for how to deal with ambiguities.
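To make the request concrete, here is a minimal Python sketch of merging a forward and a reverse read into one consensus. It is not how ApE aligns sequences: the function names, the ungapped suffix/prefix overlap search, the `min_overlap` and `max_mismatch_rate` parameters, and the choice of `N` for disagreeing bases are all assumptions made for illustration.

```python
# Illustrative only: hypothetical helpers, not part of ApE.
# Assumes reads contain only A, C, G, T or N and overlap without gaps.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_reads(forward, reverse, min_overlap=10,
                max_mismatch_rate=0.1, ambiguity="N"):
    """Merge a forward read with a reverse read given as sequenced.

    The reverse read is reverse-complemented, then the longest overlap
    between the tail of the forward read and the head of the reverse
    read is searched for. Disagreeing bases in the overlap are replaced
    by `ambiguity`. Returns None if no acceptable overlap is found.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    fwd = forward.upper()
    rev = reverse_complement(reverse.upper())
    # Try the longest possible overlap first, then shrink.
    for k in range(min(len(fwd), len(rev)), min_overlap - 1, -1):
        tail, head = fwd[-k:], rev[:k]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * k:
            overlap = "".join(a if a == b else ambiguity
                              for a, b in zip(tail, head))
            return fwd[:-k] + overlap + rev[k:]
    return None


if __name__ == "__main__":
    target = "ATGGCCAAGCTTGGATCCGAATTCTAGCTAGGCTA"
    forward_read = target[:22]
    reverse_read = reverse_complement(target[12:])
    print(merge_reads(forward_read, reverse_read))
```

A real implementation would also need to handle gaps (insertions and deletions are common at the ends of Sanger reads) and could use base quality values to resolve disagreements instead of writing `N`.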

> What protein functions would you like to see?

Yes, alignment of course. And maybe some more stats, such as AA composition and pI. It would also be nice to have a protein sequence in a sequence window with similar functionality, e.g. showing the MW of the selected sequence.

BTW, I'm a molecular biologist and hobby programmer like you. I've been playing around with Python for some years now (also contributing to the Biopython project), and once or twice I started a project that was meant to become something similar to ApE, but unfortunately I don't have the staying power that you have. Have you ever considered a rewrite of ApE in another language like Java or Python?

mwdavis2 commented 5 months ago

On Thu, Mar 14, 2024 at 10:00 AM Markus Piotrowski wrote:

> > As such, the alignment function is centered on verifying that an expected sequence matches the sequencing reads from a clone.
>
> This is also mostly my application, but I often have forward and reverse Sanger reads that overlap, and I want to combine them into one sequence. Thus it would be nice if I could just export the consensus of the ApE alignment, maybe with an option for how to deal with ambiguities.

You can do this with copy and paste, basically. Make all of the reads into DNA sequences, then align them. You can double-click on a base in the alignment to set the selection point to that base. If you do that for the two overlapping sequences, you can just copy the downstream sequence into the tail of the upstream sequence.
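In string terms, the copy-and-paste procedure above amounts to cutting both reads at the same aligned base and joining the pieces. A minimal Python sketch, assuming the two 0-based indices are the positions of that shared base in each read (the function name and arguments are hypothetical, not an ApE API):

```python
def splice_at_shared_base(upstream, up_index, downstream, down_index):
    """Join upstream[:up_index] to downstream[down_index:].

    up_index and down_index are 0-based positions of the same aligned
    base in each read, i.e. the base one would double-click in the
    alignment window.
    """
    if upstream[up_index].upper() != downstream[down_index].upper():
        raise ValueError("the selected positions do not hold the same base")
    return upstream[:up_index] + downstream[down_index:]


if __name__ == "__main__":
    upstream = "ATGGCCAAGCTTGGATCC"
    downstream = "GCTTGGATCCGAATTC"
    # The first T of "CTTGG" sits at index 10 upstream and index 2 downstream.
    print(splice_at_shared_base(upstream, 10, downstream, 2))
```

The check on the two selected bases only guards against picking different positions by accident; it does not verify that the rest of the overlap agrees.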

> > What protein functions would you like to see?
>
> Yes, alignment of course. And maybe some more stats, such as AA composition and pI. It would also be nice to have a protein sequence in a sequence window with similar functionality, e.g. showing the MW of the selected sequence.

Alignment is definitely another whole program. Other than MW, pI (I don't know who runs isoelectric gels anymore?), and aa composition, what other functions from the DNA sequence would apply to protein? There are the old hydrophobicity and alpha/beta propensity measures, but I doubt that those are of any interest these days, given more modern methods. None of restriction digests or gels, translation and open reading frames, in silico cloning, melting temperature, or PCR would apply.
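Two of the protein statistics mentioned here, amino acid composition and molecular weight, are simple to compute. The Python sketch below is an illustration only and says nothing about how ApE would implement them: the function names are hypothetical, the residue masses are approximate average masses for the 20 standard amino acids, and pI is left out because it additionally requires a set of pKa values.

```python
from collections import Counter

# Approximate average residue masses in Da (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free termini


def aa_composition(protein):
    """Return the fraction of each amino acid present in the sequence."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in sorted(counts.items())}


def molecular_weight(protein):
    """Return the average molecular weight in Da of an unmodified peptide.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER_MASS


if __name__ == "__main__":
    peptide = "MKTAYIAKQR"
    print(aa_composition(peptide))
    print(round(molecular_weight(peptide), 2))
```

Applied to a selected stretch of a protein sequence, `molecular_weight` would give the "MW of the selected sequence" asked for earlier in the thread.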

> BTW, I'm a molecular biologist and hobby programmer like you. I've been playing around with Python for some years now (also contributing to the Biopython project), and once or twice I started a project that was meant to become something similar to ApE, but unfortunately I don't have the staying power that you have. Have you ever considered a rewrite of ApE in another language like Java or Python?

I don't see the advantage in moving to Python. It would still just use Tcl/Tk driving the UI via Tkinter, so I'd just be adding another layer on top with no new functionality.

MarkusPiotrowski commented 5 months ago

> You can do this with copy and paste, basically. Make all of the reads into DNA sequences, then align them. You can double-click on a base in the alignment to set the selection point to that base. If you do that for the two overlapping sequences, you can just copy the downstream sequence into the tail of the upstream sequence.

Know your tool! Thank you, this works nicely.