widdowquinn / find_differential_primers

Code for design of diagnostic PCR primers, and metabarcoding markers.
https://widdowquinn.github.io/find_differential_primers/
MIT License

Replace `ePrimer3` with modern Primer3 (v2+) #30

Closed widdowquinn closed 5 years ago

widdowquinn commented 5 years ago

The ePrimer3 requirement needs a deprecated version of primer3 - we should move to a more modern primer prediction tool.

widdowquinn commented 5 years ago

I looked for an existing interface to the newer version of Primer3, and found: https://github.com/libnano/primer3-py - library-level bindings to the newer Primer3 version. I checked to see if it was in bioconda:

$ conda search primer3-py

It was, so I installed it to play around with.

The primer3-py documentation covers primer design at https://libnano.github.io/primer3-py/quickstart.html#primer-design - the general recommendation is to read the code in the tests directory for examples.

I got primer design working with a script, but design is slow enough that pdp will still need qsub/subprocess.run() calls for parallelisation to be effective:

$ time python ./test_primer3.py   # 1000 primers on a genome
5064019

real    3m14.293s
user    3m1.792s
sys     0m1.613s
$ time python ./test_primer3.py    # 5 primers on a genome
5064019

real    0m23.256s
user    0m21.801s
sys     0m0.855s

So the current plan is to implement the same parser/module structure as for ePrimer3, but to use Primer3 (v2+) for primer design instead.
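As a rough illustration of that plan (not the code that ended up in pdp), calling a locally installed Primer3 through subprocess.run() could look something like the sketch below. The function names and the `exe` default are my own assumptions; the tag names (SEQUENCE_ID, SEQUENCE_TEMPLATE, PRIMER_TASK, PRIMER_NUM_RETURN) are standard Primer3 v2+ Boulder-IO tags. Input is written as `KEY=VALUE` lines, with a lone `=` ending each record, and output comes back in the same format.

```python
import subprocess


def boulder_record(seq_id, template, **settings):
    """Build one Boulder-IO input record for primer3_core.

    Extra keyword arguments are passed through as Primer3 tags,
    e.g. PRIMER_NUM_RETURN=5.
    """
    tags = {
        "SEQUENCE_ID": seq_id,
        "SEQUENCE_TEMPLATE": template,
        "PRIMER_TASK": "generic",
    }
    tags.update(settings)
    lines = [f"{key}={value}" for key, value in tags.items()]
    # A line holding only "=" terminates the record
    return "\n".join(lines) + "\n=\n"


def parse_boulder(text):
    """Parse Boulder-IO text into a list of dicts, one per record."""
    records, current = [], {}
    for line in text.splitlines():
        if line.strip() == "=":
            records.append(current)
            current = {}
        elif "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return records


def design_primers(seq_id, template, exe="primer3_core", **settings):
    """Run primer3_core as a separate process and parse its output.

    `exe` is assumed to be on the PATH; adjust for the local install.
    """
    result = subprocess.run(
        [exe],
        input=boulder_record(seq_id, template, **settings),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_boulder(result.stdout)
```

Because Primer3 is only called as an external program and its output interpreted, each genome can be handed to a separate process (or a separate qsub job) without linking to the Primer3 libraries.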

widdowquinn commented 5 years ago

Another reason for using the subprocess.run() approach - from the Primer3 manual:

If you are a programmer, you will see that primer3 is now distributed 
under the GNU General Public License, version 2 or (at your option) 
any later version of the License (GPL2). As we understand it, if you 
include parts of the primer3 source code in your source code or link 
to primer3 binary libraries in your executable, you have to release 
your software also under GPL2. If you only call primer3 from your 
software and interpret its output, you can use any license you want 
for your software. If you modify primer3 and then release your 
modified software, you have to release your modifications in source 
code under GPL2 as well. 

We chose GPL2 because we wanted primer3 to evolve and for the
improvements to find their way back into the main distribution. If 
you are programming a new web interface which runs primer3, 
please include in the about page of the tool the sentence "<your 
software name> uses primer3 version ...". Please consider releasing 
your software under GPL2 as well, especially if you do not want to 
maintain it in the future. 

We may have to move from the MIT licence to GPL2 if we link to the Primer3 libraries directly.

widdowquinn commented 5 years ago

With commit 3ef5201 in the diagnostic_primers branch, a new subcommand is added: pdp primer3. This uses a locally-installed instance of Primer3 (v2+) instead of the EMBOSS ePrimer3 tool to design primers for the input genomes.

Designed primer output is written to .primer3, .bed and .json files, and the resulting .json configuration files can be used in the same way as the output of pdp eprimer3.
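For anyone wanting to see how Primer3 coordinates map onto BED intervals, here is a minimal sketch. It is not pdp's actual writer: the function name, the feature names and the zero score are hypothetical, and it assumes the default PRIMER_FIRST_BASE_INDEX=0 of primer3_core. Primer3 reports each primer as `position,length`; for the right primer the position is its 5' base, which is the rightmost base it covers.

```python
def primer_pair_to_bed(record, index, chrom):
    """Return two BED lines (forward, reverse) for primer pair `index`.

    `record` is a dict of Primer3 output tags, e.g. PRIMER_LEFT_0="10,20".
    BED intervals are 0-based and half-open.
    """
    l_start, l_len = (int(x) for x in record[f"PRIMER_LEFT_{index}"].split(","))
    r_pos, r_len = (int(x) for x in record[f"PRIMER_RIGHT_{index}"].split(","))
    # Left primer: position is its leftmost (5') base
    fwd = (chrom, l_start, l_start + l_len, f"primer_{index}_fwd", 0, "+")
    # Right primer: position is its rightmost (5') base, so step back by length
    rev = (chrom, r_pos - r_len + 1, r_pos + 1, f"primer_{index}_rev", 0, "-")
    return ["\t".join(str(field) for field in row) for row in (fwd, rev)]
```

For example, a left primer reported as `10,20` and a right primer reported as `129,20` would give the intervals 10-30 and 110-130.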

widdowquinn commented 5 years ago

All jobs in the Travis-CI test matrix pass as of c8c7cf4.

The tests needed the thermodynamic parameters to be defined explicitly in the test inputs, because of how the Travis-CI environments are set up.