malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

Function(s) to investigate variation in PCR primers #280

Open alimanfoo opened 2 years ago

alimanfoo commented 2 years ago

There are some really nice plotting functions in the new AgamPrimer package by @sanjaynagi for investigating whether any variation within a specific genome region might affect primer binding.

Raising this issue to consider whether it would be a good idea to bring any of that functionality upstream to the malariagen_data package.

In particular, there is a possible use case where primers have previously been designed via another route, but then someone wants to investigate possible variation within MalariaGEN data.

sanjaynagi commented 2 years ago

I have made a draft implementation of this here: https://colab.research.google.com/drive/1dNPV3bvgrtx7d06RXTT0bIE90hlKanEo

Please check it out. Any thoughts and comments would be great.

The check_my_oligo() function requires gget to align sequences to the AgamP3 genome (latest available), so I don't know if this is something you want to include within malariagen_data or not. If not, I will certainly include it in AgamPrimer regardless :)

One remaining limitation is that it is not so useful for cDNA primers for gene expression: if either the forward or reverse primer has been designed over an exon-exon boundary, we won't be able to locate that sequence in the genome, and there's nothing we can do about that.
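To illustrate the exon-exon boundary limitation, here is a minimal sketch using toy sequences. The sequences and the `locate()` helper are invented for illustration and are not part of check_my_oligo(), AgamPrimer or malariagen_data. Exact string matching stands in for a real alignment.

```python
# Toy sequences, invented for illustration only.
exon1 = "ATGGCTTACGGATCCA"
intron = "GTAAGTTTTCTAACAG"
exon2 = "TGCAAGCTTGGAACTG"

genomic = exon1 + intron + exon2  # what a genome search sees
cdna = exon1 + exon2              # spliced transcript, intron removed

# Primer designed across the exon-exon junction:
# last 10 bases of exon 1 followed by the first 10 bases of exon 2.
junction_primer = exon1[-10:] + exon2[:10]

# Primer lying entirely within exon 2.
exonic_primer = exon2[2:14]


def locate(primer, reference):
    """Hypothetical helper: 0-based start of an exact match, or -1 if absent."""
    return reference.find(primer)


junction_in_cdna = locate(junction_primer, cdna)       # found in the transcript
junction_in_genome = locate(junction_primer, genomic)  # -1: split by the intron
exonic_in_genome = locate(exonic_primer, genomic)      # found in the genome

print(junction_in_cdna, junction_in_genome, exonic_in_genome)
```

The junction primer matches the spliced transcript but has no contiguous match in the genomic sequence, because the intron sits between its two halves. The primer contained within a single exon is located as normal.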

As well as the check_my_oligo() function, the above notebook contains a generic plot_sequence_frequencies() function, which is a streamlined version of the AgamPrimer frequency plots. Could that be useful in malariagen_data?
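For anyone following along, the kind of per-position summary such a frequency plot might be built on can be sketched roughly as below. This is not the notebook's implementation. The function name `alt_allele_frequencies()`, the toy genotype array and the 5% threshold are all assumptions made for illustration. Genotypes are assumed to be coded as 0 for the reference allele, a positive integer for a non-reference allele, and -1 for a missing call.

```python
import numpy as np


def alt_allele_frequencies(genotypes):
    """Hypothetical helper: non-reference allele frequency per site.

    genotypes: integer array of shape (n_sites, n_samples, ploidy),
    with 0 = reference, >0 = non-reference, -1 = missing.
    Sites with no called alleles are returned as NaN.
    """
    gt = np.asarray(genotypes)
    n_called = (gt >= 0).sum(axis=(1, 2))  # called alleles per site
    n_alt = (gt > 0).sum(axis=(1, 2))      # non-reference alleles per site
    freqs = np.full(gt.shape[0], np.nan)
    np.divide(n_alt, n_called, out=freqs, where=n_called > 0)
    return freqs


# Toy data: 4 sites across a primer, 3 diploid samples.
toy_genotypes = np.array([
    [[0, 0], [0, 0], [0, 0]],        # invariant site
    [[0, 1], [0, 0], [1, 1]],        # 3 of 6 alleles non-reference
    [[-1, -1], [0, 1], [0, 0]],      # 1 of 4 called alleles non-reference
    [[-1, -1], [-1, -1], [-1, -1]],  # no calls at all
])

freqs = alt_allele_frequencies(toy_genotypes)

# Flag sites whose non-reference frequency exceeds an arbitrary 5% threshold.
# NaN (no data) is treated as not flagged here.
flagged = np.flatnonzero(np.nan_to_num(freqs) > 0.05)

print(freqs)
print(flagged)
```

Keeping missing calls out of the denominator avoids under-estimating frequencies at poorly genotyped sites, and returning NaN for sites with no calls keeps "no data" distinct from "no variation".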

sanjaynagi commented 2 years ago

Example of the function and the plot_sequence_frequencies() output, showing kdr-995s and kdr-995f frequencies in Ag3.

[Image: base_freq]