Closed sanjaynagi closed 1 month ago
Check out this pull request on
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
ahhh, it seems I forgot I was working on a fork, as such, the google credentials dont work. Whats the best approach in future? @ahernank @alimanfoo
Hi @sanjaynagi, a short-term solution, but you have permissions to write directly on the repo rather than using branches.
I've added CNV support, though currently the bar is copy number, but potentially it should be # extra copies. Here is how it currently looks (no CNVs here)
Thanks @sanjaynagi. Just to mention I updated this branch to bring in some unrelated test fixes from master.
Very exciting, I'll give this a proper look asap!
Attention: Patch coverage is 12.19512%
with 108 lines
in your changes missing coverage. Please review.
Project coverage is 95.82%. Comparing base (
dc89c9c
) to head (5048263
). Report is 9 commits behind head on master.:exclamation: Current head 5048263 differs from pull request most recent head ac84ac0
Please upload reports for the commit ac84ac0 to get more accurate results.
Files | Patch % | Lines |
---|---|---|
malariagen_data/anoph/dipclust.py | 11.86% | 104 Missing :warning: |
malariagen_data/anoph/snp_frq.py | 20.00% | 4 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
This PR is almost ready, but im finding it really difficult to map copy numbers in plot_dendro_cnv_bar()
to specific colours, within px.imshow.
Each CNV plot obviously uses its own min and max to specify the color range, but finding it strangely difficult to dynamically update the thresholds depending on what copy numbers are present in that plot. has this been done before in malariagen_data? please help @alimanfoo I am tired
Also the plot should be marginally zoomed in, I dont know why its not (not the end of the world).
Hi @sanjaynagi, some investigation into the CNV plotting issue here: https://github.com/sanjaynagi/malariagen-data-python/pull/1
Some updates:
thanks to @alimanfoo for sorting the CNV colorscale problem.
Ive completely removed any usage of px.imshow() now and replaced with go.Heatmap() for heterozygosity and amino acid heatmaps as well as CNVs. This also simplifies the code further, as it seems that we now don't need to pass the x_range through various functions which we did previously.
I've added a check to see if there are actually any amino acids above the MAF threshold, if there are not, an amino acid heatmap will not be created.
Think we just need to maybe write some tests, and a final check from @alimanfoo, then we may be good to go! yeehaw!! Ill have a go at zooming in the x-axis aswell (currently its slightly zoomed out).
Hi @sanjaynagi, some comments and suggestions over in https://github.com/sanjaynagi/malariagen-data-python/pull/2
@alimanfoo I've had a go at all the comments except the removing colorbars earlier one, cos I tried to do that in the past and it took me a while and I couldn't :)
Have pushed a few suggested changes directly to this branch, hope that's OK. Still reviewing so may do a bit more.
Hi @sanjaynagi, I pushed a few more changes, includes:
Hi @sanjaynagi, I think this is basically good to merge. Because this is a PR from a fork the CI tests won't run, so I'm thinking to merge this PR then open up a follow on PR from a branch within the main repo to confirm tests all run. How does that sound? Anything else you'd like to do here before I merge?
Hey @alimanfoo . Sounds good to me. Nothing else to add currently.
Builds on #507 and partially addresses #361.
In this PR, I add an advanced diplotype clustering function which performs diplotype clustering on a given region, and plots alongside it sample heterozygosity, SNP genotypes at interesting SNP variants (e.g., amino acid change substitutions), and copy number variation data. This is particularly useful for investigating potential driver resistance mutations at IR loci.
A new function
plot_diplotype_clustering_advanced()
is added to theAg3
andAf1
APIs.