malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

FST plotting function error #567

Closed KellyLBennett closed 2 months ago

KellyLBennett commented 3 months ago

The FST plotting function should give the standard error or Z score when called on the diagonal. However, the first row is swapped with the first column. Please see the figures below: the first is the plot with FST only, and the second is a plot that should show the Z score on the diagonal.

[Screenshot 2024-07-10 at 16 22 50] [Screenshot 2024-07-10 at 16 24 29]

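For readers following along, the layout described above can be sketched as a square matrix with FST on one side of the diagonal and the annotation on the other. This is a minimal illustration, not the library's implementation. The helper name `annotated_matrix`, the choice of which triangle holds which value, the definition of the Z score as FST divided by its standard error, and all numbers are assumptions made for the example. Sorting each pair's indices keeps every value in its intended triangle regardless of the order in which the two cohorts are listed.

```python
# Illustrative sketch only: hypothetical helper and made-up numbers,
# not the malariagen_data implementation.

def annotated_matrix(rows, cohorts):
    """Build a square matrix from (cohort1, cohort2, fst, se) rows.

    Assumed layout: FST below the diagonal, Z score (fst / se) above it,
    and None on the diagonal.
    """
    index = {name: i for i, name in enumerate(cohorts)}
    n = len(cohorts)
    matrix = [[None] * n for _ in range(n)]
    for cohort1, cohort2, fst, se in rows:
        # Sort the pair so that i < j whichever way round it was listed.
        i, j = sorted((index[cohort1], index[cohort2]))
        matrix[j][i] = fst       # row j, column i: below the diagonal
        matrix[i][j] = fst / se  # row i, column j: above the diagonal
    return matrix


cohorts = ["A", "B", "C"]
rows = [
    ("A", "B", 0.02, 0.004),
    ("C", "A", 0.05, 0.005),  # deliberately listed in reverse order
    ("B", "C", 0.03, 0.006),
]
matrix = annotated_matrix(rows, cohorts)

for name, row in zip(cohorts, matrix):
    print(name, row)
```

If the first row and first column come out swapped in a layout like this, one thing worth checking is whether pairs involving the first cohort arrive in the opposite order from the others.
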
alimanfoo commented 2 months ago

Hi @KellyLBennett, just having a look at this now, would you be able to post the function call you made to generate the plots above so I can try to replicate?

KellyLBennett commented 2 months ago

Hi @alimanfoo, I believe this was when running the following:

fst_df = ag3.pairwise_average_fst(region="3L:15,000,000-41,000,000", cohorts=wild_cohorts, min_cohort_size=10, site_mask="arab")
ag3.plot_pairwise_average_fst(fst_df, annotation="Z score")

alimanfoo commented 2 months ago

Thanks @KellyLBennett. Can you provide the value of the wild_cohorts parameter?

KellyLBennett commented 2 months ago

Sorry, yes, here it is: sample_treatments_relabel (1).csv

wild_cohorts = {
    "Mwea_2007": f"taxon == 'arabiensis' and location == 'Mwea' and year == 2007 and insecticide == 'Untested' and partner_sample_id not in {misplaced_samples_list}",
    "Mwea_2014": f"taxon == 'arabiensis' and location == 'Mwea' and year == 2014 and insecticide == 'Untested' and partner_sample_id not in {misplaced_samples_list}",
    "Teso_2013": f"taxon == 'arabiensis' and location == 'Teso' and insecticide == 'Untested' and partner_sample_id not in {misplaced_samples_list}",
    "Turkana_2006": f"taxon == 'arabiensis' and location == 'Turkana' and insecticide == 'Untested' and partner_sample_id not in {misplaced_samples_list}",
    "Kilifi_2012": "taxon == 'arabiensis' and location == 'Kilifi' and year == 2012",
}

This is after adding on metadata, which has the insecticide experiment and bioassay outcome. I will attach the CSV.

The misplaced samples list is:

misplaced_samples_list = ['THD10007', 'THD10059', 'THD10064', 'THD10016', 'THD1010', 'TEAC10038', 'TEAC10066', 'TE0054', 'TEPM10192', 'TEMP10204', 'THD10058', 'THD10006', 'TEPM10204']
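
As a side note on how the cohort queries above are assembled: the f-strings interpolate the Python list directly, so the list's printed form becomes a list literal inside the query string. The sketch below uses a shortened list purely for illustration and only builds the string. That the result is then evaluated as a pandas-style query expression is an assumption here.

```python
# Shortened, illustrative list; the full list is given in the comment above.
misplaced_samples_list = ["THD10007", "THD10059"]

# The f-string inserts the list's repr, giving a list literal in the query.
query = (
    "taxon == 'arabiensis' and location == 'Mwea' and year == 2007 "
    f"and partner_sample_id not in {misplaced_samples_list}"
)

print(query)
```

Because the list is embedded via its printed form, misplaced_samples_list has to be defined before wild_cohorts is built, and a sample ID containing a quote character would change how the list literal is written in the resulting expression.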

alimanfoo commented 2 months ago

Thanks @KellyLBennett, I can replicate with this slightly simplified example:

image