jsevo / taxumap

MIT License

Differences between inputting filenames and dataframes #15

Status: Open. funnell opened this issue 2 years ago.

funnell commented 2 years ago

Hello,

I've found that TaxUMAP works when I pass filenames to the Taxumap constructor, but not when I pass the same data as pandas DataFrames.

Here is a screenshot showing portions of the relative abundance and taxonomy dataframes: [screenshot]

And here is the error I get when running transform_self:

Error in rule taxumap:
    jobid: 0
    output: analysis/taxumap/amadeus/output/amadeus_embedding.feather, analysis/taxumap/amadeus/output/amadeus_dominant_taxon.feather

RuleException:
KeyError in line 64 of /Users/funnellt/Projects/PICI_microbiome/workflow/rules/taxumap.smk:
'Methanobrevibacter_smithii'
  File "/Users/funnellt/Projects/PICI_microbiome/workflow/rules/taxumap.smk", line 64, in __rule_taxumap
  File "/Users/funnellt/Projects/phylo-umap/taxumap/taxumap_base.py", line 113, in transform_self
  File "/Users/funnellt/Projects/phylo-umap/taxumap/tools.py", line 42, in tax_agg
  File "/Users/funnellt/Projects/phylo-umap/taxumap/tools.py", line 55, in aggregate_at_taxlevel
  File "/Users/funnellt/Projects/phylo-umap/taxumap/tools.py", line 55, in <listcomp>

I'm running Taxumap like this:

        import pandas as pd
        from taxumap.taxumap_base import Taxumap

        # `input` is the Snakemake rule's input object (this runs in the rule's run: block).
        relab = pd.read_csv(input['relab'])
        tax = pd.read_csv(input['tax'])

        taxumap = Taxumap(taxonomy=tax, microbiota_data=relab)
        taxumap.transform_self(
            neigh=28,
            min_dist=0
        )

However, it works fine if I just specify the filenames like this:

        # Same call, but passing the CSV paths instead of DataFrames.
        taxumap = Taxumap(taxonomy=input['tax'], microbiota_data=input['relab'])
        taxumap.transform_self(
            neigh=28,
            min_dist=0
        )

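One possible explanation for the difference, offered as an assumption rather than something confirmed in this thread: pd.read_csv without index_col leaves the first column (for example the taxon names) as an ordinary column and gives the frame a default RangeIndex, so a label lookup such as .loc['Methanobrevibacter_smithii'] raises KeyError. The sketch below uses a tiny hypothetical taxonomy table to show this; its column names are invented for illustration and are not taken from TaxUMAP.

```python
import io

import pandas as pd

# Hypothetical two-row taxonomy table; the column names are illustrative only.
TAXONOMY_CSV = """ASV,Kingdom,Phylum,Genus
Methanobrevibacter_smithii,Archaea,Euryarchaeota,Methanobrevibacter
Bacteroides_fragilis,Bacteria,Bacteroidetes,Bacteroides
"""

# Without index_col, the taxon names end up in an ordinary "ASV" column
# and the row labels are a default RangeIndex (0, 1, ...).
tax_default = pd.read_csv(io.StringIO(TAXONOMY_CSV))

# With index_col=0, the taxon names become the row labels.
tax_indexed = pd.read_csv(io.StringIO(TAXONOMY_CSV), index_col=0)


def lookup(frame, taxon):
    """Return the taxonomy row labelled `taxon`, or None if no such label exists."""
    try:
        return frame.loc[taxon]
    except KeyError:
        return None


print(type(tax_default.index).__name__)                            # RangeIndex
print(lookup(tax_default, "Methanobrevibacter_smithii"))           # None
print(lookup(tax_indexed, "Methanobrevibacter_smithii")["Genus"])  # Methanobrevibacter
```

If this is the cause, reading both files with index_col=0 (or calling set_index on the label column) would be the first thing to try; the thread does not confirm how TaxUMAP's own file loader reads the CSVs.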
granthussey commented 2 years ago

Thanks for bringing this up, @funnell.

The traceback points to the aggregate_at_taxlevel function, which suggests the package is mishandling the labels in your particular data.
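One way to probe a label mismatch like this yourself is to list the taxa that appear in the abundance table but have no matching row in the taxonomy table. This is a minimal sketch, assuming (not confirmed from TaxUMAP's documentation) that the abundance table has one column per taxon and that the taxonomy table should be indexed by the same taxon names; missing_taxa is a hypothetical helper, not part of TaxUMAP.

```python
import pandas as pd


def missing_taxa(microbiota, taxonomy):
    """Return abundance-table column labels that are absent from the taxonomy index.

    Assumes one column per taxon in `microbiota` and taxon names as the
    row labels of `taxonomy`.
    """
    return sorted(set(microbiota.columns) - set(taxonomy.index))


# Hypothetical toy data reusing the species name from the traceback.
relab = pd.DataFrame(
    {"Methanobrevibacter_smithii": [0.2], "Bacteroides_fragilis": [0.8]},
    index=["sample_1"],
)
tax = pd.DataFrame({
    "ASV": ["Methanobrevibacter_smithii", "Bacteroides_fragilis"],
    "Genus": ["Methanobrevibacter", "Bacteroides"],
})

# Taxonomy still has a RangeIndex, so every taxon name is reported missing.
print(missing_taxa(relab, tax))
# → ['Bacteroides_fragilis', 'Methanobrevibacter_smithii']

# With the taxon names as the index, nothing is missing.
print(missing_taxa(relab, tax.set_index("ASV")))
# → []
```

If a sample-ID column shows up in the result, that column is probably still an ordinary column rather than the abundance table's index.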

Could you share even an abbreviated sample of the data you're working with? If any of it is sensitive, just replace the numbers with random values; I only need your taxonomy table and abundance table and how they're currently labeled.
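A sample like the one described can be scripted. The hypothetical helper below keeps the first few samples and taxa and every label, and replaces the numbers with random values rescaled so that each sample sums to 1. The samples-as-rows orientation and the rescaling are assumptions, not requirements stated in the thread.

```python
import numpy as np
import pandas as pd


def anonymize_abundances(relab, n_samples=5, n_taxa=20, seed=0):
    """Return a small copy of `relab` with labels kept and numbers randomized.

    Hypothetical helper. Assumes samples are rows and taxa are numeric
    columns; non-numeric columns (e.g. a sample-ID column) are copied as-is.
    """
    subset = relab.iloc[:n_samples]
    numeric = subset.select_dtypes(include="number").columns[:n_taxa]
    labels = subset.select_dtypes(exclude="number")

    # Uniform random values, rescaled per row to look like relative abundances.
    rng = np.random.default_rng(seed)
    values = rng.random((len(subset), len(numeric)))
    values /= values.sum(axis=1, keepdims=True)

    noise = pd.DataFrame(values, index=subset.index, columns=numeric)
    return pd.concat([labels, noise], axis=1)


# Example with made-up data: the labels survive, the numbers do not.
demo = pd.DataFrame({
    "sample": ["s1", "s2"],
    "Methanobrevibacter_smithii": [0.1, 0.3],
    "Bacteroides_fragilis": [0.9, 0.7],
})
print(anonymize_abundances(demo))
```

The result can then be written out with DataFrame.to_csv; keep index=True if the sample IDs live in the index rather than in a column.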

jsevo commented 2 years ago

@funnell

Please provide Grant with a sample. Otherwise I might close this. Thanks!

funnell commented 2 years ago

Here you go! Please forgive the very late response.

Attachments: species_abundance.csv, metaphlan_species_taxonomy.csv