rownames not matching in tse and assay when using rownames = NCBI #298

Closed: sdgamboa closed this issue 12 months ago

sdgamboa commented 1 year ago

It seems that the rownames of the TSE and the rownames of its assay don't match when using the rownames = 'NCBI' option in curatedMetagenomicData. I think this prevents tidySummarizedExperiment from automatically converting the object to a tibble: https://github.com/stemangiola/tidySummarizedExperiment/issues/70

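library(curatedMetagenomicData)

dataset_name <- "HallAB_2017.relative_abundance"
# curatedMetagenomicData() returns a list, so keep its first element
tse <- curatedMetagenomicData(
    pattern = dataset_name,
    dryrun = FALSE, rownames = 'NCBI',
    counts = TRUE
)[[1]]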
#> $`2021-10-14.HallAB_2017.relative_abundance`
#> dropping rows without rowTree matches:
#>   k__Bacteria|p__Actinobacteria|c__Coriobacteriia|o__Coriobacteriales|f__Atopobiaceae|g__Olsenella|s__Olsenella_profusa
#>   k__Bacteria|p__Actinobacteria|c__Coriobacteriia|o__Coriobacteriales|f__Coriobacteriaceae|g__Collinsella|s__Collinsella_stercoris
#>   k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Carnobacteriaceae|g__Granulicatella|s__Granulicatella_elegans
#>   k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Ruminococcus|s__Ruminococcus_champanellensis
#>   k__Bacteria|p__Firmicutes|c__Erysipelotrichia|o__Erysipelotrichales|f__Erysipelotrichaceae|g__Bulleidia|s__Bulleidia_extructa
#>   k__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Burkholderiales|f__Sutterellaceae|g__Sutterella|s__Sutterella_parvirubra
#>   k__Bacteria|p__Synergistetes|c__Synergistia|o__Synergistales|f__Synergistaceae|g__Cloacibacillus|s__Cloacibacillus_evryensis
tse
#> class: TreeSummarizedExperiment 
#> dim: 503 259 
#> metadata(1): agglomerated_by_rank
#> assays(1): relative_abundance
#> rownames(503): 853 820 ... 172901 1262744
#> rowData names(7): superkingdom phylum ... genus species
#> colnames(259): p8582_mo1 p8582_mo10 ... SKST041_2_G103027
#>   SKST041_3_G103028
#> colData names(24): study_name subject_id ... HBI SCCAI
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (503 rows)
#> rowTree: 1 phylo tree(s) (10430 leaves)
#> colLinks: NULL
#> colTree: NULL
class(tse)
#> [1] "TreeSummarizedExperiment"
#> attr(,"package")
#> [1] "TreeSummarizedExperiment"
head(rownames(tse))
#> [1] "853"    "820"    "301301" "28117"  "357276" "39491"
head(rownames(assay(tse, "relative_abundance", withDimnames = FALSE)))
#> [1] "1239_186801_186802_216572_216851_853" 
#> [2] "976_200643_171549_815_816_820"        
#> [3] "1239_186801_186802_186803_841_301301" 
#> [4] "976_200643_171549_171550_239759_28117"
#> [5] "976_200643_171549_815_909656_357276"  
#> [6] "1239_186801_186802_186803_NA_39491"
tidy_tse <- tidySummarizedExperiment::as_tibble(tse)
#> Error in `map2()`:
#> ℹ In index: 1.
#> ℹ With name: relative_abundance.
#> Caused by error in `.x[rownames(se), , drop = FALSE]`:
#> ! subscript out of bounds
#> Backtrace:
#>      ▆
#>   1. ├─tidySummarizedExperiment::as_tibble(tse)
#>   2. ├─tidySummarizedExperiment:::as_tibble.SummarizedExperiment(tse)
#>   3. │ └─tidySummarizedExperiment:::.as_tibble_optimised(...)
#>   4. │   └─tidySummarizedExperiment:::get_count_datasets(x)
#>   5. │     ├─... %>% ...
#>   6. │     └─purrr::map2(...)
#>   7. │       └─purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
#>   8. │         ├─purrr:::with_indexed_errors(...)
#>   9. │         │ └─base::withCallingHandlers(...)
#>  10. │         ├─purrr:::call_with_cleanup(...)
#>  11. │         └─tidySummarizedExperiment (local) .f(.x[[i]], .y[[i]], ...)
#>  12. ├─purrr::when(...)
#>  13. ├─purrr::when(...)
#>  14. └─purrr (local) `<fn>`(`<sbscOOBE>`)
#>  15.   └─cli::cli_abort(...)
#>  16.     └─rlang::abort(...)
Created on 2023-08-15 with reprex v2.0.2
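
To make the mismatch concrete, here is a minimal base R sketch that uses only the six IDs printed above. It suggests that, for these rows, the TSE rowname is the last underscore-separated field of the assay rowname. That pattern is an observation from this output, not a documented rule of curatedMetagenomicData, and the variable names are made up for illustration.

```r
# The six assay rownames and TSE rownames shown in the reprex output above
long_ids <- c(
    "1239_186801_186802_216572_216851_853",
    "976_200643_171549_815_816_820",
    "1239_186801_186802_186803_841_301301",
    "976_200643_171549_171550_239759_28117",
    "976_200643_171549_815_909656_357276",
    "1239_186801_186802_186803_NA_39491"
)
short_ids <- c("853", "820", "301301", "28117", "357276", "39491")

# The two sets of names are not identical, which is why indexing the assay
# by rownames(tse) fails with "subscript out of bounds"
identical(long_ids, short_ids)

# Assumption: strip everything up to the last underscore to get the short ID
derived <- sub(".*_", "", long_ids)
identical(derived, short_ids)
```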

sdgamboa commented 1 year ago

Related issue: https://github.com/stemangiola/tidySummarizedExperiment/pull/78

schifferl commented 12 months ago

I am not sure I even understand the problem @sdgamboa, but here is a quick solution.


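library(curatedMetagenomicData)

# curatedMetagenomicData() returns a list, so keep its first element
tse <-
    curatedMetagenomicData(
        pattern = "HallAB_2017.relative_abundance",
        dryrun = FALSE,
        counts = TRUE,
        rownames = "NCBI"
    )[[1]]

# Overwrite the assay's own rownames with those of the TSE
rownames(assay(tse, withDimnames = FALSE)) <-
    rownames(assay(tse, withDimnames = TRUE))

tidy_tse <-
    tidySummarizedExperiment::as_tibble(tse)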

If you can provide some additional explanation or point me to the error in the programming, I'd be happy to fix it.
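
For anyone who wants to try the workaround without downloading the dataset, here is a rough sketch on a toy SummarizedExperiment. The object, its values, and the sample names are made up; only the two IDs are taken from the output earlier in the thread. It assumes a recent SummarizedExperiment release in which an assay stored with withDimnames = FALSE keeps its own dimnames.

```r
library(SummarizedExperiment)

# Toy data (hypothetical): short IDs on the object, lineage IDs in the assay
short_ids <- c("853", "820")
long_ids <- c(
    "1239_186801_186802_216572_216851_853",
    "976_200643_171549_815_816_820"
)

counts <- matrix(1:4, nrow = 2, dimnames = list(short_ids, c("s1", "s2")))
se <- SummarizedExperiment(assays = list(relative_abundance = counts))

# Store an assay whose own rownames differ from those of the object
stale <- counts
rownames(stale) <- long_ids
assay(se, "relative_abundance", withDimnames = FALSE) <- stale

# Expected to be FALSE while the mismatch is present
identical(rownames(se), rownames(assay(se, withDimnames = FALSE)))

# Workaround from this comment: copy the object's rownames into the assay
rownames(assay(se, withDimnames = FALSE)) <-
    rownames(assay(se, withDimnames = TRUE))

# Expected to be TRUE once the names are aligned
identical(rownames(se), rownames(assay(se, withDimnames = FALSE)))
```

The workaround leaves the assay values untouched and only changes the stored rownames, so the longer lineage-style IDs are lost unless they are saved somewhere first.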

schifferl commented 12 months ago

Never mind, I understand now; this is resolved in 086d953