ERROR - reading gene covariate file: duplicate covariate variable name #142

bschilder commented 1 year ago

1. Bug description

It seems some of my CTDs still have duplicate celltype names. Thought my standardisation pipeline accounted for this, but apparently not:

Console output

Screenshot 2023-04-14 at 12 25 59

Expected behaviour

Maybe add some handling of this scenario (make celltypes unique) within MAGMA.Celltyping.
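A minimal sketch of what "make celltypes unique" could look like, using base R's `make.unique()`. The helper name `make_celltypes_unique` and the toy level below are hypothetical, and this assumes each CTD level is a list of matrix-like objects whose columns are celltypes. It is not how MAGMA.Celltyping currently handles this.

```r
# Toy stand-in for one CTD level: matrices whose columns are celltypes.
# (Assumed layout; a real CTD level may hold additional elements.)
celltypes <- c("Neuron", "Astrocyte", "Neuron")
genes <- c("GENE1", "GENE2")
lvl <- list(
  mean_exp = matrix(1:6, nrow = 2, dimnames = list(genes, celltypes)),
  specificity = matrix(seq(0.1, 0.6, by = 0.1), nrow = 2,
                       dimnames = list(genes, celltypes))
)

# Hypothetical helper: de-duplicate column names in every element of a level
# that has column names. Elements without column names pass through unchanged.
make_celltypes_unique <- function(level, sep = "_") {
  lapply(level, function(m) {
    if (!is.null(colnames(m))) {
      colnames(m) <- make.unique(colnames(m), sep = sep)
    }
    m
  })
}

lvl_fixed <- make_celltypes_unique(lvl)
colnames(lvl_fixed$mean_exp)
```

Because every element of the level gets the same renaming applied in the same column order, the celltype names stay consistent across `mean_exp` and `specificity`.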

2. Reproducible example

Seems to be occurring in:


magma_dirs <- MAGMA.Celltyping::import_magma_files(ids = c("ieu-a-298"))
ctd <- MAGMA.Celltyping::get_ctd("ctd_Jiang2021")
res <- MAGMA.Celltyping::celltype_associations_pipeline(
    ctd = ctd,
    ctd_levels = 1,
    ctd_name = "ctd_Jiang2021", 
    magma_dirs = magma_dirs)

3. Session info

Actually, seems to be occurring even in CTDs that worked fine before, e.g. TabulaMuris_zebrafishGenes.

After manually inspecting the CTD, the duplicate-name error doesn't actually seem to be true: all celltypes are indeed unique.

For Jiang2021:

CTD_std <- EWCE::standardise_ctd(
    ctd = CTD,
    input_species = species,
    output_species = "human",
    sctSpecies_origin = species_dict[[x]],
    dataset = x,
    force_standardise = TRUE,
    keep_plots = FALSE)

Screenshot 2023-04-14 at 12 38 49
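For reference, the manual inspection described above can be sketched roughly like this. The helper `find_duplicate_celltypes` is hypothetical, and it assumes each CTD level is a list with a `mean_exp` matrix whose column names are the celltypes. The toy CTD is made up purely for illustration.

```r
# Hypothetical helper: return the duplicated celltype (column) names per level.
# Assumes each level is a list containing a matrix-like element named by `slot`.
find_duplicate_celltypes <- function(ctd, slot = "mean_exp") {
  lapply(ctd, function(level) {
    nms <- colnames(level[[slot]])
    unique(nms[duplicated(nms)])
  })
}

# Toy CTD with two levels: level 1 has a duplicate, level 2 does not.
toy_ctd <- list(
  list(mean_exp = matrix(0, 2, 3, dimnames = list(NULL, c("A", "B", "A")))),
  list(mean_exp = matrix(0, 2, 2, dimnames = list(NULL, c("A", "B"))))
)

dups <- find_duplicate_celltypes(toy_ctd)
dups
```

An empty character vector for a level means all celltype names in that level are unique.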

So either MAGMA.Celltyping is picking up some old files where this was once true, or there is a bug in the pipeline. Another possibility is that the CTD gets screwed up when passing through the extra standardisation procedure here. But in theory, the CTD should just be passed right back if the procedure detects that it has already been standardised.
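One way to test the "old files" theory would be to check the header of the gene covariate file that is already on disk. A rough sketch, assuming the file is whitespace-delimited with a header line and gene IDs in the first column. The helper name and the toy file are hypothetical, and the real file's location depends on your `magma_dirs`.

```r
# Hypothetical helper: report duplicated variable names in a file's header line.
# Assumes a whitespace-delimited file whose first line is a header and whose
# first column holds gene IDs.
duplicated_covar_names <- function(path) {
  header <- scan(path, what = character(), nlines = 1, quiet = TRUE)
  vars <- header[-1]  # drop the gene ID column
  unique(vars[duplicated(vars)])
}

# Toy file standing in for a (possibly stale) covariate file on disk.
tmp <- tempfile(fileext = ".txt")
writeLines(c("GENE\tNeuron\tAstrocyte\tNeuron",
             "1\t0.1\t0.2\t0.3"), tmp)

stale_dups <- duplicated_covar_names(tmp)
stale_dups
```

If this returns any names for a file written before the CTD was reprocessed, the stale file rather than the current CTD is the likely source of the error.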

Seems to be working now after reprocessing CTDs