neurogenomics / MAGMA_Celltyping

Find causal cell-types underlying complex trait genetics
https://neurogenomics.github.io/MAGMA_Celltyping

ERROR - reading gene covariate file: duplicate variable name 'Fetal_Neuron' #156

Closed bschilder closed 1 week ago

bschilder commented 1 week ago

Affected version

2.0.13 (I'm guessing @Al-Murphy is using the latest version.)

Steps to reproduce the bug

HCL <- MSTExplorer::load_example_ctd(c("ctd_HumanCellLandscape.rds"),
                                     multi_dataset = FALSE)

path_formatted <- MAGMA.Celltyping::get_example_gwas(
  trait = "prospective_memory")

genesOutPath <- MAGMA.Celltyping::map_snps_to_genes(
  path_formatted = path_formatted,
  force_new = TRUE,
  genome_build = "GRCh37")

MAGMA_results <- MAGMA.Celltyping::celltype_associations_pipeline(
  magma_dirs = dirname(genesOutPath),
  ctd = HCL,
  ctd_species = "human",
  ctd_name = "Test",
  run_linear = TRUE,
  run_top10 = TRUE,
  force_new = TRUE)

Actual behavior


> HCL <- MSTExplorer::load_example_ctd(c("ctd_HumanCellLandscape.rds"),
+                                      multi_dataset=FALSE) 
Loading ctd_HumanCellLandscape.rds
> path_formatted <- MAGMA.Celltyping::get_example_gwas(
+   trait = "prospective_memory")
Importing munged GWAS summary statistics: prospective_memory
ℹ All local files already up-to-date!
Saving decompressed copy of path_formatted ==>  /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/prospective_memory.ukb.tsv
> genesOutPath <- MAGMA.Celltyping::map_snps_to_genes(
+   path_formatted = path_formatted,
+   force_new = TRUE,
+   genome_build = "GRCh37")
Installed MAGMA version: v1.10
Skipping MAGMA installation.
The desired_version of MAGMA is currently installed: v1.10
Using: magma_v1.10_mac
Using existing genome_ref found in storage_dir.
ℹ All local files already up-to-date!

==== MAGMA Step 1: Generate genes.annot file ====

Welcome to MAGMA v1.10 (custom)
Using flags:
    --annotate window=35,10
    --snp-loc /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/prospective_memory.ukb.tsv
    --gene-loc /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/NCBI37.3.gene.loc
    --out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN

Start time is 16:32:28, Thursday 03 Oct 2024

Starting annotation...
Reading gene locations from file /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/NCBI37.3.gene.loc... 
    adding window: 35000bp (before), 10000bp (after)
    19161 gene locations read from file
    chromosome  1: 2016 genes
    chromosome  2: 1226 genes
    chromosome  3: 1050 genes
    chromosome  4: 745 genes
    chromosome  5: 856 genes
    chromosome  6: 750 genes
    chromosome  7: 906 genes
    chromosome  8: 669 genes
    chromosome  9: 775 genes
    chromosome 10: 723 genes
    chromosome 11: 1275 genes
    chromosome 12: 1009 genes
    chromosome 13: 320 genes
    chromosome 14: 595 genes
    chromosome 15: 586 genes
    chromosome 16: 817 genes
    chromosome 17: 1147 genes
    chromosome 18: 271 genes
    chromosome 19: 1389 genes
    chromosome 20: 527 genes
    chromosome 21: 215 genes
    chromosome 22: 442 genes
    chromosome  X: 805 genes
    chromosome  Y: 47 genes
Reading SNP locations from file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/prospective_memory.ukb.tsv... 
    WARNING: on line 1, chromosome code 'CHR' not recognised; skipping SNP (ID = SNP)
    398092 SNP locations read from file                                                             
    of those, 215415 (54.11%) mapped to at least one gene
Writing annotation to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.annot
    for chromosome  1, 744 genes are empty (out of 2016)
    for chromosome  2, 425 genes are empty (out of 1226)
    for chromosome  3, 319 genes are empty (out of 1050)
    for chromosome  4, 275 genes are empty (out of 745)
    for chromosome  5, 252 genes are empty (out of 856)
    for chromosome  6, 234 genes are empty (out of 750)
    for chromosome  7, 275 genes are empty (out of 906)
    for chromosome  8, 272 genes are empty (out of 669)
    for chromosome  9, 238 genes are empty (out of 775)
    for chromosome 10, 239 genes are empty (out of 723)
    for chromosome 11, 394 genes are empty (out of 1275)
    for chromosome 12, 313 genes are empty (out of 1009)
    for chromosome 13, 121 genes are empty (out of 320)
    for chromosome 14, 211 genes are empty (out of 595)
    for chromosome 15, 220 genes are empty (out of 586)
    for chromosome 16, 307 genes are empty (out of 817)
    for chromosome 17, 305 genes are empty (out of 1147)
    for chromosome 18, 84 genes are empty (out of 271)
    for chromosome 19, 393 genes are empty (out of 1389)
    for chromosome 20, 151 genes are empty (out of 527)
    for chromosome 21, 65 genes are empty (out of 215)
    for chromosome 22, 150 genes are empty (out of 442)
    for chromosome  X, 805 genes are empty (out of 805)
    for chromosome  Y, 47 genes are empty (out of 47)
    at least one SNP mapped to each of a total of 12322 genes (out of 19161)

End time is 16:32:29, Thursday 03 Oct 2024 (elapsed: 00:00:01)

==== MAGMA Step 2: Generate genes.out ====

Welcome to MAGMA v1.10 (custom)
Using flags:
    --bfile /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/g1000_eur/g1000_eur synonym-dup=skip
    --pval /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/prospective_memory.ukb.tsv ncol=N duplicate=drop
    --gene-annot /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.annot
    --out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN

Start time is 16:32:29, Thursday 03 Oct 2024

Loading PLINK-format data...
Reading file /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/g1000_eur/g1000_eur.fam... 503 individuals read
Reading file /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/g1000_eur/g1000_eur.bim... 22665064 SNPs read
Preparing file /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/g1000_eur/g1000_eur.bed... 

Reading SNP synonyms from file /Users/alanmurphy/Library/Caches/org.R-project.R/R/MAGMA.Celltyping/g1000_eur/g1000_eur.synonyms (auto-detected)
    read 6016767 mapped synonyms from file, mapping to 3921040 SNPs in the data
    WARNING: detected 133 synonymous SNP pairs in the data
             skipped all synonym entries involved, synonymous SNPs are kept in analysis
             writing list of detected synonyms in data to supplementary log file
Reading SNP p-values from file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/prospective_memory.ukb.tsv... 
    detected 14 variables in file
    using variable: SNP (SNP id)
    using variable: P (p-value)
    using variable: N (sample size; discarding SNPs with N < 50)
    read 398093 lines from file, containing valid SNP p-values for 387654 SNPs in data (97.38% of lines, 1.71% of SNPs in data)
    WARNING: file contained 149 SNPs (same IDs or synonyms) with duplications
             dropped all occurrences of each from analysis
             writing list of duplicated IDs to supplementary log file
Loading gene annotation from file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.annot... 
    12322 gene definitions read from file
    found 12190 genes containing valid SNPs in genotype data

Starting gene analysis... 
    using model: SNPwise-mean
    writing gene analysis results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.out
    writing intermediate output to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw

End time is 16:34:33, Thursday 03 Oct 2024 (elapsed: 00:02:04)
> MAGMA_results <- MAGMA.Celltyping::celltype_associations_pipeline(
+   magma_dirs = dirname(genesOutPath),
+   ctd = HCL,
+   ctd_species = "human", 
+   ctd_name = "Test", 
+   run_linear = TRUE, 
+   run_top10 = TRUE,
+   force_new = TRUE)
Preparing CellTypeDataset.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Checking CTD: level 1
Checking CTD: level 2
Checking CTD: level 3
Checking CTD: level 4
Checking CTD: level 5
Checking CTD: level 1
Checking CTD: level 2
Checking CTD: level 3
Checking CTD: level 4
Checking CTD: level 5
prospective_memory.ukb.tsv.35UP.10DOWN
======= Calculating celltype associations: linear mode =======
Installed MAGMA version: v1.10
Skipping MAGMA installation.
The desired_version of MAGMA is currently installed: v1.10
Using: magma_v1.10_mac
Running MAGMA: Linear mode
Mapping gene symbols in specificity_quantiles matrix to entrez IDs.
Welcome to MAGMA v1.10 (custom)
Using flags:
    --gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
    --gene-covar /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f5d22076d
    --model direction=pos
    --out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level1.Test_linear.Linear

Start time is 16:30:43, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
    12190 genes read from file
Loading gene-level covariates...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f5d22076d... 
    detected 59 variables in file (using all)
    found 59 valid gene covariates, for 10651 genes defined in genotype data
Processing missing values...
    found 1539 genes not present in all input files: removing these from analysis
    10651 genes remaining in analysis
Preparing variables for analysis...
    truncating Z-scores 3 points below zero or 6 standard deviations above the mean
    truncating covariate values more than 5 standard deviations from the mean
    total variables available for analysis: 59 gene covariates

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
    testing direction: one-sided, positive (sets), one-sided, positive (covar)
    conditioning on internal variables:
        gene size, log(gene size)
        gene density, log(gene density)
        inverse mac, log(inverse mac)
    analysing individual variables

    analysing single-variable models (number of models: 59)
    writing results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level1.Test_linear.Linear.gsa.out

End time is 16:30:45, Thursday 03 Oct 2024 (elapsed: 00:00:02)
Reading enrichment results file into R.
Running MAGMA: Linear mode
Mapping gene symbols in specificity_quantiles matrix to entrez IDs.
Welcome to MAGMA v1.10 (custom)
Using flags:
    --gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
    --gene-covar /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f6515b4df
    --model direction=pos
    --out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level2.Test_linear.Linear

Start time is 16:30:46, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
    12190 genes read from file
Loading gene-level covariates...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f6515b4df... 

ERROR - reading gene covariate file: duplicate variable name 'Fetal_Neuron'

Terminating program.
Reading enrichment results file into R.
Error in file(file, "rt"): cannot open the connection

======= Calculating celltype associations: top10% mode =======
Installed MAGMA version: v1.10
Skipping MAGMA installation.
The desired_version of MAGMA is currently installed: v1.10
Using: magma_v1.10_mac
Running MAGMA: Top 10% mode
Mapping gene symbols in specificity_deciles matrix to entrez IDs.
Constructing top10% gene marker sets for 60 cell-types.
Welcome to MAGMA v1.10 (custom)
Using flags:
    --gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
    --set-annot /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f7415d051
    --out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level1.Test_top10.Top10pct

Start time is 16:30:46, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
    12190 genes read from file
Loading gene-set annotation...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f7415d051... 
    59 gene-set definitions read from file
    found 59 gene sets containing genes defined in genotype data (containing a total of 8583 unique genes)
Preparing variables for analysis...
    truncating Z-scores 3 points below zero or 6 standard deviations above the mean
    truncating covariate values more than 5 standard deviations from the mean
    total variables available for analysis: 59 gene sets

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
    testing direction: one-sided, positive (sets), two-sided (covar)
    conditioning on internal variables:
        gene size, log(gene size)
        gene density, log(gene density)
        inverse mac, log(inverse mac)
    analysing individual variables

    analysing single-variable models (number of models: 59)
    writing results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level1.Test_top10.Top10pct.gsa.out

End time is 16:30:49, Thursday 03 Oct 2024 (elapsed: 00:00:03)
Reading enrichment results file into R.
Running MAGMA: Top 10% mode
Mapping gene symbols in specificity_deciles matrix to entrez IDs.
Constructing top10% gene marker sets for 64 cell-types.
Welcome to MAGMA v1.10 (custom)
Using flags:
    --gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
    --set-annot /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f441fb161
    --out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level2.Test_top10.Top10pct

Start time is 16:30:50, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
    12190 genes read from file
Loading gene-set annotation...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f441fb161... 
    63 gene-set definitions read from file
    found 63 gene sets containing genes defined in genotype data (containing a total of 10359 unique genes)
Preparing variables for analysis...
    truncating Z-scores 3 points below zero or 6 standard deviations above the mean
    truncating covariate values more than 5 standard deviations from the mean
    total variables available for analysis: 63 gene sets

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
    testing direction: one-sided, positive (sets), two-sided (covar)
    conditioning on internal variables:
        gene size, log(gene size)
        gene density, log(gene density)
        inverse mac, log(inverse mac)
    analysing individual variables

    analysing single-variable models (number of models: 63)
    writing results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level2.Test_top10.Top10pct.gsa.out

End time is 16:30:53, Thursday 03 Oct 2024 (elapsed: 00:00:03)
Reading enrichment results file into R.
Running MAGMA: Top 10% mode
Mapping gene symbols in specificity_deciles matrix to entrez IDs.
Constructing top10% gene marker sets for 125 cell-types.
Welcome to MAGMA v1.10 (custom)
Using flags:
    --gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
    --set-annot /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f7cb67a37
    --out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level3.Test_top10.Top10pct

Start time is 16:30:53, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
    12190 genes read from file
Loading gene-set annotation...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f7cb67a37... 
    124 gene-set definitions read from file
    found 124 gene sets containing genes defined in genotype data (containing a total of 10158 unique genes)
Preparing variables for analysis...
    truncating Z-scores 3 points below zero or 6 standard deviations above the mean
    truncating covariate values more than 5 standard deviations from the mean
    total variables available for analysis: 124 gene sets

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
    testing direction: one-sided, positive (sets), two-sided (covar)
    conditioning on internal variables:
        gene size, log(gene size)
        gene density, log(gene density)
        inverse mac, log(inverse mac)
    analysing individual variables

    analysing single-variable models (number of models: 124)
    writing results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level3.Test_top10.Top10pct.gsa.out

End time is 16:30:56, Thursday 03 Oct 2024 (elapsed: 00:00:03)
Reading enrichment results file into R.
Running MAGMA: Top 10% mode
Mapping gene symbols in specificity_deciles matrix to entrez IDs.
Constructing top10% gene marker sets for 1362 cell-types.
Welcome to MAGMA v1.10 (custom)
Using flags:
    --gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
    --set-annot /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f3011f0d8
    --out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level4.Test_top10.Top10pct

Start time is 16:30:59, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
    12190 genes read from file
Loading gene-set annotation...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f3011f0d8... 
    1361 gene-set definitions read from file
    found 1361 gene sets containing genes defined in genotype data (containing a total of 10590 unique genes)
Preparing variables for analysis...
    truncating Z-scores 3 points below zero or 6 standard deviations above the mean
    truncating covariate values more than 5 standard deviations from the mean
    total variables available for analysis: 1361 gene sets

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
    testing direction: one-sided, positive (sets), two-sided (covar)
    conditioning on internal variables:
        gene size, log(gene size)
        gene density, log(gene density)
        inverse mac, log(inverse mac)
    analysing individual variables

    analysing single-variable models (number of models: 1361)
    writing results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level4.Test_top10.Top10pct.gsa.out
    writing gene information to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level4.Test_top10.Top10pct.gsa.genes.out
    writing gene analysis results per significant result (after multiple testing correction, at alpha = 0.05) to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level4.Test_top10.Top10pct.gsa.sets.genes.out

End time is 16:31:02, Thursday 03 Oct 2024 (elapsed: 00:00:03)
Reading enrichment results file into R.
Running MAGMA: Top 10% mode
Mapping gene symbols in specificity_deciles matrix to entrez IDs.
Constructing top10% gene marker sets for 1865 cell-types.
Welcome to MAGMA v1.10 (custom)
Using flags:
    --gene-results /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw
    --set-annot /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f69db0d8f
    --out /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level5.Test_top10.Top10pct

Start time is 16:31:04, Thursday 03 Oct 2024

Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.genes.raw... 
    12190 genes read from file
Loading gene-set annotation...
Reading file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f69db0d8f... 
    1864 gene-set definitions read from file
    found 1864 gene sets containing genes defined in genotype data (containing a total of 10612 unique genes)
Preparing variables for analysis...
    truncating Z-scores 3 points below zero or 6 standard deviations above the mean
    truncating covariate values more than 5 standard deviations from the mean
    total variables available for analysis: 1864 gene sets

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
    testing direction: one-sided, positive (sets), two-sided (covar)
    conditioning on internal variables:
        gene size, log(gene size)
        gene density, log(gene density)
        inverse mac, log(inverse mac)
    analysing individual variables

    analysing single-variable models (number of models: 1864)
    writing results to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level5.Test_top10.Top10pct.gsa.out
    writing gene information to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level5.Test_top10.Top10pct.gsa.genes.out
    writing gene analysis results per significant result (after multiple testing correction, at alpha = 0.05) to file /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level5.Test_top10.Top10pct.gsa.sets.genes.out

End time is 16:31:08, Thursday 03 Oct 2024 (elapsed: 00:00:04)
Reading enrichment results file into R.
Saving results ==> /var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/Test/MAGMA_celltyping.Test.rds
Warning message:
In file(file, "rt") :
  cannot open file '/var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T/RtmpWR0d37/MAGMA_Files/prospective_memory.ukb.tsv.35UP.10DOWN/prospective_memory.ukb.tsv.35UP.10DOWN.level2.Test_linear.Linear.gsa.out': No such file or directory

Expected behavior

Returns enrichment results.

Session info


R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.0

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] MAGMA.Celltyping_2.0.13

loaded via a namespace (and not attached):
  [1] splines_4.4.1                 later_1.3.2                   BiocIO_1.14.0                
  [4] bitops_1.0-8                  ggplotify_0.1.2               filelock_1.0.3               
  [7] tibble_3.2.1                  R.oo_1.26.0                   rex_1.2.1                    
 [10] XML_3.99-0.17                 lifecycle_1.0.4               rstatix_0.7.2                
 [13] rprojroot_2.0.4               lattice_0.22-6                MASS_7.3-61                  
 [16] crosstalk_1.2.1               backports_1.5.0               magrittr_2.0.3               
 [19] sass_0.4.9                    limma_3.60.5                  plotly_4.10.4                
 [22] rmarkdown_2.28                jquerylib_0.1.4               remotes_2.5.0.9000           
 [25] dlstats_0.1.7                 yaml_2.3.10                   httpuv_1.6.15                
 [28] sessioninfo_1.2.2             pkgbuild_1.4.4                HGNChelper_0.8.14            
 [31] RColorBrewer_1.1-3            DBI_1.2.3                     minqa_1.2.8                  
 [34] abind_1.4-8                   pkgload_1.4.0                 zlibbioc_1.50.0              
 [37] rvcheck_0.2.1                 GenomicRanges_1.56.1          purrr_1.0.2                  
 [40] R.utils_2.12.3                BiocGenerics_0.50.0           RCurl_1.98-1.16              
 [43] yulab.utils_0.1.7             VariantAnnotation_1.50.0      rappdirs_0.3.3               
 [46] rworkflows_1.0.3              GenomeInfoDbData_1.2.12       IRanges_2.38.1               
 [49] S4Vectors_0.42.1              tidytree_0.4.6                testthat_3.2.1.1             
 [52] codetools_0.2-20              DelayedArray_0.30.1           DT_0.33                      
 [55] tidyselect_1.2.1              aplot_0.2.3                   UCSC.utils_1.0.0             
 [58] farver_2.1.2                  lme4_1.1-35.5                 matrixStats_1.4.1            
 [61] stats4_4.4.1                  BiocFileCache_2.12.0          GenomicAlignments_1.40.0     
 [64] jsonlite_1.8.9                ellipsis_0.3.2                Formula_1.2-5                
 [67] tools_4.4.1                   treeio_1.28.0                 Rcpp_1.0.13                  
 [70] glue_1.8.0                    SparseArray_1.4.8             here_1.0.1                   
 [73] xfun_0.47                     usethis_3.0.0                 MatrixGenerics_1.16.0        
 [76] GenomeInfoDb_1.40.1           RNOmni_1.0.1.2                dplyr_1.1.4                  
 [79] withr_3.0.1                   BiocManager_1.30.25           fastmap_1.2.0                
 [82] boot_1.3-31                   fansi_1.0.6                   digest_0.6.37                
 [85] R6_2.5.1                      mime_0.12                     gridGraphics_0.5-1           
 [88] colorspace_2.1-1              RSQLite_2.3.7                 R.methodsS3_1.8.2            
 [91] utf8_1.2.4                    tidyr_1.3.1                   generics_0.1.3               
 [94] renv_1.0.9                    data.table_1.16.0             rtracklayer_1.64.0           
 [97] httr_1.4.7                    htmlwidgets_1.6.4             S4Arrays_1.4.1               
[100] pkgconfig_2.0.3               gtable_0.3.5                  blob_1.2.4                   
[103] covr_3.6.4                    SingleCellExperiment_1.26.0   XVector_0.44.0               
[106] brio_1.1.5                    htmltools_0.5.8.1             carData_3.0-5                
[109] profvis_0.4.0                 scales_1.3.0                  Biobase_2.64.0               
[112] png_0.1-8                     ggfun_0.1.6                   ggdendro_0.2.0               
[115] knitr_1.48                    rstudioapi_0.16.0             reshape2_1.4.4               
[118] rjson_0.2.23                  badger_0.2.4                  nlme_3.1-166                 
[121] curl_5.2.3                    nloptr_2.1.1                  cachem_1.1.0                 
[124] stringr_1.5.1                 BiocVersion_3.19.1            miniUI_0.1.1.1               
[127] parallel_4.4.1                AnnotationDbi_1.66.0          desc_1.4.3                   
[130] restfulr_0.0.15               pillar_1.9.0                  grid_4.4.1                   
[133] vctrs_0.6.5                   urlchecker_1.0.1              promises_1.3.0               
[136] ggpubr_0.6.0                  car_3.1-3                     dbplyr_2.5.0                 
[139] xtable_1.8-4                  evaluate_1.0.0                orthogene_1.10.0             
[142] GenomicFeatures_1.56.0        cli_3.6.3                     compiler_4.4.1               
[145] Rsamtools_2.20.0              rlang_1.1.4                   crayon_1.5.3                 
[148] grr_0.9.5                     ggsignif_0.6.4                gprofiler2_0.2.3             
[151] EWCE_1.12.0                   plyr_1.8.9                    fs_1.6.4                     
[154] stringi_1.8.4                 viridisLite_0.4.2             ewceData_1.12.0              
[157] BiocParallel_1.38.0           assertthat_0.2.1              babelgene_22.9               
[160] munsell_0.5.1                 Biostrings_2.72.1             lazyeval_0.2.2               
[163] gh_1.4.1                      devtools_2.4.5                homologene_1.4.68.19.3.27    
[166] Matrix_1.7-0                  ExperimentHub_2.12.0          MungeSumstats_1.13.4         
[169] BSgenome_1.72.0               patchwork_1.3.0               bit64_4.5.2                  
[172] ggplot2_3.5.1                 KEGGREST_1.44.1               statmod_1.5.0                
[175] shiny_1.9.1                   SummarizedExperiment_1.34.0   interactiveDisplayBase_1.42.0
[178] AnnotationHub_3.12.0          googleAuthR_2.0.2             gargle_1.5.2                 
[181] broom_1.0.7                   memoise_2.0.1                 bslib_0.8.0                  
[184] ggtree_3.12.0                 bit_4.5.0                     splitstackshape_1.4.8        
[187] ape_5.8

bschilder commented 1 week ago

Originally reported by @Al-Murphy. Potentially related to:

Just as an update, I also tried the different versions of the Human Cell Landscape CTD using GitHub tags 'v0.1.10' and 'v0.0.1', but this didn't help either!

bschilder commented 1 week ago

One thing I'm noticing is that the error only occurs with specific combinations of CTD level and test type.

Specifically, CTD level 2 with the linear tests is the only one that's failing.

bschilder commented 1 week ago

We can see that the cell type names aren't duplicated in the original CTD:

colnames(HCL$level_2$specificity_quantiles)[duplicated(colnames(HCL$level_2$specificity_quantiles))]
> character(0)

This remains true even after restandardising the CTD:

HCL2 <- EWCE::standardise_ctd(HCL, force_standardise = TRUE)
colnames(HCL2$level_2$specificity_quantiles)[duplicated(colnames(HCL2$level_2$specificity_quantiles))]
> character(0)

So something is happening further downstream of this step.

bschilder commented 1 week ago

OK, I think I've pinpointed the reason.

At level 2, the CTD contains the cell types "Fetal_Neuron" and "Fetal_neuron". I think this is simply an inconsistency in how the original HCL authors annotated their cell types (I've noticed this a lot in that dataset). You can see this by reading in the gene covariate file referenced in the error message.

gcf <- data.table::fread("/var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f6515b4df")
cols <- grep("fetal_neuron", names(gcf), ignore.case = TRUE, value = TRUE)
cols
> "Fetal_neuron" "Fetal_Neuron"
gcf[, cols, with = FALSE]
       Fetal_neuron Fetal_Neuron
              <int>        <int>
    1:           19           13
    2:           19           26
    3:            6           11
    4:            0            0
    5:            4            0
   ---                          
17956:            0            0
17957:           27            0
17958:            0            0
17959:            0            0
17960:           35            9

R compares column names case-sensitively, so it doesn't flag these as duplicates. MAGMA, however, must be ignoring case internally, so it treats them as duplicates and throws the error. This happens at this step: https://github.com/neurogenomics/MAGMA_Celltyping/blob/0941d8c2a3b652112f21083e474fe2d56e4f9021/R/calculate_celltype_associations.r#L114
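The mismatch can be reproduced in plain R without MAGMA. This is only a minimal sketch using the two column names from the covariate file above; lower-casing with `tolower()` is an assumption standing in for however MAGMA compares names internally.

```r
# The two cell type names found in the level 2 gene covariate file.
celltypes <- c("Fetal_neuron", "Fetal_Neuron")

# Case-sensitive check (R's default behaviour): no duplicates are reported.
dup_case_sensitive <- duplicated(celltypes)

# Case-insensitive check (assumed to mirror what MAGMA does): the second
# name collides with the first once case is ignored.
dup_case_insensitive <- duplicated(tolower(celltypes))

print(dup_case_sensitive)
print(dup_case_insensitive)
```

The same `duplicated(tolower(...))` pattern can be pointed at `colnames()` of any specificity matrix to spot this kind of collision before MAGMA is called.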

bschilder commented 1 week ago

I could add a step that drops columns whose names are duplicates when case is ignored, but the real solution is to regenerate the CTD after correcting the cell type annotations, because correcting them will alter the expression and specificity scores.
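As a rough illustration of that stopgap, here is a sketch of dropping columns whose names collide when case is ignored. `drop_case_dups` is a hypothetical helper written for this comment, not a function from MAGMA.Celltyping, and the toy matrix (with made-up values) only stands in for a real specificity matrix.

```r
# Hypothetical helper (not part of MAGMA.Celltyping): drop columns whose
# names duplicate an earlier column name when case is ignored.
drop_case_dups <- function(mat) {
  dups <- duplicated(tolower(colnames(mat)))
  if (any(dups)) {
    message("Dropping case-insensitive duplicate columns: ",
            paste(colnames(mat)[dups], collapse = ", "))
  }
  mat[, !dups, drop = FALSE]
}

# Toy stand-in for a gene x cell type specificity matrix (made-up values).
toy <- matrix(
  c(19, 19, 6, 13, 26, 11, 1, 2, 3),
  nrow = 3,
  dimnames = list(c("gene1", "gene2", "gene3"),
                  c("Fetal_neuron", "Fetal_Neuron", "Astrocyte"))
)

cleaned <- drop_case_dups(toy)
print(colnames(cleaned))
```

Note that this keeps whichever colliding column comes first, which is an arbitrary choice and one more reason regenerating the CTD is the better fix.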

bschilder commented 1 week ago

I've made some updates in MAGMA.Celltyping 2.0.14 (now pushed to GitHub) so that it automatically drops duplicate cell types, while giving users more informative messages about which ones are being dropped and why. It also recommends that they reprocess the CTD accordingly.
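For anyone who wants to check their own CTD before running the pipeline, a sketch along these lines lists the colliding names per level. `find_case_dups` and `make_level` are hypothetical helpers, and the toy list only mimics the `level_N$specificity_quantiles` layout accessed earlier in this thread; this is not how 2.0.14 implements its check.

```r
# Hypothetical helper: for each CTD level, return every cell type name that
# collides with another name once case is ignored.
find_case_dups <- function(ctd) {
  lapply(ctd, function(lvl) {
    nms <- colnames(lvl$specificity_quantiles)
    low <- tolower(nms)
    nms[low %in% low[duplicated(low)]]
  })
}

# Toy CTD mimicking the structure used above (assumed layout, not real data).
make_level <- function(celltypes) {
  list(specificity_quantiles = matrix(
    0, nrow = 2, ncol = length(celltypes),
    dimnames = list(c("gene1", "gene2"), celltypes)
  ))
}
toy_ctd <- list(
  level_1 = make_level(c("Neuron", "Astrocyte")),
  level_2 = make_level(c("Fetal_neuron", "Fetal_Neuron", "Astrocyte"))
)

dups_by_level <- find_case_dups(toy_ctd)
print(dups_by_level)
```

Any level that returns a non-empty vector is one where cell type annotations would need correcting before the CTD is regenerated.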