Closed: matejasoretic closed this issue 5 months ago
Try changing your centroids_init_path to a different file. It might be trying to use the one that was generated for a smaller number of genes (thus causing this error).
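A quick way to check whether the existing file is the stale one, as a rough sketch only: the traceback shows species_genes_scores being indexed by names like 'axolotl_AMEX60DD000047-TMEM132B', so this assumes the file at centroids_init_path unpickles to a dict keyed by such species-prefixed gene names. The helper names (load_scores, missing_gene_keys) and the toy gene names below are made up for illustration and are not part of SATURN.

```python
import pickle


def load_scores(path):
    """Unpickle a cached score mapping (assumed layout: dict keyed by 'species_gene')."""
    with open(path, "rb") as f:
        return pickle.load(f)


def missing_gene_keys(scores, expected_keys):
    """Return the expected keys that are absent from the cached mapping, in order."""
    return [key for key in expected_keys if key not in scores]


# Toy stand-in for a cache built with a smaller gene set (hypothetical contents).
cached = {
    "axolotl_GENE_A": [0.1, 0.9],
    "axolotl_GENE_B": [0.7, 0.3],
}

# Keys the larger run would look up; the last one is the key from the traceback.
wanted = [
    "axolotl_GENE_A",
    "axolotl_GENE_B",
    "axolotl_AMEX60DD000047-TMEM132B",
]

missing = missing_gene_keys(cached, wanted)
print(f"{len(missing)} key(s) missing from the cached file: {missing}")

# With a real file this would be, under the same assumption about its layout:
# cached = load_scores("/path/to/saturn_results//all_species_centroids.pkl")
```

If any keys come back missing, the file was most likely written for a different gene selection, and pointing centroids_init_path at a new file name for the larger run avoids reusing it.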
Yes, that was the issue, thank you! Closing the issue.
I have encountered the following issue: when I run SATURN with 8000 HVGs and 2000 macro genes on my multi-species dataset, it works. However, when I increased these values to 12000 HVGs and 3000 macro genes, I got the following error:
Traceback (most recent call last):
  File "/path/to/train-saturn.py", line 1064, in <module>
    trainer(args)
  File "/path/to/train-saturn.py", line 575, in trainer
    centroid_weights.append(torch.tensor(species_genes_scores[sgn]))
KeyError: 'axolotl_AMEX60DD000047-TMEM132B'
Earlier in the output I could see:
After loading the anndata axolotl View of AnnData object with n_obs × n_vars = 4198 × 42647
So there were more than 12000 genes shared between the axolotl object and its corresponding .pt file. Each species in my dataset had more than 12000 genes, the minimum being 15577. The gene in question had one peptide in the peptide .fa file I used.
I checked, and this gene is expressed in some cells in the object; its expression is not 0.
My command was:
# or --num_macrogenes=3000 --hv_genes=12000 (the failing run)
python3 /path/to/train-saturn.py --in_data /path/to/all_species_run.csv \
    --in_label_col=cell_type --ref_label_col=cell_type \
    --num_macrogenes=2000 --hv_genes=8000 \
    --centroids_init_path=/path/to/saturn_results//all_species_centroids.pkl \
    --score_adata --ct_map_path=/path/to/cell_type_map.csv \
    --work_dir=/path/to/work_dir/
What could be causing SATURN to fail when the number of genes is increased?