So I'm currently working on a multi-sample pipeline and I've followed your tutorial. I currently have to rename barcodes such that they're the same across snATAC-seq and snRNA-seq data. I'm appending the sample to the beginning of the barcode as such:
E14_data_directory = f"{data_directory}/snATAC-seq/E14"
E18_data_directory = f"{data_directory}/snATAC-seq/E18"
def generate_file_paths(file_names, dir):
input_files = [f"{dir}/fragments/{file_name}.tsv.gz" for file_name in file_names]
adata_samples = [f"{dir}/anndata/{file_name}.h5ad" for file_name in file_names]
return input_files, adata_samples
E18_file_names = ["E18_Hom_1", "E18_Hom_2", "E18_Con_1", "E18_Con_2"]
E18_input_files, E18_adata_samples = generate_file_paths(
E18_file_names, E18_data_directory
)
E14_file_names = ["E14_Hom_1", "E14_Hom_2", "E14_Con_1", "E14_Con_2"]
E14_input_files, E14_adata_samples = generate_file_paths(
E14_file_names, E14_data_directory
)
combined_file_names = [*E14_file_names, *E18_file_names]
combined_input_files = [*E14_input_files, *E18_input_files]
combined_adata_samples = [*E14_adata_samples, *E18_adata_samples]
# NOTE: ONLY set this to true if you have RAM mounted as a file system (like /dev/shm)
use_ram = False
# Genome to use
chrom_sizes = snap.genome.mm39.chrom_sizes
if use_ram:
os.mkdir("/dev/shm/snap_tmp_dir")
adatas = snap.pp.import_data(
combined_input_files,
sorted_by_barcode=False,
file=combined_adata_samples,
tempdir="/dev/shm/snap_tmp_dir",
chrom_sizes=snap.genome.mm10,
min_num_fragments=1000,
)
else:
adatas = snap.pp.import_data(
combined_input_files,
sorted_by_barcode=False,
file=combined_adata_samples,
tempdir=f"{data_directory}/tmp",
chrom_sizes=snap.genome.mm10,
min_num_fragments=1000,
)
for adata, sample_name in zip(adatas, combined_file_names):
adata.obs_names = [f"{sample_name}:{barcode}" for barcode in adata.obs_names]
atac_data_adataset = snap.AnnDataSet(
adatas=[(name, adata) for name, adata in zip(combined_file_names, adatas)],
filename=f"{data_directory}/snATAC-seq/atac_brain_data.h5ads"
)
This code successfully renames each barcode within each sample and the renamed barcodes persist when I load it into an AnnDataSet object. When I try to do that same to an AnnDataSet object, the indexes reset when I reload it. I think there must be no synchronisation happening between obs_names and obs.index in your pyanndata. Perhaps it's a good idea to be able to rename the obs_names from an AnnDataSet object instead of having to load each AnnData and assign it there.
Hi,
So I'm currently working on a multi-sample pipeline and I've followed your tutorial. I currently have to rename barcodes such that they're the same across snATAC-seq and snRNA-seq data. I'm appending the sample to the beginning of the barcode as such:
This code successfully renames each barcode within each sample and the renamed barcodes persist when I load it into an AnnDataSet object. When I try to do that same to an AnnDataSet object, the indexes reset when I reload it. I think there must be no synchronisation happening between
obs_names
andobs.index
in your pyanndata. Perhaps it's a good idea to be able to rename the obs_names from an AnnDataSet object instead of having to load each AnnData and assign it there.