Closed arita-clavy closed 2 years ago
Hi,
This issue is due to fact that the files in this dataset are deposited in a way that is similar to RNA-Seq data. For ATAC-Seq, Scarf expects a 'peaks.tsv.gz' rather than 'features.tsv.gz'. We will fix this issue in the data repository.
For now, the quick solution for you is to do the following:
# Please note that the separator is a space and not a tab
reader = scarf.CrDirReader('cusanovich_81K_mouse_atacseq', mtx_separator=' ')
# This is how we can rename the assay
reader.rename_assays({'RNA': 'ATAC'})
# The data in this file is prenormalized and is present as float values, hence adjust dtype
writer = scarf.CrToZarr(reader, 'cusanovich_81K.zarr', chunk_size=(1000, 10000), dtype=float)
writer.dump() # This will take upto 5 mins
# Load the data as a datastore object
ds = scarf.DataStore('cusanovich_81K.zarr', nthreads=4)
ds
# Turn off the default TF-IDF normalization because the data is already normalized (you will need to do this every time the data is loaded)
ds.ATAC.normMethod = scarf.assay.norm_dummy
# Thereafter, you can run a basic pipeline with default parameters like this:
ds.mark_prevalent_peaks()
ds.make_graph(feat_key="prevalent_peaks")
ds.run_umap()
ds.run_leiden_clustering()
ds.plot_layout(layout_key="ATAC_UMAP", color_by="ATAC_leiden_cluster")
Please let me know if this helps
Closing this for now. Please feel free to reopen this.
Hi staffs of scarf, I downloaded Cusanovich's 81K sc-ATAC-seq dataset using your fetch_dataset(), but when I was importing the dataset, this happened:
somehow it was auto-recognized as a RNA-seq assay, since file_type is deprecated I cannot either change the assay type or go on my downstream analysis. Any possible solutions?