hubmapconsortium / py-hubmap-dbgap

Python package that creates a dbGaP submission from HuBMAP datasets

Map `Slide-seq` to `library_strategy` #8

Closed icaoberg closed 1 year ago

icaoberg commented 1 year ago

@pdblood please let me know which library_strategy Slide-seq should be mapped to, so that we avoid the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[5], line 1
----> 1 data = hubmapdbgap.create.submission(hubmap_ids, dbgap_study_id=dbgap_study_id, token=token, prepend_sample_id=False )

File ~/.local/lib/python3.9/site-packages/hubmap_dbgap-1.0-py3.9.egg/hubmapdbgap/create.py:180, in submission(hubmap_ids, dbgap_study_id, token, prepend_sample_id)
    170 # library_strategy
    171 library_strategy = {
    172     "ATACseq-bulk": "ATAC-seq",
    173     "WGS": "WGS",
   (...)
    178     "snRNAseq-10xGenomics-v3": "RNA-Seq",
    179 }
--> 180 library_strategy = library_strategy[metadata["data_types"][0]]
    182 analyte_class = {"RNA": "TRANSCRIPTOMIC", "DNA": "GENOMIC"}
    183 library_source = analyte_class[
    184     metadata["ingest_metadata"]["metadata"]["analyte_class"]
    185 ]

KeyError: 'Slide-seq'

icaoberg commented 1 year ago

@pdblood already replied on Slack that it must be RNA-Seq

icaoberg commented 1 year ago

@pdblood these are the dataset types associated with the UCSD datasets that need to be mapped (or confirmed)

{"['SNARE-ATACseq2']",
 "['SNARE-RNAseq2']",
 "['SNAREseq']",
 "['Slide-seq']",
 "['snRNAseq']",
 "['snRNAseq-10xGenomics-v3']"}

the current mapping is

    # library_strategy
    library_strategy = {
        "ATACseq-bulk": "ATAC-seq",
        "WGS": "WGS",
        "bulk-RNA": "RNA-Seq",
        "scRNAseq-10xGenomics-v3": "RNA-Seq",
        "snATACseq": "ATAC-seq",
        "Slide-seq": "RNA-Seq",
        "snRNAseq": "RNA-Seq",
        "snRNAseq-10xGenomics-v3": "RNA-Seq",
    }

pdblood commented 1 year ago

Everything listed under the current mapping is correct.

To handle SNAREseq and its variants, do the following:

Use the analyte_class together with the 'SNAREseq' data type to determine the library strategy. If the data type is SNAREseq and the analyte_class is DNA, then the library strategy is ATAC-seq. If the data type is SNAREseq and the analyte_class is RNA, then the library strategy is RNA-Seq.

pdblood commented 1 year ago

One more: if you hit the data type scRNA-Seq-10x, then the library_strategy is RNA-Seq.
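
The rules discussed in this thread could be sketched roughly as follows. This is only an illustration, not the code that ended up in `hubmapdbgap/create.py`: the function name `resolve_library_strategy`, the constant names, and the choice to treat every data type starting with `SNARE` as a SNAREseq variant are all assumptions made for this example.

```python
# Hypothetical sketch; names below are illustrative and not part of hubmapdbgap.

# Direct data type -> library_strategy mapping, as confirmed in this thread,
# plus the scRNA-Seq-10x entry mentioned in the last comment.
LIBRARY_STRATEGY = {
    "ATACseq-bulk": "ATAC-seq",
    "WGS": "WGS",
    "bulk-RNA": "RNA-Seq",
    "scRNAseq-10xGenomics-v3": "RNA-Seq",
    "scRNA-Seq-10x": "RNA-Seq",
    "snATACseq": "ATAC-seq",
    "Slide-seq": "RNA-Seq",
    "snRNAseq": "RNA-Seq",
    "snRNAseq-10xGenomics-v3": "RNA-Seq",
}

# For SNAREseq, the analyte_class decides the library strategy.
SNARESEQ_BY_ANALYTE = {"DNA": "ATAC-seq", "RNA": "RNA-Seq"}


def resolve_library_strategy(data_type, analyte_class):
    """Return the library_strategy for a data type and analyte class."""
    # Assumption: SNARE-ATACseq2 and SNARE-RNAseq2 are handled like SNAREseq.
    if data_type.startswith("SNARE"):
        try:
            return SNARESEQ_BY_ANALYTE[analyte_class]
        except KeyError:
            raise ValueError(
                f"Unexpected analyte_class {analyte_class!r} for {data_type!r}"
            ) from None
    try:
        return LIBRARY_STRATEGY[data_type]
    except KeyError:
        # Name the unmapped data type instead of surfacing a bare KeyError.
        raise ValueError(
            f"No library_strategy mapping for data type {data_type!r}"
        ) from None
```

With a helper like this, the failing line in the traceback would become a call such as `resolve_library_strategy(metadata["data_types"][0], metadata["ingest_metadata"]["metadata"]["analyte_class"])`, and an unmapped data type would produce an error message that names the missing entry.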