theislab / mubind

Learning motif contributions to cell transitions using sequence features and graphs.
https://mubind.readthedocs.io
MIT License
27 stars 0 forks

Data Availability | Tutorial #131

Closed shunyasanuma closed 1 month ago

shunyasanuma commented 2 months ago

Hello,

I would like to go through the tutorial, but the link (www.readthedocs.io/mubind) for single-cell genomics data in the data availability section of the paper does not seem to work.

Could you please look into this issue? Thank you.

ilibarra commented 2 months ago

@shunyasanuma Thank you for your comment! There is a redirection typo in the URL from the preprint. We'll update the URL in future versions.

The current documentation URL is highlighted in the repository README. I'll also update it in that entry, for visibility. https://mubind.readthedocs.io/en/latest/ https://github.com/theislab/mubind?tab=readme-ov-file#resources

A new version of the documentation should be up today. Let us know if you run into other issues or have questions as you go through the tutorials. I'll keep this issue open for a few days; feel free to close it if this information is sufficient.

Good luck!

shunyasanuma commented 2 months ago

Thank you for your reply. Could you clarify the location of the input datasets after the page updates?

shunyasanuma commented 2 months ago

I checked the updated datasets.py script and downloaded files from Dropbox. While I was able to open pancreas_multiome_2022_processed_rna_velocities_2024.h5ad, I could not open pancreas_multiome_2022_processed_atac.h5ad. The following error occurred:

import h5py

try:
    with h5py.File('pancreas_multiome_2022_processed_atac.h5ad', 'r') as f:
        print(list(f.keys()))
except Exception as e:
    print(f"Error: {e}")

Error: Unable to synchronously open file (file signature not found)

For pancreas_multiome_2022_processed_rna_velocities_2024.h5ad,

import h5py

try:
    with h5py.File('pancreas_multiome_2022_processed_rna_velocities_2024.h5ad', 'r') as f:
        print(list(f.keys()))
except Exception as e:
    print(f"Error: {e}")

['X', 'layers', 'obs', 'obsm', 'obsp', 'uns', 'var', 'varm', 'varp']

I am also having problems downloading the PWM file; I don’t even see the file/URL in datasets.py

pwms = mb.datasets.archetypes()
len(pwms)

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[5], line 3
      1 # these are motifs used during training (pre-weights)
      2 # pwms = mb.datasets.cisbp_hs()
----> 3 pwms = mb.datasets.archetypes()
      4 # pwms = pwms[:20]
      5 len(pwms)

File ~/.local/lib/python3.12/site-packages/mubind/datasets/datasets.py:493, in archetypes(**kwargs)
    490 ppm_by_name = {}
    491 archetypes_dir = os.path.join(mb.bindome.constants.ANNOTATIONS_DIRECTORY, 'archetypes')
--> 493 anno = archetypes_anno(**kwargs)
    494 clu = archetypes_anno(**kwargs)
    496 # read PFM across meme files

File ~/.local/lib/python3.12/site-packages/mubind/datasets/datasets.py:481, in archetypes_anno(**kwargs)
    478 def archetypes_anno(**kwargs):
    479     # read reference clusters
    480     archetypes_dir = os.path.join(mb.bindome.constants.ANNOTATIONS_DIRECTORY, 'archetypes')
--> 481     anno = pd.read_excel(os.path.join(archetypes_dir, 'motif_annotations.xlsx'), sheet_name='Archetype clusters')
    482     return anno

File ~/.local/lib/python3.12/site-packages/pandas/io/excel/_base.py:504, in read_excel(io, sheet_name, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, date_format, thousands, decimal, comment, skipfooter, storage_options, dtype_backend, engine_kwargs)
    502 if not isinstance(io, ExcelFile):
    503     should_close = True
--> 504     io = ExcelFile(
    505         io,
    506         storage_options=storage_options,
    507         engine=engine,
    508         engine_kwargs=engine_kwargs,
    509     )
    510 elif engine and engine != io.engine:
    511     raise ValueError(
    512         "Engine should not be specified when passing "
    513         "an ExcelFile - ExcelFile already has the engine set"
    514     )

File ~/.local/lib/python3.12/site-packages/pandas/io/excel/_base.py:1563, in ExcelFile.__init__(self, path_or_buffer, engine, storage_options, engine_kwargs)
   1561     ext = "xls"
   1562 else:
-> 1563     ext = inspect_excel_format(
   1564         content_or_path=path_or_buffer, storage_options=storage_options
   1565     )
   1566     if ext is None:
   1567         raise ValueError(
   1568             "Excel file format cannot be determined, you must specify "
   1569             "an engine manually."
   1570         )

File ~/.local/lib/python3.12/site-packages/pandas/io/excel/_base.py:1419, in inspect_excel_format(content_or_path, storage_options)
   1416 if isinstance(content_or_path, bytes):
   1417     content_or_path = BytesIO(content_or_path)
-> 1419 with get_handle(
   1420     content_or_path, "rb", storage_options=storage_options, is_text=False
   1421 ) as handle:
   1422     stream = handle.handle
   1423     stream.seek(0)

File ~/.local/lib/python3.12/site-packages/pandas/io/common.py:872, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    863         handle = open(
    864             handle,
    865             ioargs.mode,
   (...)
    868             newline="",
    869         )
    870     else:
    871         # Binary mode
--> 872         handle = open(handle, ioargs.mode)
    873     handles.append(handle)
    875 # Convert BytesIO or file objects passed with an encoding

FileNotFoundError: [Errno 2] No such file or directory: 'annotations/archetypes/motif_annotations.xlsx'

ilibarra commented 2 months ago

@shunyasanuma thanks for your patience.

Regarding dataset downloading, we have updated the URLs (dl=1) and retested them. We no longer see the file signature error. We added a test, test_dataset.test_download_and_load_dataset, to confirm that downloading works. Please try it out!
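For readers wondering what the dl=1 change does: Dropbox share links ending in dl=0 open a preview page, while dl=1 requests the file itself. The following is a minimal sketch of rewriting such a link, not the code used in mubind; the function name and the example link are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_direct_download(share_url: str) -> str:
    """Return share_url with its `dl` query parameter set to 1."""
    parts = urlsplit(share_url)
    # Keep every query parameter except an existing `dl`, then append dl=1.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "dl"]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical share link, for illustration only.
print(force_direct_download("https://www.dropbox.com/s/abc123/example.h5ad?dl=0"))
```

Other query parameters, such as an rlkey, are preserved; only dl is replaced.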

If corrupted files are still in your filesystem, please delete them. In the worst case, you can download the file directly by opening the URL in your browser and placing it at the destination path that the loader looks up.
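One way to spot a corrupted download before deleting it is to check whether the file starts with the 8-byte HDF5 signature; the "file signature not found" error indicates that it does not. This is a sketch under the assumption that the signature sits at offset 0 (the usual case for .h5ad files without a user block); the helper names are hypothetical and not part of mubind.

```python
from pathlib import Path

# The 8-byte signature that begins an HDF5 file (assumed at offset 0 here).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path) -> bool:
    """Return True if the file starts with the HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def remove_if_corrupted(path) -> bool:
    """Delete the file if it exists but lacks the signature; return True if deleted."""
    path = Path(path)
    if path.exists() and not looks_like_hdf5(path):
        path.unlink()
        return True
    return False
```

For example, calling remove_if_corrupted('data/scatac/pancreas_multiome/pancreas_multiome_2022_processed_atac.h5ad') would remove a saved HTML page or truncated download at that path, so that the next loader call can fetch the file again.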

Regarding the archetypes data download, thanks for pointing out this missing file! These files are now downloaded without depending on bindome. The updated function mb.datasets.archetypes can be tested with test_dataset.test_archetypes.

The updates are now in https://github.com/theislab/mubind/pull/132.

Please let us know if you have additional dataset downloading issues. We'll keep the issue open for a couple of weeks. Good luck!

shunyasanuma commented 2 months ago

Thank you for your updates. I was able to load the RNA dataset (as you indicated with the screenshot), but I'm still having problems loading the ATAC data. Could you try loading pancreas_multiome_2022_processed_atac.h5ad in your environment? The same problem occurs if I download the file from Dropbox.

pancreas_multiome_2022_processed_rna_velocities_2024.h5ad

pancreas_multiome_2022_processed_rna_velocities_2024.h5ad
True data/scatac/pancreas_multiome/pancreas_multiome_2022_processed_rna_velocities_2024.h5ad

AnnData object with n_obs × n_vars = 16918 × 14663
    obs: 'n_counts', 'sample', 'n_genes', 'log_genes', 'mt_frac', 'rp_frac', 'ambi_frac', 'nCount_RNA', 'nFeature_RNA', 'nCount_ATAC', 'nFeature_ATAC', 'nucleosome_signal', 'nucleosome_percentile', 'TSS.enrichment', 'TSS.percentile', 'S_score', 'G2M_score', 'phase', 'proliferation', 'celltype', 'nCount_peaks', 'nFeature_peaks', 'sample_batch', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'batch', 'velocity_self_transition'
    var: 'modality', 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'gene_count_corr', 'velocity_gamma', 'velocity_qreg_ratio', 'velocity_r2', 'velocity_genes'
    uns: 'celltype_colors', 'neighbors', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_pca', 'X_pca_wsnn', 'X_spca_wsnn', 'X_umap', 'X_umap_ATAC', 'X_umap_GEX', 'X_umap_wsnn', 'lsi_full', 'lsi_red', 'umap', 'umap_ATAC', 'umap_GEX', 'velocity_umap'
    layers: 'Ms', 'Mu', 'ambiguous', 'matrix', 'spliced', 'unspliced', 'variance_velocity', 'velocity'
    obsp: 'connectivities', 'distances'

pancreas_multiome_2022_processed_atac.h5ad

mb.datasets.pancreas_atac()

True data/scatac/pancreas_multiome/pancreas_multiome_2022_processed_atac.h5ad
reading ATAC

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[16], line 1
----> 1 mb.datasets.pancreas_atac()

File ~/.local/lib/python3.12/site-packages/mubind/datasets/datasets.py:605, in pancreas_atac(file_path)
    603 print(os.path.exists(file_path), file_path)
    604 print('reading ATAC')
--> 605 adata = read(file_path, backup_url=url, sparse=True, cache=True)
    606 print('opening ATAC successful')
    607 adata.var_names_make_unique()

File /usr/local/anaconda3-2020/lib/python3.12/site-packages/legacy_api_wrap/__init__.py:80, in legacy_api.<locals>.wrapper.<locals>.fn_compatible(*args_all, **kw)
     77 @wraps(fn)
     78 def fn_compatible(*args_all: P.args, **kw: P.kwargs) -> R:
     79     if len(args_all) <= n_positional:
---> 80         return fn(*args_all, **kw)
     82     args_pos: P.args
     83     args_pos, args_rest = args_all[:n_positional], args_all[n_positional:]

File /usr/local/anaconda3-2020/lib/python3.12/site-packages/scanpy/readwrite.py:129, in read(filename, backed, sheet, ext, delimiter, first_column_names, backup_url, cache, cache_compression, **kwargs)
    127 filename = Path(filename)  # allow passing strings
    128 if is_valid_filename(filename):
--> 129     return _read(
    130         filename,
    131         backed=backed,
    132         sheet=sheet,
    133         ext=ext,
    134         delimiter=delimiter,
    135         first_column_names=first_column_names,
    136         backup_url=backup_url,
    137         cache=cache,
    138         cache_compression=cache_compression,
    139         **kwargs,
    140     )
    141 # generate filename and read to dict
    142 filekey = str(filename)

File /usr/local/anaconda3-2020/lib/python3.12/site-packages/scanpy/readwrite.py:764, in _read(filename, backed, sheet, ext, delimiter, first_column_names, backup_url, cache, cache_compression, suppress_cache_warning, **kwargs)
    762 if ext in {"h5", "h5ad"}:
    763     if sheet is None:
--> 764         return read_h5ad(filename, backed=backed)
    765     else:
    766         logg.debug(f"reading sheet {sheet} from file {filename}")

File /usr/local/anaconda3-2020/lib/python3.12/site-packages/anndata/_io/h5ad.py:237, in read_h5ad(filename, backed, as_sparse, as_sparse_fmt, chunk_size)
    229         raise NotImplementedError(
    230             "Currently only `X` and `raw/X` can be read as sparse."
    231         )
    233 rdasp = partial(
    234     read_dense_as_sparse, sparse_format=as_sparse_fmt, axis_chunk=chunk_size
    235 )
--> 237 with h5py.File(filename, "r") as f:
    239     def callback(func, elem_name: str, elem, iospec):
    240         if iospec.encoding_type == "anndata" or elem_name.endswith("/"):

File ~/.local/lib/python3.12/site-packages/h5py/_hl/files.py:562, in File.__init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, fs_strategy, fs_persist, fs_threshold, fs_page_size, page_buf_size, min_meta_keep, min_raw_keep, locking, alignment_threshold, alignment_interval, meta_block_size, **kwds)
    553     fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0,
    554                      locking, page_buf_size, min_meta_keep, min_raw_keep,
    555                      alignment_threshold=alignment_threshold,
    556                      alignment_interval=alignment_interval,
    557                      meta_block_size=meta_block_size,
    558                      **kwds)
    559     fcpl = make_fcpl(track_order=track_order, fs_strategy=fs_strategy,
    560                      fs_persist=fs_persist, fs_threshold=fs_threshold,
    561                      fs_page_size=fs_page_size)
--> 562     fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
    564 if isinstance(libver, tuple):
    565     self._libver = libver

File ~/.local/lib/python3.12/site-packages/h5py/_hl/files.py:235, in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
    233     if swmr and swmr_support:
    234         flags |= h5f.ACC_SWMR_READ
--> 235     fid = h5f.open(name, flags, fapl=fapl)
    236 elif mode == 'r+':
    237     fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File h5py/h5f.pyx:102, in h5py.h5f.open()

OSError: Unable to synchronously open file (file signature not found)

ilibarra commented 2 months ago

@shunyasanuma Great to know it is now working well for RNA! Regarding ATAC, today I replaced the file by mistake. I just reverted this, and the same Dropbox URL can be used (no code changes). Could you please confirm this is also working for you? Let us know if further debugging is needed. Thanks!

shunyasanuma commented 2 months ago

Yes! I could load the ATAC data.

import anndata

# Load the H5AD file
adata = anndata.read_h5ad('pancreas_multiome_2022_processed_atac.h5ad')

# Print the structure
print(adata)

AnnData object with n_obs × n_vars = 16918 × 228259
    obs: 'n_counts', 'sample', 'n_genes', 'log_genes', 'mt_frac', 'rp_frac', 'ambi_frac', 'nCount_RNA', 'nFeature_RNA', 'nCount_ATAC', 'nFeature_ATAC', 'nucleosome_signal', 'nucleosome_percentile', 'TSS.enrichment', 'TSS.percentile', 'S_score', 'G2M_score', 'phase', 'proliferation', 'celltype', 'nCount_peaks', 'nFeature_peaks'
    var: 'modality'
    uns: 'celltype_colors', 'neighbors'
    obsm: 'X_pca', 'X_pca_wsnn', 'X_spca_wsnn', 'X_umap', 'X_umap_ATAC', 'X_umap_GEX', 'X_umap_wsnn', 'lsi_full', 'lsi_red', 'umap', 'umap_ATAC', 'umap_GEX'
    obsp: 'connectivities', 'connectivities_wnn', 'distances', 'distances_wnn'

FYI: For pwms, I downloaded the input files from Dropbox and manually defined archetypes_anno. It worked (Annotation Table & PWM Weights on Dropbox), but mb.datasets.archetypes() still does not.

---------------------------------------------------------------------------
SSLError                                  Traceback (most recent call last)
File /usr/local/anaconda3-2020/lib/python3.12/urllib/request.py:1344, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
   1343 try:
-> 1344     h.request(req.get_method(), req.selector, req.data, headers,
   1345               encode_chunked=req.has_header('Transfer-encoding'))
   1346 except OSError as err: # timeout error

File /usr/local/anaconda3-2020/lib/python3.12/http/client.py:1336, in HTTPConnection.request(self, method, url, body, headers, encode_chunked)
   1335 """Send a complete request to the server."""
-> 1336 self._send_request(method, url, body, headers, encode_chunked)

File /usr/local/anaconda3-2020/lib/python3.12/http/client.py:1382, in HTTPConnection._send_request(self, method, url, body, headers, encode_chunked)
   1381     body = _encode(body, 'body')
-> 1382 self.endheaders(body, encode_chunked=encode_chunked)

File /usr/local/anaconda3-2020/lib/python3.12/http/client.py:1331, in HTTPConnection.endheaders(self, message_body, encode_chunked)
   1330     raise CannotSendHeader()
-> 1331 self._send_output(message_body, encode_chunked=encode_chunked)

File /usr/local/anaconda3-2020/lib/python3.12/http/client.py:1091, in HTTPConnection._send_output(self, message_body, encode_chunked)
   1090 del self._buffer[:]
-> 1091 self.send(msg)
   1093 if message_body is not None:
   1094 
   1095     # create a consistent interface to message_body

File /usr/local/anaconda3-2020/lib/python3.12/http/client.py:1035, in HTTPConnection.send(self, data)
   1034 if self.auto_open:
-> 1035     self.connect()
   1036 else:

File /usr/local/anaconda3-2020/lib/python3.12/http/client.py:1477, in HTTPSConnection.connect(self)
   1475     server_hostname = self.host
-> 1477 self.sock = self._context.wrap_socket(self.sock,
   1478                                       server_hostname=server_hostname)

File /usr/local/anaconda3-2020/lib/python3.12/ssl.py:455, in SSLContext.wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
    449 def wrap_socket(self, sock, server_side=False,
    450                 do_handshake_on_connect=True,
    451                 suppress_ragged_eofs=True,
    452                 server_hostname=None, session=None):
    453     # SSLSocket class handles server_hostname encoding before it calls
    454     # ctx._wrap_socket()
--> 455     return self.sslsocket_class._create(
    456         sock=sock,
    457         server_side=server_side,
    458         do_handshake_on_connect=do_handshake_on_connect,
    459         suppress_ragged_eofs=suppress_ragged_eofs,
    460         server_hostname=server_hostname,
    461         context=self,
    462         session=session
    463     )

File /usr/local/anaconda3-2020/lib/python3.12/ssl.py:1042, in SSLSocket._create(cls, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, context, session)
   1041                 raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
-> 1042             self.do_handshake()
   1043 except:

File /usr/local/anaconda3-2020/lib/python3.12/ssl.py:1320, in SSLSocket.do_handshake(self, block)
   1319         self.settimeout(None)
-> 1320     self._sslobj.do_handshake()
   1321 finally:

SSLError: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] ssl/tls alert handshake failure (_ssl.c:1000)

During handling of the above exception, another exception occurred:

URLError                                  Traceback (most recent call last)
Cell In[3], line 1
----> 1 pwms = mb.datasets.archetypes()
      2 len(pwms)

File ~/.local/lib/python3.12/site-packages/mubind/datasets/datasets.py:547, in archetypes(**kwargs)
    545 kwargs['url'] = url
    546 archetypes_dir = 'data/archetypes'
--> 547 anno = archetypes_anno(**kwargs)
    548 clu = archetypes_anno(**kwargs)
    550 # PWM weights

File ~/.local/lib/python3.12/site-packages/mubind/datasets/datasets.py:489, in archetypes_anno(**kwargs)
    487    if not os.path.exists(archetypes_dir):
    488         os.makedirs(archetypes_dir)
--> 489    urllib.request.urlretrieve(kwargs['url'], archetypes_path)
    491 anno = pd.read_excel(archetypes_path, sheet_name='Archetype clusters')
    492 return anno

File /usr/local/anaconda3-2020/lib/python3.12/urllib/request.py:240, in urlretrieve(url, filename, reporthook, data)
    223 """
    224 Retrieve a URL into a temporary location on disk.
    225 
   (...)
    236 data file as well as the resulting HTTPMessage object.
    237 """
    238 url_type, path = _splittype(url)
--> 240 with contextlib.closing(urlopen(url, data)) as fp:
    241     headers = fp.info()
    243     # Just return the local path and the "headers" for file://
    244     # URLs. No sense in performing a copy unless requested.

File /usr/local/anaconda3-2020/lib/python3.12/urllib/request.py:215, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    213 else:
    214     opener = _opener
--> 215 return opener.open(url, data, timeout)

File /usr/local/anaconda3-2020/lib/python3.12/urllib/request.py:515, in OpenerDirector.open(self, fullurl, data, timeout)
    512     req = meth(req)
    514 sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
--> 515 response = self._open(req, data)
    517 # post-process response
    518 meth_name = protocol+"_response"

File /usr/local/anaconda3-2020/lib/python3.12/urllib/request.py:532, in OpenerDirector._open(self, req, data)
    529     return result
    531 protocol = req.type
--> 532 result = self._call_chain(self.handle_open, protocol, protocol +
    533                           '_open', req)
    534 if result:
    535     return result

File /usr/local/anaconda3-2020/lib/python3.12/urllib/request.py:492, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
    490 for handler in handlers:
    491     func = getattr(handler, meth_name)
--> 492     result = func(*args)
    493     if result is not None:
    494         return result

File /usr/local/anaconda3-2020/lib/python3.12/urllib/request.py:1392, in HTTPSHandler.https_open(self, req)
   1391 def https_open(self, req):
-> 1392     return self.do_open(http.client.HTTPSConnection, req,
   1393                         context=self._context)

File /usr/local/anaconda3-2020/lib/python3.12/urllib/request.py:1347, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
   1344         h.request(req.get_method(), req.selector, req.data, headers,
   1345                   encode_chunked=req.has_header('Transfer-encoding'))
   1346     except OSError as err: # timeout error
-> 1347         raise URLError(err)
   1348     r = h.getresponse()
   1349 except:

URLError: <urlopen error [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] ssl/tls alert handshake failure (_ssl.c:1000)>

ilibarra commented 2 months ago

Thanks for your update. ATAC OK then.

About PWM weights and archetype annotations, there's a test for this function in the repo, and it passes in GitHub Actions with the standard installation. https://github.com/theislab/mubind/actions/runs/10538074958/job/29200180490#step:5:18 https://github.com/theislab/mubind/blob/main/tests/test_datasets.py#L59

Please indicate whether python -m pytest tests/test_datasets.py -k test_archetypes passes for you. If not, from what I interpret in your log, Python 3.12 might require specific SSL certificates to run this loading function. Please confirm your OS/python/urllib/openssl versions for further inspection, and we'll include them in GH actions. Thank you! https://github.com/theislab/mubind/blob/main/pyproject.toml#L24
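The version details requested above can be collected with the standard library alone. This is a minimal sketch; the function name is hypothetical. Since urllib ships with Python, its version is the Python version reported here.

```python
import platform
import ssl
import sys


def environment_report() -> dict:
    """Collect OS, Python, and OpenSSL details relevant to an SSL handshake failure."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        # OpenSSL build that this Python's ssl module is linked against.
        "openssl": ssl.OPENSSL_VERSION,
        # Default CA bundle path, or None if the build does not define one.
        "default_cafile": ssl.get_default_verify_paths().cafile,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running it in the same environment (and the same Jupyter kernel) where mb.datasets.archetypes() fails matters, because the traceback above mixes packages from ~/.local with an Anaconda interpreter.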

shunyasanuma commented 2 months ago

The output confirms that test_archetypes in the test_datasets.py file passed successfully, but the same error (URLError) occurs with mb.datasets.archetypes()

python -m pytest test_datasets.py -k test_archetypes
=============================================================== test session starts ===============================================================
platform linux -- Python 3.12.3, pytest-8.3.2, pluggy-1.5.0
rootdir: mubind
plugins: cov-5.0.0, anyio-4.4.0
collected 5 items / 4 deselected / 1 selected                                                                                                     

../../../../mubind/test_datasets.py .                                             [100%]

======================================================== 1 passed, 4 deselected in 18.91s =========================================================

ilibarra commented 2 months ago

@shunyasanuma regarding your last message, please expand your example in case of further errors. We have tested 3.9-3.12 and test_datasets.py seems to work well. Your previous log also seems ok now. Thank you. https://github.com/theislab/mubind/actions/runs/10569448623/job/29282290482

shunyasanuma commented 2 months ago

I went through the tutorial, "Mouse Pancreatic Endocrinogenesis (scATAC-seq) | Training with an RNA-dynamics kNN-graph."

While the shape of ad was:

ad.shape
(96, 50)

it changed to: (8161, 50000)

save_output = True

if save_output:
    for use_logdynamic in [False, True]:
        p = 'pancreas_multiome_use_logdynamic_%i_obs%i_var%i.pth' % (use_logdynamic, ad.shape[0], ad.shape[1])
        print(p)
        torch.save(model_by_logdynamic[use_logdynamic], p)

    ad.write('atac_train.h5ad')
    rna_sample.write('rna_sample_train.h5ad')

    import pickle
    pickle.dump(train, open('train_dataloader.pkl', 'wb'))

pancreas_multiome_use_logdynamic_0_obs8161_var50000.pth
pancreas_multiome_use_logdynamic_1_obs8161_var50000.pth

For the Mouse Pancreatic Endocrinogenesis (scATAC-seq) | Model Evaluation tutorial, you use *obs8161_var50000.pth instead of *obs96_var50.pth. The *obs96_var50.pth files are smaller.
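Because the checkpoint filenames encode the training matrix shape, a small check can tell which files match the shape the evaluation notebook expects. The sketch below parses the naming pattern used in the save snippet above; the function name is hypothetical.

```python
import re

# Mirrors 'pancreas_multiome_use_logdynamic_%i_obs%i_var%i.pth' from the save snippet.
CHECKPOINT_PATTERN = re.compile(
    r"pancreas_multiome_use_logdynamic_(\d+)_obs(\d+)_var(\d+)\.pth$"
)


def parse_checkpoint_name(filename: str):
    """Return (use_logdynamic, (n_obs, n_var)) encoded in a checkpoint filename."""
    match = CHECKPOINT_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"Unrecognized checkpoint name: {filename}")
    use_logdynamic, n_obs, n_var = (int(group) for group in match.groups())
    return bool(use_logdynamic), (n_obs, n_var)


print(parse_checkpoint_name("pancreas_multiome_use_logdynamic_1_obs8161_var50000.pth"))
# prints (True, (8161, 50000))
```

Comparing the parsed shape with ad.shape before loading a model makes a mismatch between a small test run and the full tutorial run visible early.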

I’m not in a rush, but it would be great if I could go through the evaluation tutorial with the *obs8161_var50000.pth files before using my data. This way, I can use the tutorial inputs for reference.

Could you please upload the *obs8161_var50000.pth files?

ilibarra commented 1 month ago

@shunyasanuma Thank you for the feedback!

We included a Dropbox URL in the pancreas endocrinogenesis evaluation notebook, to download model training files, as indicated. These files can be loaded in the evaluation snippets. We will also include similar links for the other two datasets in the next few days. https://github.com/theislab/mubind/blob/stable/docs/notebooks/single_cell/02_2_2_scatac_multiome_pancreas_priors_evaluate.ipynb https://www.dropbox.com/scl/fo/7h98je53on9vl1u8o0gqm/APUZFwi1dWZM2ELoZqS3XbM?rlkey=cty8wpyyx8zjmh4v5yj3tshet&e=1

Let us know if this works well. Keeping this issue open some days for comments or wrap-up. Thank you!

shunyasanuma commented 1 month ago

Thank you for uploading the files. Currently, the GPU on the HPC is temporarily unavailable. I’ll run the script with the files and update you on whether it worked or not once the GPU is available.

shunyasanuma commented 1 month ago

I was able to run the tutorial without errors and obtained results similar to those in your tutorial.

ilibarra commented 1 month ago

Glad to know the tutorial executed without errors. Feel free to reopen this issue for further comments, or open a new one if the topic is unrelated. Thank you!