srdsam closed 5 months ago
@srdsam thanks for the report!
Indeed, if you can please share input-data files, that would be helpful. I tried your script above and was unable to reproduce the problem.
If you can share full data, great -- or, it would suffice at first to share the `"obs_id"` and `"var_id"` columns for `adata.obs` and `adata.var` for each of your input-data files. (My guess is that the `obs_id` fields are repeated across your multiple `.h5ad` files, but I'd love to validate this guess.)
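The uniqueness check suggested above can be sketched in a few lines. This is a minimal illustration, not part of TileDB-SOMA: it assumes the IDs have already been read out of each file into plain Python lists, and the helper name `find_repeated_ids`, the file names, and the sample IDs are all hypothetical.

```python
from collections import defaultdict


def find_repeated_ids(ids_by_file):
    """Return {id: sorted list of files} for every ID that occurs in more than one file."""
    files_by_id = defaultdict(set)
    for file_name, ids in ids_by_file.items():
        for id_ in ids:
            files_by_id[id_].add(file_name)
    return {
        id_: sorted(files)
        for id_, files in files_by_id.items()
        if len(files) > 1
    }


# Hypothetical file names and IDs, for illustration only.
example = {
    "tissue_a.h5ad": ["sample1", "sample2"],
    "tissue_b.h5ad": ["sample2", "sample3"],
}
repeated = find_repeated_ids(example)
print(repeated)  # {'sample2': ['tissue_a.h5ad', 'tissue_b.h5ad']}
```

In practice the lists would probably come from each file's `adata.obs` (for example the `obs_id` column, or the index if no such column exists) after loading it with `anndata.read_h5ad`. An empty result would rule out repeated IDs as the cause.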
Thanks for the prompt reply @johnkerl ! Sorry been a bit over-subscribed this week but I'll regenerate and try to share them this weekend. I'll also check uniqueness myself.
Also as a quick general question, is the best place to reach out for questions around TileDB-SOMA here or via Slack (CZI Science/TileDB's one)?
For context:
Working on leveraging TileDB-SOMA in a production environment to develop a data explorer for the census among other datasets. Been taking notes from CZI's projects (single-cell-data-explorer, cellxgene etc.), but still have some questions (and will likely run into a few edge cases) and am hesitant to over-clutter the issue pages here with non-issues.
P.S. I've also been able to figure out a lot from your issues/PRs here so the discussions here have been super helpful :)
> Thanks for the prompt reply @johnkerl ! Sorry been a bit over-subscribed this week but I'll regenerate and try to share them this weekend. I'll also check uniqueness myself.
All good! :)
> Also as a quick general question, is the best place to reach out for questions around TileDB-SOMA here or via Slack (CZI Science/TileDB's one)?
Either is fine! :) I guess I'd lean slightly toward here as it's public by default
> For context: ...
That is helpful, thanks!
> P.S. I've also been able to figure out a lot from your issues/PRs here so the discussions here have been super helpful :)
Thanks for the positive feedback -- the goal indeed is to build up in realtime a transparent reference corpus 🙏
Hi @srdsam -- should we close this?
Yep! Closed :)
Describe the bug
When I ingest multiple `h5ad` files with `tiledbsoma.io.from_h5ad` using a `registration_mapping` (generated with `tiledbsoma.io.register_h5ads`), the `from_h5ad` function didn't throw any errors. However, only the data from the first `h5ad` seemed to populate the SOMA object. Downstream objects populate `obs` and `var` correctly, but not `X` (it is just filled with 0s).
I ended up circumventing this error by using the `append` functions (e.g. `append_obs`, `append_var`, and `append_X`). Unsure if I was misusing the `tiledbsoma.io.from_h5ad` function or the registration mapping? I did get data ingestion to work fine with fewer than 10 h5ads (didn't profile exactly).
To Reproduce
The ingest function took in a list of h5ads converted from GTEx's bulk tissue data. Happy to provide full code or list of h5ads if that helps. Just want to sanity check my use of TileDB-SOMA first.
Versions (please complete the following information):
Additional context
Also noticed a similar issue when ingesting multiple measurements into the same SOMA object: `obs` is not written correctly. The 'single-cell' data in `obs` always ends up overwriting the `bulk` data. Solved this by just creating separate SOMAs.
For reference this is the code that works fine:
Thanks in advance!