single-cell-data / TileDB-SOMA

Python and R SOMA APIs using TileDB’s cloud-native format. Ideal for single-cell data at any scale.
https://tiledbsoma.readthedocs.io
MIT License
90 stars 25 forks source link

[c++] Revert #2895 #2921

Closed ryan-williams closed 1 month ago

ryan-williams commented 1 month ago

Issue and/or context: #2920

cellxgene_census_builder//test_builder.py started failing at #2895:

Originally reported by @ivirshup.

Changes:

Revert #2895 (as well as some bits of #2897 and #2909 that were merged since). test_builder.py passes again with this change.

Notes for Reviewer:

I'm not clear on the significance of including (and now removing) the TimestampRange in soma_array.cc; hoping someone can confirm they understand why it was causing an issue, and that it's fine to (temporarily?) revert.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 90.03%. Comparing base (6f88d57) to head (4c4a6c3). Report is 2 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #2921 +/- ## ========================================== + Coverage 89.88% 90.03% +0.15% ========================================== Files 37 37 Lines 3925 3925 ========================================== + Hits 3528 3534 +6 + Misses 397 391 -6 ``` | [Flag](https://app.codecov.io/gh/single-cell-data/TileDB-SOMA/pull/2921/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=single-cell-data) | Coverage Δ | | |---|---|---| | [python](https://app.codecov.io/gh/single-cell-data/TileDB-SOMA/pull/2921/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=single-cell-data) | `90.03% <ø> (+0.15%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=single-cell-data#carryforward-flags-in-the-pull-request-comment) to find out more. | [Components](https://app.codecov.io/gh/single-cell-data/TileDB-SOMA/pull/2921/components?src=pr&el=components&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=single-cell-data) | Coverage Δ | | |---|---|---| | [python_api](https://app.codecov.io/gh/single-cell-data/TileDB-SOMA/pull/2921/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=single-cell-data) | `90.03% <ø> (+0.15%)` | :arrow_up: | | [libtiledbsoma](https://app.codecov.io/gh/single-cell-data/TileDB-SOMA/pull/2921/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=single-cell-data) | `∅ <ø> (∅)` | |
nguyenv commented 1 month ago

I'm not clear on the significance of including (and now removing) the TimestampRange in soma_array.cc; hoping someone can confirm they understand why it was causing an issue, and that it's fine to (temporarily?) revert.

For arrays with enumeration columns, when we first create the array, we set the schema so that the values are empty. On the first write, we read the dictionary values from the arrow array and then call extend enumeration to write the values. We were encountering an issue where when writing an enumeration to a specified timestamp, the readback values were not maching. The solution is to perform the array schema evolution for extending the enumation at that specified timestamp, rather than at the default current timestamp. However, this seems to have created unintentional buggy side-effects. I'll put this on my list to look at today.