openproblems-bio / openproblems

Formalizing and benchmarking open problems in single-cell genomics
MIT License

Pancreas counts are not integers #680

Closed rcannood closed 1 year ago

rcannood commented 1 year ago
import openproblems
pan = openproblems.data.pancreas.load_pancreas()
print(pan.layers['counts'])
(0, 16)       1.0019583
(0, 20)       1.0019583
(0, 25)       1.0019583
(0, 30)       2.0078535
(0, 40)       1.0019583
(0, 66)       4.0315795
:     :
(16381, 18633)        93.4915
(16381, 18641)        5.5119
(16381, 18660)        1.2677
(16381, 18661)        4.6227
(16381, 18665)        0.1375
(16381, 18675)        7.6811
(16381, 18681)        32.8928
(16381, 18684)        0.6472
(16381, 18687)        113.5626
(16381, 18698)        1.236
(16381, 18705)        0.1846
(16381, 18711)        3.3464
(16381, 18728)        0.2025

The .layers['counts'] matrix should be whole numbers instead of real numbers.

If there were scripts available for these datasets (related to #679) I'd be able to see where things went wrong.

LuckyMD commented 1 year ago

Look at the batch assignment of the non-integer values. They may be from non-UMI technologies.
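
One way to run that check is sketched below, on toy data rather than the real pancreas object. The function name `non_integer_fraction_by_batch`, the toy matrix, and the batch labels are all made up for illustration. The `.obs` column that holds the technology in the real dataset is not stated in this thread, so the name in the final comment is only a guess.

```python
import numpy as np
from scipy import sparse


def non_integer_fraction_by_batch(counts, batches):
    """Return {batch: fraction of stored nonzero entries that are not whole numbers}.

    Hypothetical helper for illustration; not part of openproblems.
    """
    counts = sparse.csr_matrix(counts)
    batches = np.asarray(batches)
    result = {}
    for batch in np.unique(batches):
        # Select the rows (cells) of this batch and look at their stored values.
        values = counts[batches == batch].data
        # np.mod(x, 1) is 0 exactly when x is a whole number.
        frac = float(np.mean(np.mod(values, 1) != 0)) if values.size else 0.0
        result[str(batch)] = frac
    return result


# Toy stand-ins for pan.layers['counts'] and a per-cell batch column.
toy_counts = sparse.csr_matrix(np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [93.4915, 0.0, 5.5119],
    [0.0, 1.2677, 4.0],
]))
toy_batches = ["umi_tech", "umi_tech", "full_length_tech", "full_length_tech"]

print(non_integer_fraction_by_batch(toy_counts, toy_batches))

# On the real object this might look like the line below, ASSUMING the
# technology is stored in an .obs column (the name "tech" is a guess):
# non_integer_fraction_by_batch(pan.layers["counts"], pan.obs["tech"])
```

If the non-integer entries cluster in a few batches and the rest come back as 0.0, that would be consistent with the values coming from specific technologies rather than from a bug in loading.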

scottgigante-immunai commented 1 year ago

These are indeed the non-UMI technologies. This is expected behaviour. I agree, however, that we should have generation scripts.