Laolga closed this issue 1 year ago.
Please share the code you used to create the reference, and show some entries of the resulting reference matrix. I suspect the scaling is wrong.
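Here is a minimal sketch of a scaling check. It assumes that a reference should hold non-negative per-cell-type expression fractions, not raw counts. `check_ref_scale` is a hypothetical helper, not part of numbat. The toy matrix reuses a few entries from the `ref_internal` output later in this thread.

```r
# Toy stand-in for a reference matrix (genes x cell types); values copied
# from the ref_internal output in this thread, truncated to a few cells
ref <- matrix(
  c(0.000000000, 3.151704e-05,
    0.000000000, 0.000000e+00,
    0.005400502, 2.767025e-04),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("RP11.206L10.2", "RP11.54O7.16", "HES4"),
                  c("Astrocytes", "Granule Cells"))
)

# Hypothetical helper. Assumption: a well-scaled reference contains
# non-negative fractions (max well below 1), not raw integer counts.
check_ref_scale <- function(ref) {
  list(
    any_negative = any(ref < 0, na.rm = TRUE),
    any_missing  = anyNA(ref),               # TRUE for NA or NaN
    max_value    = max(ref, na.rm = TRUE),   # large values suggest raw counts
    col_sums     = colSums(ref)              # a zero column is a red flag
  )
}

str(check_ref_scale(ref))
```

On the full matrix, raw counts would show up as integer entries and a large `max_value`. As far as I understand `aggregate_counts`, each column is divided by its own total, so a group with zero total counts would produce NaN.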
> count_mat_ref = read.csv("count_mat.csv", row.names = 1)
> head(count_mat_ref[seq(1,5)])
RP11.206L10.2 RP11.54O7.16 HES4 ISG15 AGRN
AAACCTGCACGTTGGC-1-0 0 0 0 0 0
AAACCTGGTGCACTTA-1-0 0 0 0 0 0
AAACCTGTCAATACCG-1-0 0 0 0 0 0
AAACCTGTCGCGTAGC-1-0 0 0 0 0 0
AAACGGGAGGTCATCT-1-0 0 0 0 0 0
AAACGGGCAAAGGAAG-1-0 0 0 0 0 0
> cell2type = read.csv("celltypes.csv")
> colnames(cell2type) = c("cell", "group")
> head(cell2type)
cell group
1 AAACCTGCACGTTGGC-1-0 Malignant Basal State
2 AAACCTGGTGCACTTA-1-0 Malignant Basal State
3 AAACCTGTCAATACCG-1-0 Malignant Basal State
4 AAACCTGTCGCGTAGC-1-0 Malignant Basal State
5 AAACGGGAGGTCATCT-1-0 Malignant Basal State
6 AAACGGGCAAAGGAAG-1-0 Malignant Basal State
> ref_internal = aggregate_counts(t(count_mat_ref), cell2type)
cell_dict
Astrocytes Granule Cells Malignant Basal State
36 162 3543
Malignant Cycling Malignant Granule-like Progenitor Malignant Neuronal Development I
555 181 1657
Malignant Neuronal Development II Malignant Neuronal Development III Microglia
871 171 50
> head(ref_internal)
Astrocytes Granule Cells Malignant Basal State Malignant Cycling
RP11.206L10.2 0.000000000 3.151704e-05 4.925321e-04 2.944128e-05
RP11.54O7.16 0.000000000 0.000000e+00 1.038808e-03 2.566988e-05
HES4 0.005400502 2.767025e-04 2.605442e-05 1.749445e-03
ISG15 0.000000000 1.859600e-03 6.491731e-04 5.720368e-04
AGRN 0.000000000 0.000000e+00 7.412861e-04 5.547613e-05
TNFRSF18 0.000000000 3.787758e-04 1.154173e-03 0.000000e+00
Malignant Granule-like Progenitor Malignant Neuronal Development I
RP11.206L10.2 1.607266e-06 3.168859e-04
RP11.54O7.16 0.000000e+00 2.813663e-05
HES4 6.216030e-03 4.306779e-06
ISG15 3.068418e-06 1.141342e-03
AGRN 9.146807e-05 4.081686e-04
TNFRSF18 0.000000e+00 4.138260e-04
Malignant Neuronal Development II Malignant Neuronal Development III Microglia
RP11.206L10.2 0.0001436701 1.209148e-05 0.000000e+00
RP11.54O7.16 0.0006976547 3.036306e-04 0.000000e+00
HES4 0.0006754232 2.060926e-04 6.577098e-07
ISG15 0.0009031457 1.155408e-05 0.000000e+00
AGRN 0.0019187173 1.808349e-04 1.873048e-03
TNFRSF18 0.0002515563 1.545023e-05 0.000000e+00
Hi @Laolga,
Thanks. First, you can try computing `log(lambdas_ref * 1e+06 + 1)` outside of the program to see where the NaN is produced. Second, tumor cell types should not be included in the diploid reference.
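Here is a minimal sketch of both suggestions. It uses a toy matrix with a few values copied from the `ref_internal` output above, and names it `lambdas_ref` to match the expression in the comment. It also assumes that the tumor groups are exactly the columns whose names start with "Malignant". That is an assumption about this particular annotation, not a numbat rule.

```r
# Toy stand-in for the aggregated reference (genes x cell types),
# using values from the ref_internal output above
lambdas_ref <- matrix(
  c(0.000000000, 3.151704e-05, 4.925321e-04, 0.000000e+00,
    0.005400502, 2.767025e-04, 2.605442e-05, 6.577098e-07),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("RP11.206L10.2", "HES4"),
                  c("Astrocytes", "Granule Cells",
                    "Malignant Basal State", "Microglia"))
)

# 1) Apply the transform outside numbat and locate non-finite results
#    (NaN, NA, Inf or -Inf); NaN here needs an input below -1e-6 or NaN itself
transformed <- log(lambdas_ref * 1e+06 + 1)
bad <- which(!is.finite(transformed), arr.ind = TRUE)
print(bad)   # zero rows: the transform is clean on these inputs

# 2) Keep only non-tumor cell types for the diploid reference
#    (assumption: tumor groups are the columns starting with "Malignant")
is_tumor <- grepl("^Malignant", colnames(lambdas_ref))
ref_diploid <- lambdas_ref[, !is_tumor, drop = FALSE]
print(colnames(ref_diploid))
```

If the transform is clean on the full matrix but numbat still fails, the NaN is probably produced at a later step. An alternative is to drop the malignant rows from `cell2type` before calling `aggregate_counts`. Provided each group is normalized by its own total, this should give the same diploid columns.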
Hi! I was using numbat before with the provided reference data, and it worked fairly well. Then I wanted to try a more relevant reference, so I used the provided `aggregate_counts` function to build one. It ran without any errors and produced a gene x cell type matrix without any NaNs. But numbat crashes when I use that reference.
Could you please suggest what might be wrong?
Olga