plevritis-lab / CELESTA

Automate unsupervised machine learning cell type identification using both protein expressions and cell spatial neighborhood information for multiplexed in situ imaging data. No training dataset with cell type labels is required.
Apache License 2.0

Error in AssignCells function #18

Open · asasama12 opened this issue 1 year ago

asasama12 commented 1 year ago

Hello, I'm trying out CELESTA on my own dataset and getting the following error at the AssignCells step of the protocol:

[1] "Start calculating the scoring function."
Error in apply(activation_prob_to_use[, non_NA_index], 1, function(x) (1 - :
  dim(X) must have a positive length

Any idea what may be causing this?

josenimo commented 1 year ago

Hi, I am also getting the same error. Any help?

[1] "Proportion of cells changed assignment in the last iteration: 0.000427813587359535"
[1] "Total cells to be assigned in the current round: 1904"
[1] "Start calculating the scoring function."
Error in apply(current_scoring_matrix[unassigned_cells, cell_type_num],  : 
  dim(X) must have a positive length
In addition: Warning messages:
1: In current_cell_type_assignment[, previous_level_round] == previous_level_type :
  longer object length is not a multiple of shorter object length
2: In current_cell_type_assignment[, previous_level_round] == previous_level_type :
  longer object length is not a multiple of shorter object length
3: In current_cell_type_assignment[, (round - 1)] == previous_level_type :
  longer object length is not a multiple of shorter object length
4: In current_cell_type_assignment[, previous_level_round] == previous_level_type :
  longer object length is not a multiple of shorter object length
5: In current_cell_type_assignment[, (round - 1)] == previous_level_type :
  longer object length is not a multiple of shorter object length
6: In current_cell_type_assignment[, previous_level_round] == previous_level_type :
  longer object length is not a multiple of shorter object length
7: In current_cell_type_assignment[, (round - 1)] == previous_level_type :
  longer object length is not a multiple of shorter object length

The files can be downloaded here: https://filetransfer.mdc-berlin.de/?u=DxZaa8wM&p=UTUztNAv

weiruo16 commented 1 year ago

Hi, the issue is with the input signature matrix, which was not set up properly. In the Lineage_level column, the first number is the round, which sets the order in which CELESTA identifies subtypes. In each round, CELESTA identifies subtypes of only ONE main cell type identified in an earlier round (one lineage level up). For example, if in the first round you identify tumor cells and immune cells, then in the second round you can identify either subtypes of tumor cells or subtypes of immune cells, but not both. If you choose to identify subtypes of immune cells in the second round, you need to identify subtypes of tumor cells in the third round.

I have also emailed you a suggested cell-type signature matrix. I ran it with the default parameters, and it ran without any errors. I hope it helps clarify the setup.
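The round rule above can be checked before running CELESTA. The Python sketch below is illustrative only: CELESTA itself is an R package, and nothing here is part of its API. It assumes each Lineage_level entry is a string of three underscore-separated integers, read here as (round, parent cell-type number, this cell type's own number), following labels such as 3_2_6 that appear in this thread. The function names are hypothetical.

```python
from collections import defaultdict


def parse_lineage(level):
    """Split a Lineage_level string such as '3_2_6' into three integers.

    ASSUMPTION: the fields are (round, parent cell-type number,
    this cell type's own number).
    """
    round_no, parent, type_no = (int(part) for part in level.split("_"))
    return round_no, parent, type_no


def check_one_parent_per_round(lineage_levels):
    """Return a list of problems; an empty list means the rule holds.

    Rule from the thread: each round may split only ONE cell type,
    identified in an earlier round, into subtypes.
    """
    parents_by_round = defaultdict(set)
    round_defined = {}  # cell-type number -> round in which it is defined
    for level in lineage_levels:
        round_no, parent, type_no = parse_lineage(level)
        parents_by_round[round_no].add(parent)
        round_defined[type_no] = round_no

    problems = []
    for round_no in sorted(parents_by_round):
        parents = parents_by_round[round_no]
        if len(parents) > 1:
            problems.append(f"round {round_no} splits several parents: {sorted(parents)}")
        if round_no > 1:
            for parent in sorted(parents):
                # A parent must already exist in a strictly earlier round.
                if round_defined.get(parent, round_no) >= round_no:
                    problems.append(
                        f"round {round_no}: parent {parent} is not defined in an earlier round"
                    )
    return problems
```

For the scenario described above (tumor and immune cells in round 1, immune subtypes in round 2, tumor subtypes in round 3), a list such as ["1_0_1", "1_0_2", "2_1_3", "2_1_4", "3_2_5", "3_2_6"] passes, whereas ["1_0_1", "1_0_2", "2_1_3", "2_2_4"] is reported because round 2 splits two different parents.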

weiruo16 commented 1 year ago

If you can share your two inputs as Jose did, that would help with troubleshooting.

asasama12 commented 1 year ago

Hi, I believe I have set up the lineage levels so that I'm only identifying subtypes of one main cell type per row: [screenshot]

I'm still encountering the same error message: [screenshot]

weiruo16 commented 1 year ago

For each round (the first number in Lineage_level), CELESTA needs at least two cell types or subtypes, because each cell is assigned to the type with the higher probability. In your matrix, round 3 contains only one cell type, Tumor (3_2_6), and that is what caused the error. I would suggest moving Tumor cells to round 1; for example, round 1 could contain immune cells, tumor cells, and cells that are neither immune nor tumor. Alternatively, you only need immune and tumor cells in the first round, and the rest of the cells will be assigned as unknown.
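A plausible mechanism for this error, hedged because it depends on CELESTA internals not shown here: in R, indexing a matrix with a single column index (for example `m[rows, 3]`) drops the result to a plain vector by default, and `apply()` called on a vector stops with "dim(X) must have a positive length". A round with only one cell type would plausibly produce such a one-column selection in a call like `current_scoring_matrix[unassigned_cells, cell_type_num]`. The Python sketch below is illustrative only and not part of CELESTA; the function name is hypothetical, and it assumes the round is the first underscore-separated field of each Lineage_level string.

```python
from collections import Counter


def rounds_with_single_type(lineage_levels):
    """Return the rounds that contain fewer than two cell types.

    ASSUMPTION: the round is the first '_'-separated field of each
    Lineage_level string (e.g. 3 in '3_2_6').
    """
    counts = Counter(int(level.split("_")[0]) for level in lineage_levels)
    return sorted(round_no for round_no, n in counts.items() if n < 2)
```

For example, `rounds_with_single_type(["1_0_1", "1_0_2", "2_1_3", "2_1_4", "3_2_6"])` returns `[3]`, which matches the situation described above.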

asasama12 commented 1 year ago

I made the update you suggested, but the same error message remains.

[screenshot]

weiruo16 commented 1 year ago

Then something else must be causing the error, and it is not obvious from the lineage information alone. Could you share the complete inputs, if possible, including the full signature matrix and the imaging data? It would also help to know at which round the error occurred.

asasama12 commented 1 year ago

Sure! I included the matrix, the data file, and the R script:

https://drive.google.com/file/d/16sq05C0TMpUbJXXNpb2PH4v2wg5FYO8d/view?usp=sharing

weiruo16 commented 1 year ago

matrix_for_reference.csv

There are several issues with your data:

(1) Your data seem to have been transformed, including the X and Y coordinates, which is not right. Please double-check the outputs from your segmentation.

(2) The default transformation in CELESTA is arcsinh, applied only to the expression values, not to the X and Y columns. If you want to transform your data yourself, you need to set transform_type = 0 in the CreateCelestaObject() function. Even then, the X and Y columns should not be transformed. Please plot the cells using the X and Y coordinates to double-check them.

(3) Your cell-type signature matrix has too many NAs, and some of them are not right. For example, for immune cells you only have "1" for CD45 and NA for everything else. CELESTA does not assign cell types based on a single marker, but on a combination of markers. In your case, you should at least set PANCK to "0" for immune cells, because PANCK is a general tumor marker. The error you got occurred because CELESTA did not have enough information to calculate a probability from only one marker.

Please see my attached signature matrix for reference. I changed some of the NAs to 1 or 0, but I don't know which tissue your data come from, so there may be more entries that can be changed to describe the cell types more accurately.
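Points (2) and (3) can be sketched in code. The Python below is illustrative only and not part of CELESTA: `arcsinh_expressions` applies arcsinh to marker values while copying the X and Y columns unchanged, and `underspecified_cell_types` lists cell types whose signature has fewer than two non-NA markers. The column names, the cofactor, the dictionary layout, and the two-marker threshold are all assumptions; the threshold simply encodes the advice that a single marker is not enough. The same single-column effect in R (a one-element `non_NA_index` turning `activation_prob_to_use[, non_NA_index]` into a vector) may explain the `apply()` error in the first report, though that is an inference.

```python
import math


def arcsinh_expressions(cells, coord_columns=("X", "Y"), cofactor=1.0):
    """Return new cell records with arcsinh applied to marker values only.

    Coordinate columns are copied unchanged and the input is not modified.
    ASSUMPTIONS: the 'X'/'Y' names and cofactor=1.0 are illustrative;
    CELESTA's built-in transform may differ, so pre-transforming like this
    only makes sense together with transform_type = 0.
    """
    transformed = []
    for cell in cells:
        out = {}
        for column, value in cell.items():
            if column in coord_columns:
                out[column] = value  # never transform spatial coordinates
            else:
                out[column] = math.asinh(value / cofactor)
        transformed.append(out)
    return transformed


def underspecified_cell_types(signature, min_markers=2):
    """Return cell types with fewer than `min_markers` non-NA signature entries.

    `signature` maps cell type -> {marker: numeric value, or None for NA}.
    The threshold of 2 is a heuristic, not a documented CELESTA rule.
    """
    return sorted(
        cell_type
        for cell_type, markers in signature.items()
        if sum(value is not None for value in markers.values()) < min_markers
    )
```

With the immune row from this thread ({"CD45": 1, "PANCK": None}), `underspecified_cell_types` flags "Immune"; setting PANCK to 0 for immune cells, as suggested above, clears it.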

josenimo commented 1 year ago

Dear @weiruo16,

I am very sorry for the late reply. I now understand how the lineage levels are written. Thank you for taking the time!

asasama12 commented 1 year ago

@weiruo16

I was able to take your points on board and get the code working on this dataset, thank you! The X and Y coordinates turned out to be correct; they were just large because they represented slide coordinates for a subset of cells from a single TMA core.

weiruo16 commented 1 year ago

That makes sense! Thanks for explaining it.