jianhuupenn / TESLA

Deciphering tumor ecosystems at super-resolution from spatial transcriptomics with TESLA
MIT License

AssertionError #4

Closed Polligator closed 1 year ago

Polligator commented 1 year ago

running the tutorial using mouse brain data downloaded from the 10X website, I got the following error:

pred_refined, target_clusters, c_m = tesla.annotation(img=img,
                                                      binary=binary,
                                                      sudo_adata=enhanced_exp_adata,
                                                      genes=genes,
                                                      resize_factor=resize_factor,
                                                      num_required=1,
                                                      target_size="small")
Computing image band...
Computing gene band...
Running TESLA...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/miniconda3/envs/scvi/lib/python3.9/site-packages/TESLAforST-1.2.4-py3.9.egg/TESLA/annotation.py", line 74, in annotation
AssertionError

jianhuupenn commented 1 year ago

The error comes from the following assert:

assert np.max(img_band1)==1 and np.min(img_band1)==0 and np.max(gene_band1)==1 and np.min(gene_band1)==0

which checks that the input gene band and the histology image band are each min-max normalized to [0, 1], i.e. not all zeros. To further track down the error, could you let me know:

  1. Do you have the same error when running this step using the tutorial data?
  2. Can you plot the generated super-resolution gene expression to make sure the gene expression enhancement step is done correctly?
  3. If the expression enhancement step ran correctly, we can use the following diagnosis function to see what's going wrong:

def diagnosis(img, sudo_adata, genes, resize_factor, binary, res=50, num_required=1, pooling="min", rseed=100, tseed=100, nseed=100, nChannel=100, threshold=1000, radius=3, minLabels=30, train_refine=True, plot_intermedium=False, target_size="small", min_UMI=5):
    #-------------------------------Image band-------------------------------------------------#
    if target_size=="small":
        target_ratio=1/2
    elif target_size=="large":
        target_ratio=1/3
    else:
        print("target_size not valid, please specify small or large. Using default small.")
        target_ratio=1/2
    print("Computing image band...")
    resize_width=int(img.shape[1]*resize_factor)
    resize_height=int(img.shape[0]*resize_factor)
    img_resized=(img * np.dstack([(binary!=0)]*3)).astype(np.uint8)
    img_resized=cv2.resize(img_resized, (resize_width, resize_height), interpolation=cv2.INTER_AREA)
    gray_resized=cv2.cvtColor(img_resized, cv2.COLOR_BGR2GRAY)
    gray_resized=gray_resized.reshape(list(gray_resized.shape)+[1])
    binary_resized=cv2.resize(binary, (resize_width, resize_height), interpolation=cv2.INTER_AREA)
    img_band1=(gray_resized-np.min(gray_resized))/(np.max(gray_resized)-np.min(gray_resized))
    #-------------------------------Gene band-------------------------------------------------#
    print("Computing gene band...")
    genes=list(set([i for i in genes if i in sudo_adata.var.index]))
    assert num_required<=len(genes)
    tmp=sudo_adata.X[:, sudo_adata.var.index.isin(genes)]
    tmp=(tmp-np.min(tmp, 0))/(np.max(tmp, 0)-np.min(tmp, 0))
    tmp=np.partition(tmp, -num_required, axis=1)[:, -num_required:] #Select top num_required
    if pooling=="mean":
        sudo_adata.obs["meta"]=np.mean(tmp, axis=1)
    elif pooling=="min":
        sudo_adata.obs["meta"]=np.min(tmp, axis=1)
    else:
        print("Error! Pooling logic not understood.")
    gene_img=np.zeros(list(img.shape[0:2])+[1])
    for _, row in sudo_adata.obs.iterrows():
        x=row["x"]
        y=row["y"]
        exp=row["meta"]
        gene_img[int(x-res/2):int(x+res/2), int(y-res/2):int(y+res/2), 0]=exp
    gene_img[binary==0]=0
    gene_band1=gene_img[:, :, 0]
    #Filter on min UMI
    gene_band1[gene_band1<=np.log(min_UMI+1)]=0
    gene_band1=cv2.resize(gene_band1, (resize_width, resize_height), interpolation=cv2.INTER_AREA)
    gene_band1=(gene_band1-np.min(gene_band1))/(np.max(gene_band1)-np.min(gene_band1))
    gene_band1=gene_band1.reshape(list(gene_band1.shape)+[1])
    return img_band1, gene_band1

You can then run

img_band1, gene_band1 = diagnosis(...)  # use the same inputs as the annotation() function

and print img_band1 and gene_band1.
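In case it helps, here is a small numpy-only helper (not part of TESLA; the name is just for illustration) that reports whether each returned band would pass the failing assert:

```python
import numpy as np

def check_band(name, band):
    """Report whether a band is min-max normalized to [0, 1] with no NaNs,
    i.e. whether it would pass the assert inside annotation()."""
    lo = float(np.nanmin(band))
    hi = float(np.nanmax(band))
    n_nan = int(np.isnan(band).sum())
    ok = (lo == 0.0) and (hi == 1.0) and (n_nan == 0)
    print(f"{name}: min={lo}, max={hi}, NaNs={n_nan}, passes assert: {ok}")
    return ok

# Toy arrays standing in for the diagnosis() outputs:
check_band("img_band1", np.array([0.0, 0.5, 1.0]))   # True
check_band("gene_band1", np.array([np.nan, 0.0]))    # False
```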
Polligator commented 1 year ago

thanks for your quick response and the diagnosis code. It seems to me that the problem arises from the scaling step in the function:

tmp=(tmp-np.min(tmp, 0))/(np.max(tmp, 0)-np.min(tmp, 0))

This step generates NaN values for some genes, which in turn causes the final gene band to become all NaN. I think the calculation should theoretically be equivalent to sklearn's MinMaxScaler; however, when I use MinMaxScaler to do the same, it does not generate NaNs:

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(tmp)
scaled_data = scaler.transform(tmp)

I'm not sure what exactly went wrong, but I think a fix is probably needed

Polligator commented 1 year ago

on second thought, it's probably because the denominator in the scaling becomes 0 for some genes, which produces the NaN values.
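A minimal numpy reproduction confirms this: a constant gene column makes the denominator zero, while a guarded variant (mirroring the zero-range handling MinMaxScaler does internally) avoids the NaNs. This is just a sketch, not TESLA's current code:

```python
import numpy as np

# Toy spots x genes matrix; the first gene is constant (all zeros).
tmp = np.array([[0.0, 1.0],
                [0.0, 3.0],
                [0.0, 5.0]])

# Plain min-max scaling: the constant column gives 0/0 -> NaN.
with np.errstate(invalid="ignore"):
    scaled = (tmp - np.min(tmp, 0)) / (np.max(tmp, 0) - np.min(tmp, 0))
print(scaled[:, 0])  # [nan nan nan]

# Guarded variant: replace a zero range with 1 so constant columns map to 0.
rng = np.max(tmp, 0) - np.min(tmp, 0)
rng[rng == 0] = 1
safe = (tmp - np.min(tmp, 0)) / rng
print(safe[:, 0])  # [0. 0. 0.]
```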

jianhuupenn commented 1 year ago

Thank you for the suggestion. If a gene has the same max and min expression values, it is likely that its expression is 0 across all spots. In this case, I would recommend dropping it from the gene list as a quick fix, since it does not provide any useful information.
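For example, a zero-variance filter along these lines (the helper name and the dense-matrix assumption are mine, just for illustration) could be applied to the gene list before calling annotation():

```python
import numpy as np

def drop_constant_genes(X, gene_names):
    """Keep only genes whose expression actually varies across spots.

    X: dense spots x genes expression matrix (columns match gene_names).
    A constant column makes (max - min) == 0, which turns the min-max
    scaling inside annotation() into 0/0 = NaN.
    """
    ranges = X.max(axis=0) - X.min(axis=0)
    return [g for g, r in zip(gene_names, ranges) if r > 0]

# Example: geneA is zero in every spot and gets dropped.
X = np.array([[0.0, 2.0, 1.0],
              [0.0, 5.0, 1.5]])
print(drop_constant_genes(X, ["geneA", "geneB", "geneC"]))  # ['geneB', 'geneC']
```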

I'll add a note in the tutorial.