jianhuupenn / TESLA

Deciphering tumor ecosystems at super-resolution from spatial transcriptomics with TESLA
MIT License

AssertionError #4

Closed Polligator closed 1 year ago

Polligator commented 1 year ago

running the tutorial using mouse brain data downloaded from the 10X website, I got the following error:

pred_refined, target_clusters, c_m = tesla.annotation(img=img,
                                                      binary=binary,
                                                      sudo_adata=enhanced_exp_adata,
                                                      genes=genes,
                                                      resize_factor=resize_factor,
                                                      num_required=1,
                                                      target_size="small")
Computing image band...
Computing gene band...
Running TESLA...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/miniconda3/envs/scvi/lib/python3.9/site-packages/TESLAforST-1.2.4-py3.9.egg/TESLA/annotation.py", line 74, in annotation
AssertionError

jianhuupenn commented 1 year ago

The error comes from the following assert:

assert np.max(img_band1)==1 and np.min(img_band1)==0 and np.max(gene_band1)==1 and np.min(gene_band1)==0

which checks that the input gene band and the histology image band are each min-max normalized to [0, 1], i.e. not all zeros. To further track down the error, could you let me know:

  1. Do you have the same error when running this step using the tutorial data?
  2. Can you plot the generated super-resolution gene expression to make sure the gene expression enhancement step is done correctly?
  3. If the expression enhancement step ran correctly, we can use the following diagnosis function to see what's going wrong:

def diagnosis(img, sudo_adata, genes, resize_factor, binary, res=50, num_required=1, pooling="min", rseed=100, tseed=100, nseed=100, nChannel=100, threshold=1000, radius=3, minLabels=30, train_refine=True, plot_intermedium=False, target_size="small", min_UMI=5):
    #-------------------------------Image band-------------------------------------------------#
    if target_size=="small":
        target_ratio=1/2
    elif target_size=="large":
        target_ratio=1/3
    else:
        print("target_size not valid, please specify small or large. Using default small.")
        target_ratio=1/2
    print("Computing image band...")
    resize_width=int(img.shape[1]*resize_factor)
    resize_height=int(img.shape[0]*resize_factor)
    img_resized=(img * np.dstack([(binary!=0)]*3)).astype(np.uint8)
    img_resized=cv2.resize(img_resized, (resize_width, resize_height), interpolation=cv2.INTER_AREA)
    gray_resized=cv2.cvtColor(img_resized, cv2.COLOR_BGR2GRAY)
    gray_resized=gray_resized.reshape(list(gray_resized.shape)+[1])
    binary_resized=cv2.resize(binary, (resize_width, resize_height), interpolation=cv2.INTER_AREA)
    img_band1=(gray_resized-np.min(gray_resized))/(np.max(gray_resized)-np.min(gray_resized))
    #-------------------------------Gene band-------------------------------------------------#
    print("Computing gene band...")
    genes=list(set([i for i in genes if i in sudo_adata.var.index]))
    assert num_required<=len(genes)
    tmp=sudo_adata.X[:, sudo_adata.var.index.isin(genes)]
    tmp=(tmp-np.min(tmp, 0))/(np.max(tmp, 0)-np.min(tmp, 0))
    tmp=np.partition(tmp, -num_required, axis=1)[:, -num_required:] #Select top num_required
    if pooling=="mean":
        sudo_adata.obs["meta"]=np.mean(tmp, axis=1)
    elif pooling=="min":
        sudo_adata.obs["meta"]=np.min(tmp, axis=1)
    else:
        print("Error! Pooling logic not understood.")
    gene_img=np.zeros(list(img.shape[0:2])+[1])
    for _, row in sudo_adata.obs.iterrows():
        x=row["x"]
        y=row["y"]
        exp=row["meta"]
        gene_img[int(x-res/2):int(x+res/2), int(y-res/2):int(y+res/2), 0]=exp
    gene_img[binary==0]=0
    gene_band1=gene_img[:, :, 0]
    #Filter on min UMI
    gene_band1[gene_band1<=np.log(min_UMI+1)]=0
    gene_band1=cv2.resize(gene_band1, (resize_width, resize_height), interpolation=cv2.INTER_AREA)
    gene_band1=(gene_band1-np.min(gene_band1))/(np.max(gene_band1)-np.min(gene_band1))
    gene_band1=gene_band1.reshape(list(gene_band1.shape)+[1])
    return img_band1, gene_band1

You can then run

img_band1, gene_band1 = diagnosis(...)  # use the same inputs as the annotation() function

and print img_band1 and gene_band1.
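In case it helps, here is a small numpy-only helper (not part of TESLA; the name is just for illustration) that reports whether each returned band would pass the failing assert:

```python
import numpy as np

def check_band(name, band):
    """Report whether a band is min-max normalized to [0, 1] with no NaNs,
    i.e. whether it would pass the assert inside annotation()."""
    lo = float(np.nanmin(band))
    hi = float(np.nanmax(band))
    n_nan = int(np.isnan(band).sum())
    ok = (lo == 0.0) and (hi == 1.0) and (n_nan == 0)
    print(f"{name}: min={lo}, max={hi}, NaNs={n_nan}, passes assert: {ok}")
    return ok

# Toy arrays standing in for the diagnosis() outputs:
check_band("img_band1", np.array([0.0, 0.5, 1.0]))   # True
check_band("gene_band1", np.array([np.nan, 0.0]))    # False
```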
Polligator commented 1 year ago

thanks for your quick response and the diagnosis code. It seems to me that the problem arises from the scaling step in the function:

tmp=(tmp-np.min(tmp, 0))/(np.max(tmp, 0)-np.min(tmp, 0))

This step generates NaN values for some genes, which in turn causes the final gene band to become all NaN. I think the calculation should theoretically be equivalent to sklearn's MinMaxScaler; however, when I use MinMaxScaler to do the same, it does not generate NaNs:

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(tmp)
scaled_data = scaler.transform(tmp)

I'm not sure what exactly went wrong, but I think a fix is probably needed

Polligator commented 1 year ago

on second thought, it's probably because the denominator in the scaling becomes 0 for some genes, which produces the NaN values.
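A minimal numpy reproduction confirms this: a constant gene column makes the denominator zero, while a guarded variant (mirroring the zero-range handling MinMaxScaler does internally) avoids the NaNs. This is just a sketch, not TESLA's current code:

```python
import numpy as np

# Toy spots x genes matrix; the first gene is constant (all zeros).
tmp = np.array([[0.0, 1.0],
                [0.0, 3.0],
                [0.0, 5.0]])

# Plain min-max scaling: the constant column gives 0/0 -> NaN.
with np.errstate(invalid="ignore"):
    scaled = (tmp - np.min(tmp, 0)) / (np.max(tmp, 0) - np.min(tmp, 0))
print(scaled[:, 0])  # [nan nan nan]

# Guarded variant: replace a zero range with 1 so constant columns map to 0.
rng = np.max(tmp, 0) - np.min(tmp, 0)
rng[rng == 0] = 1
safe = (tmp - np.min(tmp, 0)) / rng
print(safe[:, 0])  # [0. 0. 0.]
```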

jianhuupenn commented 1 year ago

Thank you for the suggestion. If a gene has the same max and min expression values, it is likely that its expression is 0 across all spots. In this case, I would recommend dropping it from the gene list as a quick fix, since it does not provide any useful information.
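For example, a zero-variance filter along these lines (the helper name and the dense-matrix assumption are mine, just for illustration) could be applied to the gene list before calling annotation():

```python
import numpy as np

def drop_constant_genes(X, gene_names):
    """Keep only genes whose expression actually varies across spots.

    X: dense spots x genes expression matrix (columns match gene_names).
    A constant column makes (max - min) == 0, which turns the min-max
    scaling inside annotation() into 0/0 = NaN.
    """
    ranges = X.max(axis=0) - X.min(axis=0)
    return [g for g, r in zip(gene_names, ranges) if r > 0]

# Example: geneA is zero in every spot and gets dropped.
X = np.array([[0.0, 2.0, 1.0],
              [0.0, 5.0, 1.5]])
print(drop_constant_genes(X, ["geneA", "geneB", "geneC"]))  # ['geneB', 'geneC']
```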

I'll add a note in the tutorial.