Open PSSUN opened 2 days ago
I removed the part of the tutorial code that calculates the ARI, and now I can get a result. If you have a similar question, you can refer to the following code:
import h5py
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

### refine clustering labels by the majority of neighbors
def refine(sample_id, pred, dis, shape="square"):
    refined_pred=[]
    pred=pd.DataFrame({"pred": pred}, index=sample_id)
    dis_df=pd.DataFrame(dis, index=sample_id, columns=sample_id)
    if shape=="hexagon":
        num_nbs=6
    elif shape=="square":
        num_nbs=4
    else:
        # stop here: num_nbs would otherwise be undefined below
        raise ValueError("Shape not recognized, shape='hexagon' for Visium data, 'square' for ST data.")
    for i in range(len(sample_id)):
        index=sample_id[i]
        dis_tmp=dis_df.loc[index, :].sort_values()
        # the spot itself (distance 0) plus its num_nbs nearest neighbors
        nbs=dis_tmp.iloc[0:(num_nbs+1)]
        nbs_pred=pred.loc[nbs.index, "pred"]
        self_pred=pred.loc[index, "pred"]
        v_c=nbs_pred.value_counts()
        # relabel only if the spot's own label is in the minority and another label has a majority
        if (v_c.loc[self_pred]<num_nbs/2) and (np.max(v_c)>num_nbs/2):
            refined_pred.append(v_c.idxmax())
        else:
            refined_pred.append(self_pred)
        if (i+1) % 1000 == 0:
            print("Processed", i+1, "lines")
    return np.array(refined_pred)

# read the spot coordinates; assumes they are stored under the 'pos' key, as in the tutorial data
data_mat = h5py.File('output_file.h5', 'r')
pos = np.array(data_mat['pos']).astype('float64')
data_mat.close()

# cluster the latent embedding written by run_spaVAE.py
final_latent = np.loadtxt("./final_latent.txt", delimiter=",")
pred = KMeans(n_clusters=7, n_init=100).fit_predict(final_latent)
np.savetxt("clustering_labels.txt", pred, delimiter=",", fmt="%i")

# smooth the labels using each spot's spatial neighbors
dis = pairwise_distances(pos, metric="euclidean", n_jobs=-1).astype(np.double)
pred_refined = refine(np.arange(pred.shape[0]), pred, dis, shape="hexagon")
np.savetxt("refined_clustering_labels.txt", pred_refined, delimiter=",", fmt="%i")
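One way to look at the labels without any ground truth is to plot them on the spot coordinates. The following is only a minimal sketch: `plot_spatial_labels` is a hypothetical helper, not part of spaVAE, and the data here are synthetic stand-ins for `pos` and the refined labels. Flipping the y axis assumes image-style coordinates and may not be right for every dataset.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: write to a file, no window
import matplotlib.pyplot as plt

def plot_spatial_labels(pos, labels, out_file="spatial_clusters.png"):
    """Scatter the spots at their coordinates, colored by cluster label."""
    fig, ax = plt.subplots(figsize=(6, 6))
    sc = ax.scatter(pos[:, 0], pos[:, 1], c=labels, cmap="tab10", s=10)
    ax.set_aspect("equal")
    ax.invert_yaxis()  # assumption: y grows downward, as in image coordinates
    ax.legend(*sc.legend_elements(), title="cluster",
              loc="center left", bbox_to_anchor=(1.0, 0.5))
    fig.savefig(out_file, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return out_file

# Synthetic stand-ins. With real data, use pos from the .h5 file and
# np.loadtxt("refined_clustering_labels.txt", dtype=int) instead.
rng = np.random.default_rng(0)
demo_pos = rng.uniform(0, 100, size=(500, 2))
demo_labels = (demo_pos[:, 0] // 25).astype(int)  # four vertical stripes
saved = plot_spatial_labels(demo_pos, demo_labels, "demo_spatial_clusters.png")
```

The saved image can then be compared by eye with the tissue histology, which is often the only check available for unannotated samples.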
I ran spaVAE from the command line:
python run_spaVAE.py --data_file output_file.h5 --device cpu --inducing_point_steps 6
and got two files: denoised_counts.txt and final_latent.txt
The documentation doesn't specify how to handle the command-line output. How can I view my results? I checked the code in the ipynb file provided in the tutorial, but it doesn't match the command-line output. For example, part of the code in the tutorial for DLPFC151673 is as follows:
y = np.array(data_mat['Y']).astype('U26') # ground-truth labels
The line marked # ground-truth labels requires labels to exist in the input file, but new, unannotated data has no such labels. How do I see my results in this case?
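Without ground-truth labels, the ARI cannot be computed, but an internal measure such as the silhouette score only needs the embedding and the predicted labels. The sketch below is an assumption about what might be useful here, not something taken from the spaVAE tutorial: `score_k` is a hypothetical helper, and the embedding is synthetic rather than loaded from `final_latent.txt`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def score_k(latent, k_values, seed=0):
    """Return {k: silhouette score} for KMeans run on the latent embedding."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(latent)
        scores[k] = silhouette_score(latent, labels)
    return scores

# Synthetic stand-in for np.loadtxt("./final_latent.txt", delimiter=","):
# three well-separated blobs in a 2-D latent space.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10]])
demo_latent = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

scores = score_k(demo_latent, k_values=range(2, 6))
best_k = max(scores, key=scores.get)  # k with the highest silhouette score
```

A higher silhouette score means tighter, better-separated clusters in latent space. It says nothing about whether the clusters match anatomical layers, so it is best treated as a rough guide for choosing `n_clusters`.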
How can I deal with denoised_counts.txt and final_latent.txt?
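For denoised_counts.txt, one possible use is to summarize expression per cluster once labels exist. This is a sketch under stated assumptions: it assumes the matrix has spots as rows and genes as columns (check this against the shape of your own file), `top_genes_per_cluster` is a hypothetical helper, and the gene names are made up because the command-line output contains none.

```python
import numpy as np
import pandas as pd

def top_genes_per_cluster(counts, labels, gene_names, n_top=3):
    """For each cluster, list the genes whose mean expression most exceeds the overall mean."""
    df = pd.DataFrame(counts, columns=gene_names)
    cluster_means = df.groupby(np.asarray(labels)).mean()
    diff = cluster_means - df.mean(axis=0)  # aligned on gene columns
    return {int(c): diff.loc[c].nlargest(n_top).index.tolist() for c in diff.index}

# Tiny synthetic stand-in for np.loadtxt("denoised_counts.txt", delimiter=",")
# (assumed layout: 6 spots x 4 genes) and for the clustering labels.
demo_counts = np.array([
    [9, 1, 1, 1],
    [8, 1, 1, 1],
    [9, 1, 1, 1],
    [1, 1, 1, 7],
    [1, 1, 1, 8],
    [1, 1, 1, 9],
], dtype=float)
demo_labels = [0, 0, 0, 1, 1, 1]
demo_genes = ["g0", "g1", "g2", "g3"]  # hypothetical gene names

markers = top_genes_per_cluster(demo_counts, demo_labels, demo_genes, n_top=1)
```

A difference of means is only a crude ranking; a proper differential expression test would be needed before treating these genes as markers.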