Open aretaon opened 2 months ago
Apparently the issue is a mismatch in column naming between the line where the dataframe is written and the lines a few below it where the same column is used for sorting and filtering, e.g.
- noNoiseIndex = df.index[df["Cluster Labels"] > 0]
+ noNoiseIndex = df.index[df["Cluster Labels({})".format(analysisName)] > 0]
Fixing this in a few places (see the complete diff below) also fixes the error. If this is the intended behaviour, please feel free to patch with the diff.
diff --git a/src/main.py b/src/main.py
index e7be847..33cbf41 100644
--- a/src/main.py
+++ b/src/main.py
@@ -1670,7 +1670,7 @@ class ComplexFinder(object):
generateSquareMatrix = True,
)
df = pd.DataFrame().from_dict({"Entry":intLabels,"Cluster Labels({})".format(analysisName):clusterLabels,"reachability":reachability,"core_distances":core_distances})
- df = df.sort_values(by="Cluster Labels")
+ df = df.sort_values(by="Cluster Labels({})".format(analysisName))
df = df.set_index("Entry")
if pooledDistances is not None:
@@ -1679,7 +1679,7 @@ class ComplexFinder(object):
squaredDf = pd.DataFrame(matrix,columns=intLabels,index=intLabels).loc[df.index,df.index]
squaredDf.to_csv(os.path.join(pathToFolder,"SquaredSorted_{}.txt".format(self.currentAnalysisName)),sep="\t")
- noNoiseIndex = df.index[df["Cluster Labels"] > 0]
+ noNoiseIndex = df.index[df["Cluster Labels({})".format(analysisName)] > 0]
squaredDf.loc[noNoiseIndex,noNoiseIndex].to_csv(os.path.join(pathToFolder,"NoNoiseSquaredSorted_{}.txt".format(self.currentAnalysisName)),sep="\t")
splitLabels = True
@@ -1691,7 +1691,7 @@ class ComplexFinder(object):
dfEmbed["clusterLabels({})".format(analysisName)] = clusterLabels
dfEmbed["labels({})".format(analysisName)] = intLabels
if splitLabels:
- dfEmbed["sLabels"] = dfEmbed["labels"].str.split("_",expand=True).values[:,0]
+ dfEmbed["sLabels"] = dfEmbed["labels({})".format(analysisName)].str.split("_",expand=True).values[:,0]
dfEmbed = dfEmbed.set_index("sLabels")
else:
dfEmbed = dfEmbed.set_index("labels({})".format(analysisName))
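The mismatch can be reproduced outside ComplexFinder with a few lines of pandas. This is only a minimal sketch on made-up data: the value of `analysisName`, the toy entries, and the helper variable `clusterColumn` are assumptions for illustration and are not taken from `src/main.py`. It shows that looking up the bare name "Cluster Labels" raises a `KeyError` once the column has been written with the analysis name as suffix, and that reusing the formatted name avoids it.

```python
import pandas as pd

analysisName = "demo"  # hypothetical analysis name, for illustration only

# Build the suffixed column name once and reuse it everywhere.
clusterColumn = "Cluster Labels({})".format(analysisName)

# Toy data standing in for the clustering output (made up).
df = pd.DataFrame({
    "Entry": ["A_1", "B_1", "C_1"],
    clusterColumn: [1, -1, 0],
})

# The bare column name does not exist, so sorting by it fails.
try:
    df.sort_values(by="Cluster Labels")
    bareNameFailed = False
except KeyError:
    bareNameFailed = True

# Using the suffixed name works for sorting and for the noise filter.
df = df.sort_values(by=clusterColumn).set_index("Entry")
noNoiseIndex = df.index[df[clusterColumn] > 0]

print(bareNameFailed)
print(list(noNoiseIndex))
```

Keeping the formatted name in a single variable, rather than repeating the `format` call on every line as the diff does, is one possible way to keep the write and read sites from drifting apart again.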
Thank you very much and please excuse the late reply. I will double-check tomorrow afternoon and then accept/edit. Thanks again! Cheers Hendrik
Error description
ComplexFinder does not run in no-database mode.
How to reproduce
Starting from the example files provided, the following code works:
returns
However, running the same code with
noDatabaseForPredictions=False
leads to no errors.
System properties
OS: Fedora 40
Python: 3.8
Dependencies:
jupyter = ">=1.1.1,<2"
asteval = "<=0.9.19"
certifi = "<=2022.12.7"
cycler = "<=0.10.0"
cython = "<=0.29.21"
future = "<=0.18.2"
hdbscan = "<=0.8.29"
joblib = "<=1.2.0"
kiwisolver = "<=1.3.1"
llvmlite = "<=0.34.0"
lmfit = "<=1.0.1"
matplotlib = "<=3.3.2"
numba = "<=0.51.2"
numpy = "<=1.22.0"
pandas = "<=1.1.4"
pillow = "<=9.3.0"
pyparsing = "<=2.4.7"
python-dateutil = "<=2.8.1"
pytz = "<=2020.4"
scikit-learn = "<=0.23.2"
scipy = "<=1.5.4"
seaborn = "<=0.11.0"
six = "<=1.15.0"
threadpoolctl = "<=2.1.0"
umap-learn = ">=0.5.0"
uncertainties = "<=3.1.4"
imbalanced-learn = ">=0.7.0,<0.8"
Full traceback: