Normalization & Imputation error

Hi there,

I am running the SCASL demo to check that my installation works. However, the run fails with a KeyError at the normalization and imputation step.

(scasl) lab@server2:/media/SCASL_splicing/SCASL-main$ python main.py -y configs/srr_star_demo.yaml
=============Preprocessing=============
Loading site names: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:01<00:00, 35.25it/s]
Reading and processing junction files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:01<00:00, 30.40it/s]
=============Filtering=============
reading file... done.
executing repeat and initial threshold filter...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17851/17851 [00:14<00:00, 1261.41it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17818/17818 [00:13<00:00, 1273.85it/s]
done
executing sites quality filter by threshold
the site histogram is saved at process_result/20241108114532/img/site_hist.png
the descriptions of the non-NaN data of sites are shown below
count    40297.000000
mean         7.669926
std          6.271053
min          3.000000
0%           3.000000
10%          4.000000
20%          4.000000
30%          4.000000
40%          4.000000
50%          5.000000
60%          6.000000
70%          8.000000
80%         11.000000
90%         17.000000
max         40.000000
dtype: float64
the site histogram is saved at process_result/20241108114532/img/site_hist.png
the descriptions of the non-NaN data of sites are shown below
count    40138.000000
mean         7.703000
std          6.293347
min          3.000000
0%           3.000000
10%          4.000000
20%          4.000000
30%          4.000000
40%          4.000000
50%          5.000000
60%          6.000000
70%          8.000000
80%         11.000000
90%         17.000000
max         40.000000
dtype: float64
done.
remove the duplicated site starts and ends... done.
executing sample quality filter...
the sample histogram is saved at process_result/20241108114532/img/sample_hist.png
the descriptions of the non-NaN data of sites are shown below
count       40.000000
mean     14823.125000
std      17859.287926
min       1407.000000
0%        1407.000000
10%       3657.600000
20%       4658.000000
30%       5990.200000
40%       8024.400000
50%       9009.000000
60%      12487.600000
70%      13943.100000
80%      17898.800000
90%      22226.700000
max      73079.000000
dtype: float64
done.
saving... done.
=============Normalization & Imputation=============
reading data from process_result/20241108114532/filtered_matrix...
Traceback (most recent call last):
  File "/media/SCASL_splicing/SCASL-main/main.py", line 10, in <module>
    scasl.fit()
  File "/media/SCASL_splicing/SCASL-main/scasl/splice.py", line 95, in fit
    run_cluster(self.cfg)
  File "/media/SCASL_splicing/SCASL-main/scasl/splice.py", line 66, in run_cluster
    df_final, mat = normalize(filter_path, cfg.impute.num_iteration, cfg.impute.knn)
  File "/media/SCASL_splicing/SCASL-main/scasl/normalize.py", line 92, in normalize
    dfs = norm_only(df_path, 'start')
  File "/media/SCASL_splicing/SCASL-main/scasl/normalize.py", line 40, in norm_only
    df_prob = to_prob(df, groupby=groupby)
  File "/media/SCASL_splicing/SCASL-main/scasl/normalize.py", line 22, in to_prob
    sums = sums.drop(columns=['start', 'end'])
  File "/home/lab/miniconda3/envs/scasl/lib/python3.9/site-packages/pandas/core/frame.py", line 5581, in drop
    return super().drop(
  File "/home/lab/miniconda3/envs/scasl/lib/python3.9/site-packages/pandas/core/generic.py", line 4788, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/lab/miniconda3/envs/scasl/lib/python3.9/site-packages/pandas/core/generic.py", line 4830, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/lab/miniconda3/envs/scasl/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 7070, in drop
    raise KeyError(f"{labels[mask].tolist()} not found in axis")
KeyError: "['end'] not found in axis"
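As far as I can tell from the last frames, the failure comes from pandas itself: `DataFrame.drop(columns=[...])` raises a `KeyError` when one of the listed labels is not a column. Below is a minimal sketch that triggers the same kind of message outside SCASL, assuming the frame reaching `to_prob` has a `start` column but no `end` column. The toy data and the `cell_1`/`cell_2` column names are made up for illustration, and this is not SCASL's code; I have not confirmed why `end` is missing in the real run.

```python
import pandas as pd

# Toy stand-in for the frame that reaches the drop call.
# 'cell_1' and 'cell_2' are hypothetical sample columns; there is no 'end' column.
sums = pd.DataFrame(
    {"start": ["chr1_100", "chr1_200"], "cell_1": [3, 5], "cell_2": [4, 0]}
)


def try_drop(frame):
    """Return the KeyError message, or None if the drop succeeds."""
    try:
        frame.drop(columns=["start", "end"])
    except KeyError as exc:
        return exc.args[0]
    return None


message = try_drop(sums)
print(message)

# List which of the expected columns are absent before dropping.
missing = [c for c in ("start", "end") if c not in sums.columns]
print(missing)

# errors="ignore" skips labels that are not present. This only hides the
# symptom; it does not explain why 'end' is absent in the first place.
remaining = sums.drop(columns=["start", "end"], errors="ignore")
print(list(remaining.columns))
```

Printing `sums.columns` just before line 22 of `normalize.py` would show whether `end` is already gone at that point in the actual pipeline.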

Any help would be useful.

Thanks.
