Hi there,

I am running the SCASL demo to check that my installation works, but I am hitting an error at the normalization and imputation step. The full console output is below:
(scasl) lab@server2:/media/SCASL_splicing/SCASL-main$ python main.py -y configs/srr_star_demo.yaml
=============Preprocessing=============
Loading site names: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:01<00:00, 35.25it/s]
Reading and processing junction files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:01<00:00, 30.40it/s]
=============Filtering=============
reading file...
done.
executing repeat and initial threshold filter...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17851/17851 [00:14<00:00, 1261.41it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17818/17818 [00:13<00:00, 1273.85it/s]
done
executing sites quality filter by threshold
the site histogram is saved at process_result/20241108114532/img/site_hist.png
the descriptions of the non-NaN data of sites are shown below
count 40297.000000
mean 7.669926
std 6.271053
min 3.000000
0% 3.000000
10% 4.000000
20% 4.000000
30% 4.000000
40% 4.000000
50% 5.000000
60% 6.000000
70% 8.000000
80% 11.000000
90% 17.000000
max 40.000000
dtype: float64
the site histogram is saved at process_result/20241108114532/img/site_hist.png
the descriptions of the non-NaN data of sites are shown below
count 40138.000000
mean 7.703000
std 6.293347
min 3.000000
0% 3.000000
10% 4.000000
20% 4.000000
30% 4.000000
40% 4.000000
50% 5.000000
60% 6.000000
70% 8.000000
80% 11.000000
90% 17.000000
max 40.000000
dtype: float64
done.
remove the duplicated site starts and ends...
done.
executing sample quality filter...
the sample histogram is saved at process_result/20241108114532/img/sample_hist.png
the descriptions of the non-NaN data of sites are shown below
count 40.000000
mean 14823.125000
std 17859.287926
min 1407.000000
0% 1407.000000
10% 3657.600000
20% 4658.000000
30% 5990.200000
40% 8024.400000
50% 9009.000000
60% 12487.600000
70% 13943.100000
80% 17898.800000
90% 22226.700000
max 73079.000000
dtype: float64
done.
saving...
done.
=============Normalization & Imputation=============
reading data from process_result/20241108114532/filtered_matrix...
Traceback (most recent call last):
  File "/media/SCASL_splicing/SCASL-main/main.py", line 10, in <module>
    scasl.fit()
  File "/media/SCASL_splicing/SCASL-main/scasl/splice.py", line 95, in fit
    run_cluster(self.cfg)
  File "/media/SCASL_splicing/SCASL-main/scasl/splice.py", line 66, in run_cluster
    df_final, mat = normalize(filter_path, cfg.impute.num_iteration, cfg.impute.knn)
  File "/media/SCASL_splicing/SCASL-main/scasl/normalize.py", line 92, in normalize
    dfs = norm_only(df_path, 'start')
  File "/media/SCASL_splicing/SCASL-main/scasl/normalize.py", line 40, in norm_only
    df_prob = to_prob(df, groupby=groupby)
  File "/media/SCASL_splicing/SCASL-main/scasl/normalize.py", line 22, in to_prob
    sums = sums.drop(columns=['start', 'end'])
  File "/home/lab/miniconda3/envs/scasl/lib/python3.9/site-packages/pandas/core/frame.py", line 5581, in drop
    return super().drop(
  File "/home/lab/miniconda3/envs/scasl/lib/python3.9/site-packages/pandas/core/generic.py", line 4788, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/lab/miniconda3/envs/scasl/lib/python3.9/site-packages/pandas/core/generic.py", line 4830, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/lab/miniconda3/envs/scasl/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 7070, in drop
    raise KeyError(f"{labels[mask].tolist()} not found in axis")
KeyError: "['end'] not found in axis"
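In case it helps with triage: my (possibly wrong) reading of the traceback is that the 'end' column no longer exists in the frame by the time normalize.py line 22 calls sums.drop(columns=['start', 'end']). The snippet below is only a minimal sketch with made-up column names and values (SRR1/SRR2, toy coordinates), not the actual SCASL code, but it reproduces exactly the same KeyError when a non-numeric 'end' column gets excluded by a grouped sum:

import pandas as pd

# Hypothetical junction matrix: 'start'/'end' site coordinates plus per-cell counts.
# 'end' is given object dtype on purpose to force the non-numeric case.
df = pd.DataFrame({
    'start': [100, 100, 200],
    'end':   ['150', '160', '250'],
    'SRR1':  [1, 2, 3],
    'SRR2':  [4, 5, 6],
})

# Grouping by 'start' and summing with numeric_only=True: 'start' is kept as a
# column (as_index=False), but the object-dtype 'end' column disappears.
sums = df.groupby('start', as_index=False).sum(numeric_only=True)

# 'end' is no longer present, so this raises
# KeyError: "['end'] not found in axis" -- the same message as in my traceback.
sums = sums.drop(columns=['start', 'end'])

The real to_prob presumably differs; this is just my attempt to get the same exception from a toy frame, and it makes me suspect a pandas behaviour difference rather than a problem with the demo data.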
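If the only problem really is the missing 'end' label, pandas' DataFrame.drop accepts errors='ignore', which skips labels that are not present instead of raising. I have not patched normalize.py locally because I am not sure whether silently skipping that drop is safe for the downstream probability normalization, but for reference this is what I mean (again a toy frame, not SCASL's real data):

import pandas as pd

# Toy frame where 'end' has already disappeared (e.g. after a grouped sum).
sums = pd.DataFrame({'start': [100, 200], 'SRR1': [3, 3], 'SRR2': [9, 6]})

# errors='ignore' tells drop to skip labels that are not in the axis, so only
# the existing 'start' column is removed and no KeyError is raised.
sums = sums.drop(columns=['start', 'end'], errors='ignore')
print(sums)  # remaining columns: SRR1, SRR2

I can also post the exact pandas version installed in the scasl environment if that helps narrow it down.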
Any help would be much appreciated.
Thanks.