Closed FuryMartin closed 1 month ago
The current implementation in Ianvs is: https://github.com/kubeedge/ianvs/blob/f2352ce018f04f398b1be0f37d0fa3cd11476626/core/storymanager/rank/rank.py#L142-L173
Below is a revised version of the implementation. All interfaces used are compatible with pandas==1.1.5.
def _get_all(self, test_cases, test_results) -> pd.DataFrame:
    all_df = pd.DataFrame(columns=self.all_df_header)

    for i, test_case in enumerate(test_cases):
        algorithm = test_case.algorithm
        test_result = test_results[test_case.id][0]

        # add algorithm, paradigm, time, url of algorithm
        row_data = {
            "algorithm": algorithm.name,
            "paradigm": algorithm.paradigm_type,
            "time": test_results[test_case.id][1],
            "url": test_case.output_dir
        }

        # add metric of algorithm
        row_data.update(test_result)

        # add module of algorithm
        row_data.update({
            module_type: module.name
            for module_type, module in algorithm.modules.items()
        })

        # add hyperparameters of algorithm modules
        row_data.update(self._get_algorithm_hyperparameters(algorithm))

        # fill data
        all_df.loc[i] = row_data

    new_df = self._concat_existing_data(all_df)

    return self._sort_all_df(new_df, self._get_all_metric_names(test_results))

def _concat_existing_data(self, new_df):
    if utils.is_local_file(self.all_rank_file):
        old_df = pd.read_csv(self.all_rank_file, index_col=0)
        new_df = pd.concat([old_df, new_df])
    return new_df
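Compared to the current implementation, the revised one mainly:
- Collects each row's values in a `row_data` dict and assigns them to the DataFrame in one step.
- Uses `pd.concat` to merge `old_df` and `new_df`.
- Makes the logic of `_get_all()` clearer.

Additionally, I removed `sep=" "` from all CSV read and write functions.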
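The changes listed above can be sketched in a standalone form. This is only a minimal illustration, not the Ianvs code: the `HEADER` columns and the helper names `rows_to_frame` and `merge_with_existing` are hypothetical, and the CSV is kept in memory instead of in `all_rank.csv`.

```python
import io

import pandas as pd

# Hypothetical header; in Ianvs it would come from self.all_df_header.
HEADER = ["algorithm", "f1_score", "time"]


def rows_to_frame(rows):
    """Build a DataFrame by assigning one dict per row."""
    df = pd.DataFrame(columns=HEADER)
    for i, row_data in enumerate(rows):
        # Dict values are matched to the columns by key.
        df.loc[i] = row_data
    return df


def merge_with_existing(old_csv_text, new_df):
    """Merge previously saved results with new ones using pd.concat."""
    old_df = pd.read_csv(io.StringIO(old_csv_text), index_col=0)
    merged = pd.concat([old_df, new_df], ignore_index=True)
    merged["f1_score"] = merged["f1_score"].astype(float)
    # Rank by score, best first, with ranks starting at 1.
    merged = merged.sort_values("f1_score", ascending=False).reset_index(drop=True)
    merged.index = merged.index + 1
    merged.index.name = "rank"
    return merged


first_run = rows_to_frame([
    {"algorithm": "fpn", "f1_score": 0.8694, "time": "2024-08-16 22:54:24"},
    {"algorithm": "fpn", "f1_score": 0.8568, "time": "2024-08-16 22:57:38"},
])
saved = first_run.to_csv()  # default comma separator, no sep=" "

second_run = rows_to_frame([
    {"algorithm": "fpn", "f1_score": 0.8707, "time": "2024-08-16 23:59:15"},
])
ranked = merge_with_existing(saved, second_run)
print(ranked.to_csv())
```

Note that the `time` values contain a space, which is one reason a comma is the more natural separator here.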
After fixing PCB-AoI's dependencies, I ran this example with Python==3.6.13 to check that our revisions do not introduce new compatibility issues.
The experimental results are as follows:
Run once:
f1_score_avg: 0.8568
+------+-------------------------+----------+--------------------+-----------+--------------------+-------------------------+---------------------+------------------------------------------------------------------------------------------+
| rank | algorithm | f1_score | paradigm | basemodel | basemodel-momentum | basemodel-learning_rate | time | url |
+------+-------------------------+----------+--------------------+-----------+--------------------+-------------------------+---------------------+------------------------------------------------------------------------------------------+
| 1 | fpn_singletask_learning | 0.8694 | singletasklearning | FPN | 0.95 | 0.1 | 2024-08-16 22:54:24 | ./workspace/benchmarkingjob/fpn_singletask_learning/fdd65c94-5bde-11ef-bf9b-755996a48c84 |
| 2 | fpn_singletask_learning | 0.8568 | singletasklearning | FPN | 0.5 | 0.1 | 2024-08-16 22:57:38 | ./workspace/benchmarkingjob/fpn_singletask_learning/fdd65c95-5bde-11ef-bf9b-755996a48c84 |
+------+-------------------------+----------+--------------------+-----------+--------------------+-------------------------+---------------------+------------------------------------------------------------------------------------------+
Run twice:
f1_score_avg: 0.8635
+------+-------------------------+----------+--------------------+-----------+--------------------+-------------------------+---------------------+------------------------------------------------------------------------------------------+
| rank | algorithm | f1_score | paradigm | basemodel | basemodel-momentum | basemodel-learning_rate | time | url |
+------+-------------------------+----------+--------------------+-----------+--------------------+-------------------------+---------------------+------------------------------------------------------------------------------------------+
| 1 | fpn_singletask_learning | 0.8707 | singletasklearning | FPN | 0.95 | 0.1 | 2024-08-16 23:59:15 | ./workspace/benchmarkingjob/fpn_singletask_learning/08e9a128-5be8-11ef-bf9b-755996a48c84 |
| 2 | fpn_singletask_learning | 0.8694 | singletasklearning | FPN | 0.95 | 0.1 | 2024-08-16 22:54:24 | ./workspace/benchmarkingjob/fpn_singletask_learning/fdd65c94-5bde-11ef-bf9b-755996a48c84 |
| 3 | fpn_singletask_learning | 0.8635 | singletasklearning | FPN | 0.5 | 0.1 | 2024-08-17 00:02:22 | ./workspace/benchmarkingjob/fpn_singletask_learning/08e9a129-5be8-11ef-bf9b-755996a48c84 |
| 4 | fpn_singletask_learning | 0.8568 | singletasklearning | FPN | 0.5 | 0.1 | 2024-08-16 22:57:38 | ./workspace/benchmarkingjob/fpn_singletask_learning/fdd65c95-5bde-11ef-bf9b-755996a48c84 |
+------+-------------------------+----------+--------------------+-----------+--------------------+-------------------------+---------------------+------------------------------------------------------------------------------------------+
The all_rank.csv I get is shown below:
rank,algorithm,f1_score,paradigm,basemodel,basemodel-momentum,basemodel-learning_rate,time,url
1,fpn_singletask_learning,0.8707,singletasklearning,FPN,0.95,0.1,2024-08-16 23:59:15,./workspace/benchmarkingjob/fpn_singletask_learning/08e9a128-5be8-11ef-bf9b-755996a48c84
2,fpn_singletask_learning,0.8694,singletasklearning,FPN,0.95,0.1,2024-08-16 22:54:24,./workspace/benchmarkingjob/fpn_singletask_learning/fdd65c94-5bde-11ef-bf9b-755996a48c84
3,fpn_singletask_learning,0.8635,singletasklearning,FPN,0.5,0.1,2024-08-17 00:02:22,./workspace/benchmarkingjob/fpn_singletask_learning/08e9a129-5be8-11ef-bf9b-755996a48c84
4,fpn_singletask_learning,0.8568,singletasklearning,FPN,0.5,0.1,2024-08-16 22:57:38,./workspace/benchmarkingjob/fpn_singletask_learning/fdd65c95-5bde-11ef-bf9b-755996a48c84
These results indicate that the revised version is functioning properly.
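As a rough way to double-check a result file like the one above, the CSV can be loaded with the default comma separator and tested for descending scores and contiguous ranks. The snippet below is a sketch that embeds only a subset of the columns from the all_rank.csv shown earlier (the `url`, `paradigm`, and other columns are left out for brevity).

```python
import io

import pandas as pd

# Subset of the all_rank.csv columns shown above (other columns omitted).
CSV_TEXT = """rank,algorithm,f1_score,basemodel-momentum,time
1,fpn_singletask_learning,0.8707,0.95,2024-08-16 23:59:15
2,fpn_singletask_learning,0.8694,0.95,2024-08-16 22:54:24
3,fpn_singletask_learning,0.8635,0.5,2024-08-17 00:02:22
4,fpn_singletask_learning,0.8568,0.5,2024-08-16 22:57:38
"""

# index_col=0 mirrors how the revised code reads the rank file.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

scores_descending = df["f1_score"].is_monotonic_decreasing
ranks_contiguous = list(df.index) == list(range(1, len(df) + 1))
print(scores_descending, ranks_contiguous)
```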
The overall change looks good to me. Please talk about it at the next community meeting and see if anyone else has any questions.
OK, thanks for the review.
What would you like to be added/modified:
Some outdated `pandas` interfaces are used in `rank.py`. We can stop relying on them by:
- Rewriting `_get_all()` using a more elegant interface.
- Using `pd.concat()` to merge the old DataFrame and the new DataFrame.

The modified method should remain compatible with the version of pandas corresponding to Python 3.6.
Why is this needed:
Ianvs was originally designed for `pandas==1.1.5`, but the latest version is now `pandas==2.2.2`. Because of the major version update, some pandas interfaces are deprecated or removed in the new version. Continuing to use these old interfaces will cause errors on `Python>=3.8`.
- `pd.np` has been removed in `pandas>=2.0.0`: https://github.com/kubeedge/ianvs/blob/f2352ce018f04f398b1be0f37d0fa3cd11476626/core/storymanager/rank/rank.py#L208
- `append` has been removed in `pandas>=2.0.0`: https://github.com/kubeedge/ianvs/blob/f2352ce018f04f398b1be0f37d0fa3cd11476626/core/storymanager/rank/rank.py#L171
- Initializing `all_df` with `np.NAN` causes `str` data to go missing: https://github.com/kubeedge/ianvs/blob/f2352ce018f04f398b1be0f37d0fa3cd11476626/core/storymanager/rank/rank.py#L145
- Setting a value by `df.loc[row_index][column]` triggers `SettingWithCopyWarning`. This chained assignment will no longer be supported in `pandas>=3.0`. https://github.com/kubeedge/ianvs/blob/f2352ce018f04f398b1be0f37d0fa3cd11476626/core/storymanager/rank/rank.py#L148 Lines 151, 154, 158, 165, and 167 have the same issue.
- Assigning values one by one reduces code readability and can be simplified. https://github.com/kubeedge/ianvs/blob/f2352ce018f04f398b1be0f37d0fa3cd11476626/core/storymanager/rank/rank.py#L145-L167
- Using whitespace as the separator in a CSV (Comma-Separated Values) file is unusual. https://github.com/kubeedge/ianvs/blob/f2352ce018f04f398b1be0f37d0fa3cd11476626/core/storymanager/rank/rank.py#L179
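For reference, the first few items in this list map onto replacements that should work on both old and new pandas. The snippet is a generic sketch with made-up data, not code from `rank.py`.

```python
import numpy as np
import pandas as pd

# Import numpy directly; the pd.np alias no longer exists in pandas 2.x.
old_df = pd.DataFrame({"algorithm": ["a", "b"], "f1_score": [0.5, np.nan]})

# One label-based .loc[row, column] call instead of chained indexing
# such as old_df.loc[1]["f1_score"] = 0.7, which may write to a copy.
old_df.loc[1, "f1_score"] = 0.7

# pd.concat instead of old_df.append(new_df).
new_df = pd.DataFrame({"algorithm": ["c"], "f1_score": [0.6]})
merged = pd.concat([old_df, new_df], ignore_index=True)
print(merged)
```

`ignore_index=True` is used here only to get a clean 0..n-1 index in the merged frame; whether that is wanted depends on how the index is used afterwards.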