Problem when reproducing the classification performance

Amy-Liao commented 2 years ago

Hi, sorry to bother you. I used the validation set as the test dataset by running cd ../prepare && python3 fake_testing_set.py, and then tried to reproduce the paper's evaluation results by running !cd ../judge && python3 classification_perf.py inception_v4. However, the results do not match the paper.

The performance for 店, 路, and 车 is 0.0% for all three. In the "cls_precision_by_model_size" file, the accuracies are all about 0.2.

I have also read this issue https://github.com/yuantailing/ctw-baseline/issues/29#issuecomment-445824509 , but I did run your classification/decide_cates.py without modification to generate cates.json. What might be the reason for the performance I got? Thanks for your help :)

yuantailing commented 2 years ago

To use the pre-trained model, you should use the real train.jsonl and eval.jsonl, not the fake data generated by fake_testing_set.py.

You can run cp ../data/annotations/downloads/*.jsonl ../data/annotations/ and then python3 decide_cates.py. As a result, the top-10 categories in cates.json should be:

  {
    "cate_id": 0,
    "text": "中",
    "trainval": 13924
  },
  {
    "cate_id": 1,
    "text": "国",
    "trainval": 9410
  },
  {
    "cate_id": 2,
    "text": "大",
    "trainval": 8843
  },
  {
    "cate_id": 3,
    "text": "电",
    "trainval": 6908
  },
  {
    "cate_id": 4,
    "text": "店",
    "trainval": 6622
  },
  {
    "cate_id": 5,
    "text": "路",
    "trainval": 6555
  },
  {
    "cate_id": 6,
    "text": "车",
    "trainval": 6541
  },
  {
    "cate_id": 7,
    "text": "家",
    "trainval": 6200
  },
  {
    "cate_id": 8,
    "text": "公",
    "trainval": 5946
  },
  {
    "cate_id": 9,
    "text": "行",
    "trainval": 5672
  },
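If it helps, here is a rough sketch for checking a regenerated cates.json against the list above. It assumes cates.json is a JSON array of objects shaped like this excerpt (cate_id, text, trainval); the helper names find_mismatches and load_cates and the file path are hypothetical and not part of the repo. Presumably the pre-trained model's outputs depend on this cate_id order, so any mismatch here would be worth fixing before evaluating.

```python
import json

# Top-10 characters in cate_id order, copied from the list above.
EXPECTED_TOP10 = ["中", "国", "大", "电", "店", "路", "车", "家", "公", "行"]


def find_mismatches(cates, expected=EXPECTED_TOP10):
    """Return (cate_id, expected_text, actual_text) for every entry that differs."""
    # Index the loaded categories by cate_id so list order does not matter.
    by_id = {c["cate_id"]: c["text"] for c in cates}
    return [
        (i, want, by_id.get(i))
        for i, want in enumerate(expected)
        if by_id.get(i) != want
    ]


def load_cates(path):
    # Assumption: the file is a single JSON array like the excerpt above.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, find_mismatches(load_cates("cates.json")) returns an empty list when the first ten categories match; adjust the path to wherever decide_cates.py wrote the file in your checkout.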

Amy-Liao commented 2 years ago

Thank you so much! I've successfully reproduced the performance :)

yuantailing / ctw-baseline

Problem when reproducing the classification performance #44