memo - Githubissues

osuossu8 commented 1 year ago

Ideas, observations, and useful discussions, notebooks, etc.

osuossu8 commented 1 year ago

public 0.482 https://www.kaggle.com/code/atom1231/hubmap-mmdet-2-26-public-inference
train/valid split https://www.kaggle.com/code/benihime91/hubmap-2023-create-coco-annotations

model info

cascade_mask_rcnn_x101_64x4d_fpn_20e_coco

+------------+-------+--------------+-------+----------+-------+
| category   | AP    | category     | AP    | category | AP    |
+------------+-------+--------------+-------+----------+-------+
| glomerulus | 0.548 | blood_vessel | 0.295 | unsure   | 0.001 |
+------------+-------+--------------+-------+----------+-------+

2023-06-29 05:04:30,302 - mmdet - INFO - Epoch(val) [18][422]

segm_mAP: 0.2810, segm_mAP_50: 0.4160, segm_mAP_75: 0.2860, segm_mAP_s: 0.0900, segm_mAP_m: 0.1930,

segm_mAP_l: 0.3600, segm_mAP1000: 0.2810, segm_mAP_copypaste: 0.281 0.416 0.286 0.090 0.193 0.360

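  import cv2
  import numpy as np

  sg = sg.astype(np.uint8)
  kernel = np.ones(shape=(3, 3), dtype=np.uint8)
  # iterations must be passed by keyword; the third positional argument of cv2.dilate is dst
  binary_mask = cv2.dilate(sg, kernel, iterations=3)
  binary_mask = binary_mask.astype(bool)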

osuossu8 commented 1 year ago

When ensembling the ds1-only model with the ds1 + ds2 + dilate model, do not apply dilate to the ds1-only model's predictions

osuossu8 commented 1 year ago

Fix for when a process is holding the GPU https://qiita.com/miyamotok0105/items/033b850a205f958808e9

osuossu8 commented 1 year ago

I had set the category id to 0 for every pattern, which was a mistake ↓ The category ids need to be changed https://www.kaggle.com/code/kaerunantoka/hubmap-convert-coco-format-ds1-5fold-2class

osuossu8 commented 1 year ago

TTA may also help in the case where ds2 (given) is mixed into train

Also 90-degree rotation, photometric aug, etc.

osuossu8 commented 1 year ago

Ensemble → 1class + 3class, ds1 only (+ pseudo ds2), ds1+ds2 (given) + pp, etc.

osuossu8 commented 1 year ago

For the final subs, I want to use one with pp and one without

osuossu8 commented 1 year ago

A single data directory shared by train and val is enough, as long as the json files are split correctly and reference the right images.
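A minimal sketch of that layout, assuming COCO-format annotations: both json files reference images by `file_name` inside the same folder, so only the annotation files differ. The helper name `split_coco`, the tile names, and the category ids below are hypothetical and only serve as an illustration.

```python
import json


def split_coco(coco, val_image_ids):
    """Split one COCO dict into train/val dicts that point at the same image folder."""
    val_ids = set(val_image_ids)
    out = {}
    for name, keep_val in (("train", False), ("val", True)):
        images = [im for im in coco["images"] if (im["id"] in val_ids) == keep_val]
        kept = {im["id"] for im in images}
        out[name] = {
            "images": images,
            "annotations": [a for a in coco["annotations"] if a["image_id"] in kept],
            "categories": coco["categories"],  # identical in both files
        }
    return out


# Toy data (hypothetical tile names and category ids).
coco = {
    "images": [{"id": i, "file_name": f"tile_{i}.tif"} for i in range(4)],
    "annotations": [{"id": i, "image_id": i % 4, "category_id": 1} for i in range(6)],
    "categories": [{"id": 0, "name": "glomerulus"}, {"id": 1, "name": "blood_vessel"}],
}
splits = split_coco(coco, val_image_ids=[3])
train_json = json.dumps(splits["train"])  # would be written to the train json
val_json = json.dumps(splits["val"])      # would be written to the val json
```

Keeping `categories` identical in both files also guards against the kind of category id mismatch noted earlier in this memo.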

osuossu8 commented 1 year ago

Conversely, pseudo-label ds1 with a model trained on ds2 (given)

osuossu8 commented 1 year ago

Build a 2class dataset without unsure and try it with the 047 setting

0.474 (3class) 0.473 (2class)

osuossu8 commented 1 year ago

Don't use 5fold

Instead, average this split over 5 seeds https://www.kaggle.com/code/benihime91/hubmap-2023-create-coco-annotations/notebook

osuossu8 commented 1 year ago

With ds1 only, accuracy plateaus at size 768. With ds1 + ds2 (given) shuffled: ??? Large gain from 640 -> 768.

640 : 044, 0.457
768 : 047, 0.474
840 : 049, 0.457
1280 : 048, 0.468

osuossu8 commented 1 year ago

Pseudo-label only ds2 wsi 1, 2 first, then pseudo-label ds3 step by step afterwards

osuossu8 commented 1 year ago

Run mask2former with the default settings (iter-based)

osuossu8 commented 1 year ago

Try mask2former swin-s at 768, 840, 968, and up to 1024 to see how far the benefit of a larger size goes

osuossu8 commented 1 year ago

rotate90 looks like it could help? Try TTA

https://www.kaggle.com/code/hidngnguyna/baseline-unet-semantic-as-instance-segmentation/notebook

osuossu8 commented 1 year ago

Some have pale colors: 1, 2, 3, 4 (given); 5 (private test); 6, 7, 8, 11, 14 (similar color); 9, 10, 12, 13 (pale)

https://www.kaggle.com/code/kaerunantoka/fork-of-investigate-tiles-on-wsi-1-and-2

osuossu8 commented 1 year ago

Looking into ds3: there is no time to run 5fold, so do it 1 fold at a time?

osuossu8 commented 1 year ago

Data split

https://www.kaggle.com/competitions/hubmap-hacking-the-human-vasculature/discussion/413038#2279515

Is it not OK to use ds 2 of wsi 3, 4 as val data?

osuossu8 commented 1 year ago

Load the weights that score 0.474 and finetune for a few more epochs on a dataset with pseudo-labeled ds3 added (as seen in solutions from the bird and frog competitions)

Change load_from for each fold and run
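A small sketch of generating the per-fold overrides, assuming the standard mmdetection config keys `load_from` (checkpoint to initialize from) and `work_dir`. The helper name `fold_overrides` and both path templates are hypothetical; how the overrides get merged into the actual config is left to the training script.

```python
def fold_overrides(n_folds, ckpt_template, work_dir_template):
    """Build one dict of config overrides per fold."""
    return [
        {
            "load_from": ckpt_template.format(fold=fold),
            "work_dir": work_dir_template.format(fold=fold),
        }
        for fold in range(n_folds)
    ]


overrides = fold_overrides(
    n_folds=5,
    ckpt_template="work_dirs/exp047_fold{fold}/best.pth",  # hypothetical path
    work_dir_template="work_dirs/exp047_ft_fold{fold}",    # hypothetical path
)
for cfg in overrides:
    print(cfg)
```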

osuossu8 commented 1 year ago

Maybe try the split ↓ once more https://www.kaggle.com/code/benihime91/hubmap-2023-create-coco-annotations

osuossu8 commented 1 year ago

Ensemble by intersection instead of union?
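To make the difference concrete, here is a minimal NumPy sketch. It assumes the masks passed in have already been matched to the same instance and share one shape; the function name `ensemble_masks` is hypothetical.

```python
import numpy as np


def ensemble_masks(masks, mode="union"):
    """Combine binary masks of one instance pixel-wise."""
    stacked = np.stack([np.asarray(m, dtype=bool) for m in masks])
    if mode == "union":
        return stacked.any(axis=0)   # pixel kept if any model predicts it
    if mode == "intersection":
        return stacked.all(axis=0)   # pixel kept only if all models agree
    raise ValueError(f"unknown mode: {mode}")


a = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
b = np.array([[0, 1, 1], [0, 1, 0]], dtype=bool)
union = ensemble_masks([a, b], mode="union")
inter = ensemble_masks([a, b], mode="intersection")
```

Union grows the mask toward the largest prediction, while intersection keeps only the pixels every model agrees on, so the two choices pull the mask size in opposite directions.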

osuossu8 commented 1 year ago

Try various values of iou_th, starting from 0.6

osuossu8 commented 1 year ago

047 + rotate 90, -90

dict(type='Rotate', prob=1., min_mag=90.0, max_mag=90.0, reversal_prob=0.0),
dict(type='Rotate', prob=1., min_mag=90.0, max_mag=90.0, reversal_prob=1.0),

osuossu8 commented 1 year ago

random choice resize, random resized crop, etc.

osuossu8 commented 1 year ago

Switch to a 4fold split or similar

osuossu8 commented 1 year ago

1st stage: train normally
2nd stage: create pseudo-labeled data and train on that data only
3rd stage: load the 2nd-stage weights and finetune on the 1st-stage data

osuossu8 commented 1 year ago

Instead of shuffling ↓, use ↓ with train = ds2 and val = ds1 (this works with a single fold)

ds1_wsi12 = df.query('dataset == 1 & source_wsi in [1, 2]')
ds2_wsi34 = df.query('dataset == 2 & source_wsi in [3, 4]')

osuossu8 commented 1 year ago

There is no time left, so I want validation that can be done with a single fold

osuossu8 commented 1 year ago

Submit 047 once, one fold at a time

osuossu8 commented 1 year ago

random resize was a hit; try random choice resize and the like too

osuossu8 commented 1 year ago

Use WBF to ensemble the 5 folds. Use NMW inside the loop over each of the 5 folds.

osuossu8 commented 1 year ago

Learn from solutions of past segmentation comps https://www.kaggle.com/code/markunys/8th-place-solution-inference

osuossu8 commented 1 year ago

Apply nms within each fold, but do not merge the 5fold preds (no nms, no wmf); submit all of them

osuossu8 commented 1 year ago

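Things I want to do before the competition ends

Look into WMF; I probably can't win without making good use of it -> it worked
If WMF works, TTA (hori, verti, rot90) -> not working so far; maybe it won't work?
Pretrain on pseudo-labeled data, then fine-tune on the existing data -> slightly better
Re-create the pseudo-label data -> questionable from the 2nd round on; stop at the 1st round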

osuossu8 commented 1 year ago

Theory: the mask after weighted mask fusion has to be cut with a threshold? (it is a float mask rather than a binary one?)

→ Casting with np.array(ens_masks, dtype=np.uint8) truncates the fractional part, so only mask pixels equal to 1.0 survived, which may be why the score dropped so much

bingo

↓ uses mask threshold 0.5 https://www.kaggle.com/code/markunys/8th-place-solution-inference

  array([[0.03812, 0.03812, 0.03812, ..., 0.     , 0.     , 0.     ],
         [0.03812, 0.03812, 0.03812, ..., 0.     , 0.     , 0.     ],
         [0.03812, 0.03812, 0.03812, ..., 0.     , 0.     , 0.     ],
         ...,
         [0.     , 0.     , 0.     , ..., 0.     , 0.     , 0.     ],
         [0.     , 0.     , 0.     , ..., 0.     , 0.     , 0.     ],
         [0.     , 0.     , 0.     , ..., 0.     , 0.     , 0.     ]],
...
  array([[0.8066, 0.8066, 1.    , ..., 0.    , 0.    , 0.    ],
         [0.8066, 0.8066, 1.    , ..., 0.    , 0.    , 0.    ],
         [1.    , 1.    , 1.    , ..., 0.    , 0.    , 0.    ],
         ...,
         [0.    , 0.    , 0.    , ..., 0.    , 0.    , 0.    ],
         [0.    , 0.    , 0.    , ..., 0.    , 0.    , 0.    ],
         [0.    , 0.    , 0.    , ..., 0.    , 0.    , 0.    ]],
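The effect can be reproduced on a tiny float mask. This is only a sketch with made-up values in the same range as the dump above; the 0.5 threshold follows the linked notebook, and the variable names are hypothetical.

```python
import numpy as np

# Toy fused mask with fractional values, like the weighted-mask-fusion output.
fused = np.array(
    [[0.03812, 0.8066, 1.0],
     [0.0, 0.49, 0.51]],
    dtype=np.float32,
)

# Casting to uint8 truncates toward zero, so only pixels equal to 1.0 survive.
cast_mask = fused.astype(np.uint8)

# Thresholding keeps every pixel whose fused value exceeds 0.5.
thresholded = fused > 0.5

print(int(cast_mask.sum()), int(thresholded.sum()))
```

Thresholding first and casting afterwards (for example `(fused > 0.5).astype(np.uint8)`) gives a binary mask without losing the pixels that have fractional values.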

osuossu8 commented 1 year ago

↓ WBF is fixed, so combine it with TTA (hflip, vflip, rot90). Also do the ensemble.

osuossu8 commented 1 year ago

Use nms up to the single model (5fold) level; use wmf from TTA onward

osuossu8 commented 1 year ago

Theory: TTA and the rest were riddled with bugs too...

→ As expected, code can't be trusted unless tests are written...
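One cheap test for this is a round trip: apply each TTA transform, apply its inverse, and check that the mask comes back unchanged. The sketch below assumes plain NumPy flips and `np.rot90`; the `TTA` table and `check_round_trip` are hypothetical names, not the code used in the experiments.

```python
import numpy as np

# Each entry maps a name to (forward transform, inverse transform).
TTA = {
    "hflip": (lambda x: x[:, ::-1], lambda x: x[:, ::-1]),
    "vflip": (lambda x: x[::-1, :], lambda x: x[::-1, :]),
    "rot90": (lambda x: np.rot90(x, k=1), lambda x: np.rot90(x, k=-1)),
}


def check_round_trip(shape=(4, 6), seed=0):
    """Return True if inverse(forward(mask)) == mask for every transform."""
    rng = np.random.default_rng(seed)
    mask = rng.random(shape) > 0.5
    for name, (forward, inverse) in TTA.items():
        restored = inverse(forward(mask))
        assert restored.shape == mask.shape, name
        assert np.array_equal(restored, mask), name
    return True


print(check_round_trip())
```

A non-square shape is used on purpose, since a wrong rot90 inverse can go unnoticed on square tiles where the shape check never fails.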

osuossu8 commented 1 year ago

Do seed avg too

osuossu8 commented 1 year ago

https://www.kaggle.com/code/yukkyo/mmdetpkgs

→ It ran; looks like there is more to dig into

→ Adding everything + 20 epochs gives cv 0.34, lb 0.347 → what about more epochs?

Narrow down the augs used (e.g., drop mosaic)

osuossu8 commented 1 year ago

Try 768 instead of 1024

osuossu8 commented 1 year ago

Do rot90 in only one direction instead of both?

osuossu8 commented 1 year ago

Try dilate of 2 or 4. Pass dilate and mask_th as arrays (vary them per model).
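A minimal sketch of per-model settings, assuming each model outputs a float probability mask. `dilate3x3` is a NumPy stand-in for a 3x3 `cv2.dilate` so the example has no OpenCV dependency; `postprocess`, `mask_ths`, and `dilates` are hypothetical names.

```python
import numpy as np


def dilate3x3(mask, iterations):
    """Binary dilation with a 3x3 square kernel (stand-in for cv2.dilate)."""
    out = np.asarray(mask, dtype=bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out


def postprocess(prob_masks, mask_ths, dilates):
    """Threshold and dilate each model's mask with its own settings."""
    return [
        dilate3x3(prob > th, it)
        for prob, th, it in zip(prob_masks, mask_ths, dilates)
    ]


prob = np.zeros((7, 7), dtype=np.float32)
prob[3, 3] = 0.9
masks = postprocess([prob, prob], mask_ths=[0.5, 0.3], dilates=[1, 2])
print([int(m.sum()) for m in masks])
```

Comparing the pixel counts per setting is also a quick way to confirm that different dilate values really produce different masks.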

osuossu8 commented 1 year ago

Theory: ens_masks = dilate_predict_mask(ens_masks) has no effect (dilate1=dilate2=dilate3=dilate4) (stopped with break, ran np.allclose, and got True)

-> Debugged it; np.allclose now returns False

#ens_masks = np.array(ens_masks, dtype=np.uint8)
#ens_masks = dilate_predict_mask(ens_masks)
#ens_masks = np.array(ens_masks, dtype=bool)

Do they become the same when converted with dtype=bool?
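A quick check of what a bool cast can and cannot hide, using toy arrays rather than the real masks: it collapses every nonzero value to True, so masks that differ only in their nonzero values become equal, while masks that cover different pixels stay different.

```python
import numpy as np

a = np.array([[0, 1, 2], [0, 0, 3]], dtype=np.uint8)
b = np.array([[0, 1, 1], [0, 0, 1]], dtype=np.uint8)  # same pixels, other values
c = np.array([[1, 1, 1], [0, 0, 1]], dtype=np.uint8)  # one extra pixel

equal_raw = np.array_equal(a, b)
equal_as_bool = np.array_equal(a.astype(bool), b.astype(bool))
equal_extent = np.array_equal(a.astype(bool), c.astype(bool))

print(equal_raw, equal_as_bool, equal_extent)
```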

osuossu8 commented 1 year ago

keroppi data looks worth digging into

osuossu8 commented 1 year ago

TPU looks more worth digging into than the itk-san method (in terms of compute resources)

osuossu8 commented 1 year ago

Setting max_keep_ckpts=1 would allow more uploads (no snapshot ens anyway)

osuossu8 commented 1 year ago

Write down a description and score for each ensemble candidate; this also helps organize the diversity experiments

There are 3 dataset types, I guess

ds1_wsi12 = df.query('dataset == 1 & source_wsi in [1, 2]')
ds2_wsi34 = df.query('dataset == 2 & source_wsi in [3, 4]')
df_use = pd.concat([ds1_wsi12, ds2_wsi34]).reset_index(drop=True)

train_sources = df.query('dataset == 2')
val_sources = df.query('dataset == 1')

shuffle

osuossu8 commented 1 year ago

solo
- https://qiita.com/Kmat67916008/items/474da9f1f5553579cf76
- https://github.com/open-mmlab/mmdetection/tree/3.x/configs/solo
queryinst
- https://github.com/open-mmlab/mmdetection/tree/3.x/configs/queryinst
https://twitter.com/ZFPhalanx/status/1457946102166999046?s=20
model: mask r-cnn -> cascade/htc -> yolact -> condinst -> solo -> queryinst
mask refine: refinemask, mask scoring
augmentation: copypaste

The options are

end-to-end framework (mask r-cnn, etc.)
unet -> watershed
detection -> crop -> semantic segmentation + lightgbm, as I understand it
Unet -> watershed / connected components

osuossu8 / kaggle_hubmap_2023

memo #30
