lancercat / VSDF


Why use the DSBN network? #9

Closed bad-meets-joke closed 1 year ago

bad-meets-joke commented 1 year ago

Hi,

Thank you for releasing the code.

I am wondering why DSBN is used in the backbone. Is it meant to deal with the huge domain gap between the scene character samples and the glyphs? Is it a key factor in improving zero-shot character recognition performance on the CTW dataset?

Looking forward to your explanation. Thanks.

lancercat commented 1 year ago

Simply put, you are actually correct on both assumptions.

We actually tried the Siamese network in very early versions and found it bites quite hard on the closest performances due to the domain gap...

We also found that sharing the filters but not the norms works better than using two separate networks; the sharing may alleviate the overfitting risk caused by the small label set (we only have ~3800 training glyphs, after all).
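To make "sharing the filters but not the norms" concrete, here is a minimal toy sketch in plain Python. It is not the code from this repository: the class names `DomainSpecificNorm` and `SharedFilterBlock`, the domain labels `"scene"` and `"glyph"`, and the scalar "filter" are all hypothetical stand-ins, and the normalization uses per-batch statistics only (no running averages or learnable scale/shift).

```python
import math


class DomainSpecificNorm:
    """Toy normalization that keeps separate statistics for each domain."""

    def __init__(self, domains, eps=1e-5):
        self.eps = eps
        # One (mean, variance) slot per domain; nothing is shared here.
        self.stats = {domain: None for domain in domains}

    def __call__(self, values, domain):
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        self.stats[domain] = (mean, var)
        return [(v - mean) / math.sqrt(var + self.eps) for v in values]


class SharedFilterBlock:
    """One filter weight shared by all domains, followed by a per-domain norm."""

    def __init__(self, weight, domains):
        self.weight = weight  # shared across domains (stand-in for conv filters)
        self.norm = DomainSpecificNorm(domains)

    def forward(self, batch, domain):
        filtered = [self.weight * x for x in batch]  # same filter for every domain
        return self.norm(filtered, domain)           # statistics picked by domain


block = SharedFilterBlock(weight=2.0, domains=["scene", "glyph"])
scene_out = block.forward([1.0, 2.0, 3.0], domain="scene")
glyph_out = block.forward([10.0, 20.0, 30.0], domain="glyph")
```

Both batches pass through the same `weight`, but each domain is normalized with its own statistics, so a shift in input distribution between scene images and glyphs is absorbed by the norm rather than by duplicating the filters.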

I think this is the reason why the results are better than OSOCR on the CTW dataset and other open-set benchmarks.


On a side note, I think the huge gaps between our methods and Chen et al.'s [1] methods are probably mostly due to how the training/testing character sets are split. We split randomly, following HDE [2], while Chen et al.'s methods seem to split the characters according to a certain order, probably derived from fewran [3].
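The difference between the two splitting strategies can be sketched as follows. This is only an illustration: the function names `random_split` and `ordered_split`, the seed, and the toy character list are hypothetical, and the actual ordering used by the other methods is not known here.

```python
import random


def random_split(chars, n_train, seed=0):
    """Shuffle the character set, then cut it into train/test parts."""
    shuffled = list(chars)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def ordered_split(chars, n_train):
    """Keep the given order: the first n_train characters go to training."""
    ordered = list(chars)
    return ordered[:n_train], ordered[n_train:]


# Toy character set: ten consecutive CJK code points.
chars = [chr(code) for code in range(0x4E00, 0x4E00 + 10)]
rand_train, rand_test = random_split(chars, n_train=7)
ord_train, ord_test = ordered_split(chars, n_train=7)
```

If the fixed order correlates with character properties such as frequency or complexity, the ordered test set can differ systematically from the training set, which makes results under the two protocols hard to compare directly.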

[1] https://github.com/FudanVI/FudanOCR

[2] Zhong Cao, Jiang Lu, Sen Cui, and Changshui Zhang. "Zero-shot handwritten Chinese character recognition with hierarchical decomposition embedding." Pattern Recognition 107 (2020): 107488.

[3] Tianwei Wang, Zecheng Xie, Zhe Li, Lianwen Jin, and Xiangle Chen. "Radical aggregation network for few-shot offline handwritten Chinese character recognition." Pattern Recognition Letters 125 (2019): 821–827.

bad-meets-joke commented 1 year ago

Thank you for your detailed response. My doubts have been answered.