ibm-aur-nlp / PubTabNet

About WYGIWYS #10

Open tomoohive opened 3 years ago

tomoohive commented 3 years ago

I'd like to implement WYGIWYS before implementing EDD. I saw this comment (https://github.com/ibm-aur-nlp/PubTabNet/issues/6#issuecomment-630506737) and would like to know more details about the WYGIWYS model.

Where should I add the RNN in the original tutorial source? Also, what role does the RNN play?

My understanding is that the RNN predicts the structure of the table. Is that correct?

zhxgj commented 3 years ago

In WYGIWYS, the RNN is applied to the CNN output to capture longer-range spatial dependencies. You can apply the RNN to each column of the CNN output.
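
For illustration only, here is a rough pure-Python sketch of running a recurrent layer down each column of a CNN feature grid. It is not the WYGIWYS code: the function name `rnn_over_columns`, the weight names, and the plain tanh cell are all assumptions standing in for whatever recurrent cell and framework the real model uses, and the weights are random rather than trained.

```python
import math
import random


def rnn_over_columns(features, w_in, w_rec, bias):
    """Run a simple tanh RNN down each column of a feature grid.

    features: nested list of shape [H][W][C] (rows, columns, channels),
              a stand-in for the CNN output.
    w_in:     [D][C] input weights, w_rec: [D][D] recurrent weights,
    bias:     [D] bias. All three are hypothetical, untrained parameters.
    Returns a nested list of shape [H][W][D] holding the hidden states.
    """
    height, width = len(features), len(features[0])
    hidden_size = len(bias)
    out = [[None] * width for _ in range(height)]
    for col in range(width):
        h = [0.0] * hidden_size  # fresh hidden state for every column
        for row in range(height):  # walk the column from top to bottom
            x = features[row][col]
            h = [
                math.tanh(
                    sum(w_in[d][c] * x[c] for c in range(len(x)))
                    + sum(w_rec[d][k] * h[k] for k in range(hidden_size))
                    + bias[d]
                )
                for d in range(hidden_size)
            ]
            out[row][col] = h
    return out


random.seed(0)
H, W, C, D = 3, 4, 2, 5  # toy sizes, chosen only for the demo
features = [[[random.uniform(-1, 1) for _ in range(C)] for _ in range(W)]
            for _ in range(H)]
w_in = [[random.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(D)]
w_rec = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(D)]
bias = [0.0] * D

encoded = rnn_over_columns(features, w_in, w_rec, bias)
print(len(encoded), len(encoded[0]), len(encoded[0][0]))  # 3 4 5
```

Each grid position ends up with a hidden state that depends on everything above it in the same column, which is the sense in which the RNN adds longer-range context to the CNN features. The output grid would then be fed to the attention decoder in place of the raw CNN output.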

nishchay47b commented 3 years ago

I'm not sure I did this right, but I ran a small experiment with WYGIWYS using this implementation. I used the format_html function from exploring_PubTabNet_dataset.ipynb to create a label file for a few hundred sample images and trained for 2 epochs. The results looked like <html> <th> UNK </th> UNK UNK </html>, where UNK is the out-of-vocabulary token. Maybe this happened because I used the vocab from the PubTabNet dataset, which is basically characters, whereas WYGIWYS expects word tokens in the vocab. If you create a word-level vocab and use this function to generate label files with proper spacing for each image, maybe this can work.

zhxgj commented 3 years ago

@nishchay47b When I trained WYGIWYS, I used character-level tokenization, where HTML tags are single tokens.
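
A minimal sketch of that kind of tokenization, as an assumption rather than the actual preprocessing code: the helper names `tokenize`, `build_vocab`, and `encode` are hypothetical, and the regex treats anything of the form `<...>` as a tag, so a literal `<` inside cell text would need escaping first.

```python
import re

# Anything between '<' and '>' is treated as one tag token (an assumption).
TAG_PATTERN = re.compile(r"<[^<>]+>")


def tokenize(html):
    """Split HTML into tokens: each tag is one token, other text is per character."""
    tokens = []
    pos = 0
    for match in TAG_PATTERN.finditer(html):
        tokens.extend(html[pos:match.start()])  # characters between tags
        tokens.append(match.group())            # the whole tag as one token
        pos = match.end()
    tokens.extend(html[pos:])                   # trailing characters, if any
    return tokens


def build_vocab(token_lists, unk="UNK"):
    """Map every token seen in training to an id; id 0 is reserved for UNK."""
    vocab = {unk: 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, unk="UNK"):
    """Convert tokens to ids, falling back to UNK for unseen tokens."""
    return [vocab.get(tok, vocab[unk]) for tok in tokens]


train_tokens = tokenize("<td>ab</td>")
print(train_tokens)  # ['<td>', 'a', 'b', '</td>']

vocab = build_vocab([train_tokens])
print(encode(tokenize("<td>ac</td>"), vocab))  # [1, 2, 0, 4]
```

The label file and the vocab have to use the same token boundaries. If the labels are split into words on whitespace while the vocab holds single characters, most label tokens will miss the vocab and map to UNK.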