sustcsonglin / TN-PCFG

Source code for the NAACL 2021 paper "PCFGs Can Do Better: Inducing Probabilistic Context-Free Grammars with Many Symbols" and the ACL 2021 main conference paper "Neural Bilexicalized PCFG Induction"

A question about data preprocessing #2

Closed. speedcell4 closed this issue 1 year ago

speedcell4 commented 1 year ago

Hi, thanks for sharing the source code.

In the released PTB dataset, I found two files, ptb-train.pickle and ptb-train-lpcfg.pickle. What is the difference between them?

Thanks~

sustcsonglin commented 1 year ago

They are basically the same; ptb-train-lpcfg.pickle additionally contains dependency head annotations, which are used for unsupervised dependency parsing.
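
One way to check the difference yourself is to load both files and compare their top-level keys. This is only a minimal sketch: it assumes each pickle holds a dict, which this thread does not confirm, and the helper names (`load_pickle`, `extra_keys`, `compare_files`) are hypothetical and not part of the TN-PCFG code.

```python
import pickle


def load_pickle(path):
    """Load a single pickled object from disk.

    Only unpickle files from a source you trust: unpickling can run
    arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def extra_keys(base, extended):
    """Return the sorted top-level keys found in `extended` but not in `base`.

    Assumption: both objects are dicts. The real layout of the released
    files is not documented in this thread.
    """
    if not (isinstance(base, dict) and isinstance(extended, dict)):
        raise TypeError("expected both pickles to hold dicts")
    return sorted(set(extended) - set(base))


def compare_files(base_path, extended_path):
    """Load two pickle files and list the keys only the second one has."""
    return extra_keys(load_pickle(base_path), load_pickle(extended_path))
```

Calling `compare_files("ptb-train.pickle", "ptb-train-lpcfg.pickle")` should then list whatever extra fields the second file carries, if the dict assumption holds. If the files turn out to store another structure, `extra_keys` raises a `TypeError` rather than guessing.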

speedcell4 commented 1 year ago

I got it, thank you~