pswietojanski / slurp

Repository for SLURP paper

Overlapping sentences between train and devel or test #5

Closed: JaejinCho closed this issue 1 year ago

JaejinCho commented 1 year ago

Hello, and thank you so much for curating the data and making this dataset publicly available! I computed some sentence-overlap statistics and wanted to share them with you.

My analysis of the data gave the following results:

- devel: 0/2033 sentences (0.00%) are already included in train.
- test: 0/2974 sentences (0.00%) are already included in train.
- devel: 1317/2033 sentences (64.78%) are already included in train_synthetic.
- test: 1889/2974 sentences (63.52%) are already included in train_synthetic.
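A minimal sketch of how overlap figures like these could be computed, for anyone who wants to reproduce them. The matching criterion used in the analysis above is not stated, so this sketch assumes exact matching after lowercasing and collapsing whitespace. The loader assumes JSON Lines files with a "sentence" field per line; that layout, the file names in the comment, and the helper names are illustrative assumptions, not part of the SLURP tooling.

```python
import json
from typing import Iterable, List, Tuple


def normalize(sentence: str) -> str:
    # Assumption: compare sentences case-insensitively, with leading, trailing
    # and repeated whitespace collapsed. The original analysis may differ.
    return " ".join(sentence.lower().split())


def load_sentences(path: str) -> List[str]:
    # Hypothetical loader: assumes one JSON object per line with a "sentence" key.
    sentences = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                sentences.append(json.loads(line)["sentence"])
    return sentences


def overlap(eval_sentences: Iterable[str],
            train_sentences: Iterable[str]) -> Tuple[int, int, float]:
    """Return (hits, total, percent): eval entries whose normalized text is in train."""
    train_set = {normalize(s) for s in train_sentences}
    eval_list = [normalize(s) for s in eval_sentences]
    hits = sum(1 for s in eval_list if s in train_set)
    total = len(eval_list)
    percent = 100.0 * hits / total if total else 0.0
    return hits, total, percent


if __name__ == "__main__":
    # Toy data; with real files one might call, e.g. (hypothetical paths):
    # overlap(load_sentences("devel.jsonl"), load_sentences("train_synthetic.jsonl"))
    train = ["wake me up at five am", "what's the weather like"]
    devel = ["Wake me up at five AM", "play some jazz"]
    hits, total, pct = overlap(devel, train)
    print(f"devel has {hits}/{total} sentences ({pct:.2f}%) already included in train")
    # prints: devel has 1/2 sentences (50.00%) already included in train
```

Building a set from the training sentences keeps each membership check constant-time on average, so the comparison scales to the full synthetic training set. Each evaluation entry is counted separately, so repeated sentences within devel or test would each count toward the total.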

I think that training with train_synthetic and then evaluating on devel and test may not be an entirely fair setup, since roughly two thirds of the devel and test sentences also appear in train_synthetic.

pswietojanski commented 1 year ago

Hi, thanks. In terms of audio it should be OK, since devel and test contain only real recordings.