whwang299 closed this issue 5 years ago
Hi Wonseok,
Thank you for your interest in this work. You raise a good point regarding question formatting. However, I think we should leave the handling of such complexities to the modelers and keep the dataset in its natural state, for consistency reasons if nothing else. You can imagine, for example, preprocessing the dataset as part of a modeling pipeline to normalize LaTeX equations. I am hesitant to modify the dataset directly, as doing so would invalidate previous results and introduce additional biases.
Thanks, Victor
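For anyone who wants to try the preprocessing route suggested above, here is a minimal sketch in Python. It is not part of the WikiSQL codebase. It assumes each line of train.jsonl is a JSON object with a "question" field, and that the affected questions use the simple form \mathrm{...}; the exact LaTeX in those 6 questions is only visible in the screenshot, so the pattern may need adjusting. The names normalize_question and normalize_file are hypothetical helpers introduced for illustration.

```python
import json
import re

# Assumption: the unusual questions wrap text as \mathrm{...}.
# The pattern matches one brace level; nesting is handled by repeating the pass.
_MATHRM = re.compile(r"\\mathrm\s*\{([^{}]*)\}")


def normalize_question(question: str) -> str:
    """Unwrap \\mathrm{...} groups, leaving all other text untouched."""
    previous = None
    text = question
    # Repeat until stable so that nested \mathrm groups are unwrapped too.
    while previous != text:
        previous = text
        text = _MATHRM.sub(r"\1", text)
    return text


def normalize_file(src_path: str, dst_path: str) -> int:
    """Write a normalized copy of a JSONL file; return how many questions changed.

    The original file is never modified, so results on the unmodified
    dataset stay reproducible.
    """
    changed = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            example = json.loads(line)
            cleaned = normalize_question(example["question"])
            if cleaned != example["question"]:
                changed += 1
            example["question"] = cleaned
            dst.write(json.dumps(example, ensure_ascii=False) + "\n")
    return changed
```

Writing to a separate output file keeps the released dataset in its original state, which matches the concern about not invalidating previous results.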
On Fri, Dec 21, 2018, 7:01 AM Wonseok Hwang <notifications@github.com> wrote:
Hi @vzhong https://github.com/vzhong
I think I found some unusual questions in train.jsonl (see the image below) which contain lots of \mathrm.
[image: image] https://user-images.githubusercontent.com/11023071/50315378-dfd09c00-04f5-11e9-9d12-7a1864275b14.png
In my humble opinion, since there are only a few of them (6 questions), could they be removed or converted to plain text rather than LaTeX?
Thanks!
I see. Thanks for the reply. I agree. I'll close the issue.
Thanks!