lil-lab / nlvr

Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
http://lic.nlp.cornell.edu/nlvr/

NLVR dataset uploaded & visualized at tagtog #9

Closed: juanmirocks closed this issue 5 years ago

juanmirocks commented 5 years ago

Thank you so much for creating this great dataset.

I have uploaded the NLVR dataset to tagtog for easier visualization and exploration of the data.

Here is the project's link, with its guidelines/README: https://www.tagtog.net/NLVR/NLVR/-settings#tab-guidelines

Here is a sample, for instance: https://www.tagtog.net/NLVR/NLVR/pool%2Ftrain/aWRfhf_ACQLgY5U9nULhEdhX8938-998_1.md?p=0&i=3

It looks like this:

[Screenshot: the sample example as displayed in tagtog]

--

Do you have any thoughts or feedback? It would also be interesting to explore the entire NLVR2 dataset.

alsuhr-c commented 5 years ago

This looks cool! I'm not familiar with tagtog, but it looks like it could be useful for adding annotations such as linguistic phenomena (especially for visualizing them). One comment: if people download the data from this site instead, evaluating on a single permutation is not comprehensive, so maybe add a note telling them to download the full dataset (for training/evaluation) from this repo on GitHub?

juanmirocks commented 5 years ago

Thanks @alsuhr for your encouragement!

What would you suggest for best visualizing the permutations? As an alternative to always showing only permutation 0, I was thinking of either a) creating, for each example/document, 6 different documents for the 6 permutations, titled something like n-m-k; or b) showing the other permutations in the same document (see below for what it would look like).

For now, with the current version always showing only permutation 0, I've clarified this in the guidelines as per your suggestion:

The image permutations were NOT considered (see). The image shown for each example/identifier was arbitrarily chosen to be permutation 0. I believe showing a single image along with its description keeps the spirit of the original annotation project. Nonetheless, users of this tagtog dataset can easily retrieve the other permutations, as the image file naming pattern is obvious. The original dataset with all the image permutations is here: https://github.com/lil-lab/nlvr/tree/master/nlvr

BTW, full disclosure: I'm one of the co-creators of tagtog. We aim to aggregate valuable (text-based) datasets for easier exploration and collaboration. I'm more than happy to give you "ownership" of the "NLVR" user (i.e., the password). I was also thinking of uploading the NLVR2 dataset :)

b) Alternative

[Screenshot: option b), with the other permutations shown in the same document]

alsuhr-c commented 5 years ago

Depending on what people want to visualize, I don't think it's necessary to include all permutations as separate examples. For example, I'm thinking this could be used for linguistic analysis, and since the sentences are the same across permutations, separate examples would end up being redundant. Thanks for the clarification!

juanmirocks commented 5 years ago

All clear! Thank you Alane for your ideas.

Closing the issue now.