vineetm / ell-881-2018-deep-learning

Course Materials for ELL 881 2018: Fundamentals of Deep Learning

Evaluation datasets #30

Open anshumitts opened 5 years ago

anshumitts commented 5 years ago

Can you please point me towards the standard datasets being used in 10.1, 10.2, 10.3 and 10.4?

vineetm commented 5 years ago

@anshumitts See the corresponding datasets as listed on the paper's GitHub repo. You will find that this in turn points to the SentEval toolkit.

anshumitts commented 5 years ago

@vineetm This uses PyTorch. Are we supposed to write code for the evaluation tasks as well?

vineetm commented 5 years ago

@anshumitts I don't expect you to write code for the evaluation tasks. See this example on the SentEval repo page. It shows how TensorFlow code can be called: https://github.com/facebookresearch/SentEval/blob/master/examples/googleuse.py

anshumitts commented 5 years ago

@vineetm None of the links at http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark are working. Please help.

anshumitts commented 5 years ago

@vineetm TREC data is also not available on the given site.

vineetm commented 5 years ago

@anshumitts Given the short time left before the deadline, I would advise you to work with the data that is available. You can skip any data you are not able to access.

anshumitts commented 5 years ago

@vineetm It's not clear whether we should use an encoder RNN or bag of words to create sentence embeddings. Which one do you suggest?

vineetm commented 5 years ago

@anshumitts You should use an encoder RNN to create sentence embeddings. If you have time, compare the results with bag of words (optional).
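For the optional comparison, a bag-of-words baseline can be as small as averaging word vectors per sentence. The following is only a minimal sketch, not the course's reference implementation: `bow_embed`, `bow_embed_batch`, `word_vectors` and `dim` are hypothetical names, and `word_vectors` is assumed to be a plain dict mapping tokens to NumPy vectors of length `dim`.

```python
import numpy as np


def bow_embed(tokens, word_vectors, dim):
    """Average the vectors of the tokens that appear in word_vectors."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        # No known token in this sentence: fall back to a zero vector.
        return np.zeros(dim, dtype=np.float32)
    return np.mean(known, axis=0).astype(np.float32)


def bow_embed_batch(batch, word_vectors, dim):
    """Embed a batch of tokenized sentences into a (len(batch), dim) array."""
    return np.vstack([bow_embed(tokens, word_vectors, dim) for tokens in batch])
```

Each sentence is assumed to be already tokenized into a list of strings, so the output has one row per sentence and can be scored the same way as the encoder RNN embeddings.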

anshumitts commented 5 years ago

@vineetm This is causing an issue: TensorFlow in eager mode conflicts with PyTorch, causing a bus error. Please suggest a solution.

vineetm commented 5 years ago

@anshumitts I can suggest an alternative. You can pre-compute the sentence vectors for the evaluation tasks and store them in a NumPy array or equivalent. When working with PyTorch, you can then use this NumPy array. This might work because you don't really care about the TensorFlow model; you only care about the sentence vectors.
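A rough sketch of that workaround, under stated assumptions: `precompute_embeddings`, `load_embeddings` and `encode_fn` are hypothetical names, and `encode_fn` stands in for whatever TensorFlow encoder maps one sentence to a fixed-length vector. The first function would run in a process that imports only TensorFlow, the second in the process that imports only PyTorch.

```python
import numpy as np


def precompute_embeddings(sentences, encode_fn, path):
    """Encode every sentence once and save the vectors to a .npy file.

    encode_fn is a placeholder for the TensorFlow encoder; it must return
    a fixed-length sequence of numbers for each sentence.
    """
    vectors = np.asarray([encode_fn(s) for s in sentences], dtype=np.float32)
    np.save(path, vectors)  # path is expected to end in ".npy"
    return vectors.shape


def load_embeddings(path):
    """Load the cached vectors; no TensorFlow import is needed here."""
    return np.load(path)
```

On the PyTorch side, something like `torch.from_numpy(load_embeddings(path))` should turn the cached array into a tensor, so the two frameworks never have to be loaded in the same process.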