stanfordnlp / cocoa

Framework for learning dialogue agents in a two-player game setting.
MIT License
158 stars 62 forks

Third party eval #21

Closed mihail911 closed 7 years ago

mihail911 commented 7 years ago

Basic server for the third-party eval task for dialogues. Note that the exact questions posted to workers on the frontend are still subject to change.

@hhexiy @percyliang @anushabala

anushabala commented 7 years ago

That's a great point! We probably need to generate the worker IDs ourselves and then give each worker a code on task completion that we can use to link our worker IDs to the MTurk worker IDs. I'm not sure if there's a way to get the MTurk worker ID directly (apart from asking workers to enter it themselves, which could be reasonable).

On Mon, Dec 26, 2016 at 2:22 AM hhexiy wrote:

@hhexiy commented on this pull request.

In src/web/third_party_eval_app.py https://github.com/stanfordnlp/game-dialogue/pull/21#pullrequestreview-14368225 :

+
+parser = argparse.ArgumentParser()
+parser.add_argument("--scenarios", type=str, help="path to scenarios file")
+parser.add_argument("--examples", type=str, help="path to examples file")
+parser.add_argument("--port", type=int, help="port to launch app on")
+args = parser.parse_args()
+
+def init_database(db_file):
+    """
+    Initalize database
+    :param db_file: Path to db
+    :return:
+    """
+    conn = sqlite3.connect(db_file)
+    c = conn.cursor()
+    c.execute("""CREATE TABLE Responses (dialogue_num integer, humanlike_0 text, correct_0 text, strategic_0 text, fluent_0 text,

Can we also record the worker ID? We'll probably have 3-5 workers evaluate each dialogue.


-- Anusha Balakrishnan M.S. Computer Science (Artificial Intelligence) Stanford University '17 anusha@cs.stanford.edu
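A rough sketch of the idea discussed above: generate our own worker ID, hand the worker a completion code, and store both with each response so the code can later be matched to an MTurk worker. This is not the code from the PR. The `worker_id` and `completion_code` columns, the composite key, and the helper names `new_worker`, `record_response`, and `lookup_worker` are all assumptions for illustration, and only the rating columns visible in the diff are included.

```python
import secrets
import sqlite3


def init_database(conn):
    # Hypothetical schema: one row per (worker, dialogue), so that
    # 3-5 workers can each rate the same dialogue.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS Responses (
               worker_id TEXT, completion_code TEXT, dialogue_num INTEGER,
               humanlike_0 TEXT, correct_0 TEXT, strategic_0 TEXT, fluent_0 TEXT,
               PRIMARY KEY (worker_id, dialogue_num))"""
    )


def new_worker():
    """Return a (worker_id, completion_code) pair generated on our side."""
    return secrets.token_hex(8), secrets.token_hex(4).upper()


def record_response(conn, worker_id, code, dialogue_num, answers):
    # Parameterized insert; `answers` maps column name to the worker's rating.
    conn.execute(
        "INSERT INTO Responses VALUES (?, ?, ?, ?, ?, ?, ?)",
        (worker_id, code, dialogue_num, answers["humanlike_0"],
         answers["correct_0"], answers["strategic_0"], answers["fluent_0"]),
    )
    conn.commit()


def lookup_worker(conn, code):
    """Map a completion code pasted into MTurk back to our worker ID."""
    row = conn.execute(
        "SELECT DISTINCT worker_id FROM Responses WHERE completion_code = ?",
        (code,),
    ).fetchone()
    return row[0] if row else None
```

The worker would paste the completion code into the MTurk form, and `lookup_worker` would then tie the MTurk submission (and its MTurk worker ID) to the rows stored here.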

percyliang commented 7 years ago

For the 3rd party evaluation, do we even need to host our own server, actually? Could we just upload a CSV file with all the dialogues, and just collect the Turker responses? That way, we can let MTurk manage everything.
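To make the CSV idea concrete, here is a minimal sketch of exporting dialogues to a batch file, assuming the usual MTurk convention that each CSV row becomes one HIT and each column header becomes a template variable. The column names `dialogue_num` and `messages_json`, and the message format (a list of dicts with `agent` and `text`), are hypothetical and not taken from the repository.

```python
import csv
import io
import json


def write_hit_csv(dialogues, out):
    """Write one CSV row per dialogue to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=["dialogue_num", "messages_json"])
    writer.writeheader()
    for num, messages in enumerate(dialogues):
        # JSON-encode the turns so a whole dialogue fits in a single cell.
        writer.writerow({"dialogue_num": num,
                         "messages_json": json.dumps(messages)})


# Example: two short dialogues written to an in-memory buffer.
dialogues = [
    [{"agent": 0, "text": "hi"}, {"agent": 1, "text": "hello"}],
    [{"agent": 0, "text": "do you know anyone at Google?"}],
]
buf = io.StringIO()
write_hit_csv(dialogues, buf)
```

Packing the turns into one JSON cell only helps if the template can decode it, which depends on whether JavaScript is allowed inside the template.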


anushabala commented 7 years ago

Hmm, that's a good point. I'm not sure that we need to, though I'm not entirely clear on how custom HTML templates work with Turk. I also don't know how you could do something like a for loop over all the attributes or messages, for example. If it can be done, that'd be really convenient!
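One possible way around the missing for loop, sketched under the assumption that the template only supports simple variable substitution: run the loop in Python ahead of time and upload the resulting HTML fragment as a single value per dialogue. The function name `render_dialogue` and the message format (dicts with `agent` and `text`) are hypothetical, and whether the substituted HTML is rendered as markup on the MTurk side would still need to be checked.

```python
from html import escape


def render_dialogue(messages):
    """Expand the per-message loop into a static HTML list."""
    items = [
        # Escape the text so a worker-typed "<" cannot break the page.
        "<li><b>Agent {}:</b> {}</li>".format(int(m["agent"]), escape(m["text"]))
        for m in messages
    ]
    return "<ul>" + "".join(items) + "</ul>"


fragment = render_dialogue([
    {"agent": 0, "text": "a < b"},
    {"agent": 1, "text": "ok"},
])
```

With this approach the template itself needs no loop or JavaScript, at the cost of fixing the presentation at export time.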

percyliang commented 7 years ago

HTML templates are pretty easy, but we should check if we're able to embed arbitrary JavaScript inside (in a ...
