macournoyer / neuralconvo

Neural conversational model in Torch

Train in another language #50

Closed rmazitov closed 8 years ago

rmazitov commented 8 years ago

I want to train the model on text in another language. Which files are needed, and in what format?

nabihach commented 8 years ago

You can modify the script cornell_movie_dialogs.lua to accept and pre-process any data. You just need to make sure that the function CornellMovieDialogs:load() returns a table called conversations in the right format, so that dataset.lua can process it.

For example, print(conversations[5676]) should yield:

      1 : 
        {
          text : "Let's just say you're being closely watched, George."
          character : "EVAN"
        }
      2 : 
        {
          text : "...yes."
          character : "MR. MILLER"
        }
      3 : 
        {
          text : "Listen close then. You screw up again and I swear I'll kill you."
          character : "EVAN"
        }

Please note that, as dataset.lua is currently written, the character information is not used, so the following format for conversations[i] is fine too:

      1 : 
        {
          text : "Let's just say you're being closely watched, George."
        }
      2 : 
        {
          text : "...yes."
        }
      3 : 
        {
          text : "Listen close then. You screw up again and I swear I'll kill you."
        }
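
As a rough sketch of how such a table could be built from your own data: the code below assumes a plain-text corpus with one utterance per line and a blank line between conversations. That file layout and the names parseConversations and loadConversations are assumptions made for illustration; they are not part of neuralconvo.

```lua
-- Hypothetical helper (not part of neuralconvo): turns a list of text lines
-- into the `conversations` table described above.
-- Assumed input: one utterance per line, blank line between conversations.
local function parseConversations(lines)
  local conversations = {}
  local current = {}
  for _, line in ipairs(lines) do
    -- Trim leading and trailing whitespace (also drops a trailing "\r").
    local text = line:match("^%s*(.-)%s*$")
    if text == "" then
      -- A blank line closes the current conversation, if it has any turns.
      if #current > 0 then
        table.insert(conversations, current)
        current = {}
      end
    else
      -- Only the `text` field is filled in; `character` is left out.
      table.insert(current, { text = text })
    end
  end
  -- Keep the last conversation when the input does not end with a blank line.
  if #current > 0 then
    table.insert(conversations, current)
  end
  return conversations
end

-- Hypothetical wrapper: reads a file from `path` and parses its lines.
local function loadConversations(path)
  local lines = {}
  for line in io.lines(path) do
    table.insert(lines, line)
  end
  return parseConversations(lines)
end

-- Small in-memory example; no file is needed to try the parser.
local sample = parseConversations({
  "Hello there.",
  "Hi!",
  "",
  "How are you?",
  "Fine, thanks.",
})
print(#sample, sample[1][1].text)
```

CornellMovieDialogs:load() could then return the result of loadConversations for your own file. Whether the rest of the pipeline copes with your language's script is a separate question that this sketch does not address.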
rmazitov commented 8 years ago

What form should the table take?

nabihach commented 8 years ago

I edited my answer. Please see above.

rmazitov commented 8 years ago

Thanks!