Open NazaninSal opened 5 years ago
Put your dataset in a text file with a question on the 1st line and its answer on the 2nd line, the next question on the 3rd line and its answer on the 4th, and so on. Then, in lightweightdata.py, at line 59, there is the line "with open(filename, 'r') as f"; put your dataset location there instead of filename. Before running main.py, go to chatbot_tutorial-master>save>model in the chatbot repository and delete everything in the model folder. After that, to run main.py from the command prompt, type "python main.py --corpus lightweight --datasetTag xxx" (instead of xxx, put your dataset filename only, without the .txt file extension).
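The alternating question/answer layout described above can be produced with a few lines of Python. This is only a sketch: the names format_pairs and save_corpus, the sample pairs, and any output path are made up for illustration and are not part of the repository.

```python
from pathlib import Path


def format_pairs(pairs):
    """Return text with each question on one line and its answer on the next."""
    lines = []
    for question, answer in pairs:
        # Collapse internal whitespace so every utterance stays on a single line.
        lines.append(" ".join(question.split()))
        lines.append(" ".join(answer.split()))
    return "\n".join(lines) + "\n"


def save_corpus(pairs, path):
    """Write the formatted pairs to a UTF-8 text file (path is hypothetical)."""
    Path(path).write_text(format_pairs(pairs), encoding="utf-8")


# Made-up sample data, printed rather than written to disk.
sample = [("hi", "hello"), ("how are you?", "fine,\nthanks")]
print(format_pairs(sample), end="")
```

Calling save_corpus(sample, "xxx.txt") would write the file whose name (without .txt) is then passed to --datasetTag.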
Thank you very much for your reply. I will try and inform you the result.
Would you please help me with KeyError: 'CHATBOT_SECRET_KEY'? I don't know how to find it on Windows 10!
Are you trying to run it on the web interface?
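A KeyError naming 'CHATBOT_SECRET_KEY' usually means the code looks the key up in os.environ and the variable is not set. That the web interface reads it this way is an assumption, not something confirmed in this thread. A minimal sketch of the lookup, with a clearer error message (read_secret_key is a hypothetical helper):

```python
import os


def read_secret_key(env=None, name="CHATBOT_SECRET_KEY"):
    """Return the secret key from the environment, or fail with a hint."""
    env = os.environ if env is None else env
    try:
        return env[name]
    except KeyError:
        # Assumption: the project expects this variable to exist in the environment.
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            f"set it before starting the web interface."
        ) from None


# Demonstration with a fake environment mapping instead of the real one.
print(read_secret_key({"CHATBOT_SECRET_KEY": "example-value"}))
```

On Windows 10, a variable can be set for the current command prompt window with "set CHATBOT_SECRET_KEY=some-value" before starting the server from that same window; the value itself is whatever you choose.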
Thank you very much for your reply. I will try and inform you the result.
Did you get a result?
Hi, thanks again for your reply. Actually, I finished training and now I want to interact with the chatbot. I thought the only way was to use the web interface! Would you please let me know if I can interact with the chatbot through the command prompt or something other than the web interface?
Run it from the command prompt: type "python main.py --test interactive". With this, the chatbot will go into interactive mode.
Thank you, it runs. I receive the same answers for different questions. I guess that may be because of a small dataset or short training (a small number of epochs). Would you please let me know your opinion? Thanks
How many questions and answers are there in your dataset?
The total length of my dataset is 21,648 tweets (including both questions and answers).
That is more than enough, because my dataset is just 800 lines (400 questions and 400 answers) and it is working fine. Can you show me a sample of your dataset?
Sure. My dataset is hate speech tweets with their replies. I attached two files: the first, chat_origin.txt, contains the initial tweets and replies I extracted from Twitter, and the other, chat.txt, is my cleaned data, which I fed to your algorithm. Thank you
(Sent by email in reply to shubhamsonawane21's comment of Friday, June 28, 2019, on llSourcell/chatbot_tutorial issue #6, "Train my own dataset".)
1141882426077122560@Juansosa710 You must be a magician...because Abracadamn yah look good 😍 114189230351205
I attached the dataset and sent it by email. I don't know whether you can see it correctly here or not. If you can't find it here, may I have your email address to send the dataset?
See, this is a sample of my dataset. In your dataset there are some numbers and @usernames before every line; I think you should remove those before training.
Actually, I cleaned the data before feeding it to the algorithm; I removed all numbers and usernames from the tweets. I think the sentences in my dataset are longer, and that makes training more complicated. I guess I may be able to resolve it with more epochs. Would you please let me know your opinion? I will try with more epochs and let you know the result. Thanks
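For reference, the kind of cleaning discussed here (dropping tweet IDs and @usernames) can be sketched with two regular expressions. This is an illustration only, not the script actually used for chat.txt; the patterns and the 6-digit threshold are assumptions.

```python
import re

# Assumption: tweet IDs show up as long digit runs; short numbers are kept.
LONG_NUMBER = re.compile(r"\b\d{6,}\b")
MENTION = re.compile(r"@\w+")


def clean_tweet(text):
    """Remove long numeric IDs and @mentions, then normalise whitespace."""
    text = LONG_NUMBER.sub(" ", text)
    text = MENTION.sub(" ", text)
    return " ".join(text.split())


raw = ("1141882426077122560@Juansosa710 You must be a magician..."
       "because Abracadamn yah look good 😍 114189230351205")
print(clean_tweet(raw))
```

Emoji and punctuation are left untouched; whether they should also be removed depends on the vocabulary you want the model to learn.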
I think the sentence length limit is 10 here. Actually, I got good output with just 30 epochs. I am also new to this, so all I can say is: try it :P
Thank you very much for your replies and feedback. It was really helpful. I will try with more epochs and let you know the result :)
okay... welcome 👍 :)
Hi, I increased the number of epochs to 140, and the loss then improved to 0.12 and the perplexity to 1.13. Now I get better results. Thanks again for the help.
Nice ♥ :)
How did you calculate perplexity?
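This is not answered in the thread, but the reported numbers are consistent with the usual definition: perplexity is the exponential of the average cross-entropy loss. The sketch below assumes the loss is a mean per-token cross-entropy in nats; whether the training script computes it exactly this way is an assumption.

```python
import math


def perplexity(loss):
    """Perplexity for a mean cross-entropy loss given in nats."""
    return math.exp(loss)


# With the loss reported above (0.12) this gives about 1.13.
print(round(perplexity(0.12), 2))
```

A loss of 0 corresponds to a perplexity of 1, the lowest possible value.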
How can I increase the number of words allowed in a question/reply?
It did well with 30 epochs?
For me it did well with 150 epochs.
To increase the number of words allowed in a question/reply: in the chatbot.py file, the default value of maxLength is 10 (as per the screenshot); change it to the value you want.
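A default like this is typically declared as a command-line argument. The sketch below only imitates that pattern with argparse; the option name is taken from the comment above, but whether main.py accepts it as a --maxLength flag, and the help text, are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser with a maxLength option defaulting to 10."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--maxLength",
        type=int,
        default=10,  # default mentioned in this thread
        help="maximum number of words in a question/reply",
    )
    return parser


# Either edit the default above or override it when parsing.
args = build_parser().parse_args(["--maxLength", "20"])
print(args.maxLength)
```

Longer sentences mean longer sequences for the model, so training will likely need more time, as discussed earlier in this thread.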
Hi, what should I do if I want to train the model on only my own dataset? I found that in "data>lightweight" you mention the format the model requires. Should I just follow this format and put my dataset in the data directory, or are there other things I have to do with the training files? I think I also have to change "availableCorpus" in chatbot>textdata.py (line 52).
Would you please help me if I need to change any other parts?
Thanks, Naz