yangliuy / TopicExpertiseModel

Java implementation of Gibbs sampling for Topic Expertise Model published in CIKM'13

scriptData missing in Dropbox upload #4

Open jonaschn opened 3 years ago

jonaschn commented 3 years ago

I could not find the data that is expected in scriptDataPath. To my knowledge, the Dropbox upload contains only the training data (originalDataPath) and the test data (testDataPath). https://github.com/yangliuy/TopicExpertiseModel/blob/0f918d78a420b9cf03834576d6595060e64f8a56/src/tem/conf/PathConfig.java#L5

ifonema commented 3 years ago

scriptDataPath is used only in some classes of the tem.script package and in the SimpleEvaluate class of the tem.main package. In particular, scriptDataPath is used to read the userID file. The export step and the training set creation step are both clear to me, but I can't figure out how to create the test set. Any ideas?

jonaschn commented 3 years ago

@ifonema Did you notice my fork https://github.com/jonaschn/TopicExpertiseModel? I refactored the code a little and improved its documentation. There I also mention how the test set is created:

https://github.com/yangliuy/TopicExpertiseModel/blob/0f918d78a420b9cf03834576d6595060e64f8a56/src/tem/main/SimpleEvaluate.java#L43-L45

ifonema commented 3 years ago

@jonaschn Thank you for your fork, I am using it now.

https://github.com/yangliuy/TopicExpertiseModel/blob/0f918d78a420b9cf03834576d6595060e64f8a56/src/tem/main/SimpleEvaluate.java#L43

The readQATestDocs method called in line 43 reads questions from the testData.question file, as can be seen in the following code:

https://github.com/yangliuy/TopicExpertiseModel/blob/0f918d78a420b9cf03834576d6595060e64f8a56/src/tem/main/Documents.java#L77 https://github.com/yangliuy/TopicExpertiseModel/blob/0f918d78a420b9cf03834576d6595060e64f8a56/src/tem/main/Documents.java#L91

The testData.question file is created in the main method of the ExportTestDataForRank class https://github.com/yangliuy/TopicExpertiseModel/blob/0f918d78a420b9cf03834576d6595060e64f8a56/src/tem/script/ExportTestDataForRank.java#L35-L36 where the testDataQuestions.id file is used to export data from the database. However, I don't know where the testDataQuestions.id file itself is created. A naive solution is to create a tab-separated text file whose second column contains the ids of the questions belonging to the test set, but I'd like to know whether the repository contains code that generates the test set automatically.
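The naive solution could look roughly like the sketch below. It is not code from the repository. The class and method names, the random 20% split, the fixed seed, the example ids, and the running index written to the first column are all assumptions made for illustration; the only thing taken from the description above is that the second column holds the question ids.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of TopicExpertiseModel.
class TestQuestionIdWriter {

    // Draws a reproducible random sample of question ids to act as the test set.
    public static List<String> sampleTestIds(List<String> allQuestionIds,
                                             double testFraction, long seed) {
        List<String> shuffled = new ArrayList<>(allQuestionIds);
        Collections.shuffle(shuffled, new Random(seed));
        int testSize = (int) Math.round(shuffled.size() * testFraction);
        return new ArrayList<>(shuffled.subList(0, testSize));
    }

    // Builds one tab-separated line per id.
    // Column 1: running index (an assumption, the real content is unknown).
    // Column 2: question id.
    public static List<String> toTabSeparatedLines(List<String> testIds) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < testIds.size(); i++) {
            lines.add(i + "\t" + testIds.get(i));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Made-up question ids; in practice these would come from the database.
        List<String> allIds = Arrays.asList(
                "101", "102", "103", "104", "105",
                "106", "107", "108", "109", "110");

        List<String> testIds = sampleTestIds(allIds, 0.2, 42L);

        // Written to the working directory; move it to wherever
        // ExportTestDataForRank expects to find testDataQuestions.id.
        Path out = Paths.get("testDataQuestions.id");
        Files.write(out, toTabSeparatedLines(testIds), StandardCharsets.UTF_8);
        System.out.println("Wrote " + testIds.size() + " ids to " + out.toAbsolutePath());
    }
}
```

A random split is only one possible choice. If the original test set was selected by date or by user, the sampling method would have to be replaced accordingly.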

jonaschn commented 3 years ago

To be honest, I don't know. I didn't try to generate any test data myself.