tomerm / MLClassification

Classification using ML approach for English / Hebrew / Arabic data sets

Questions/requests from the client #43

Open matanzuckerman opened 5 years ago

matanzuckerman commented 5 years ago

Hi @semion1956 @tomerm

I have four questions/requests from the client:

  1. The configuration file does not follow the same logic in all of its parts. For example, under "model" you have the following parameters:

Need to save datasets, correspond to cross-val. cycle with the best results

cvSave = yes

Path to the folder containing train and test datasets used in cross-val. loop with the best results

cvPath = Downloads/crossValidation

The first one is a boolean, and if the user sets it to "yes" then the second one gives the path. This logic is simple and well understood. Under "data", on the other hand, the parameter "testPath" has to be left empty when it is not in use, and there is no boolean parameter that says whether to use it or not. The client requests one consistent logic throughout the whole config file (with boolean parameters).

  2. Some of the parameter descriptions are hard to understand. For example, "dataToks" is described as "Tokenization of loaded data". It is not clear from the description what the purpose of this boolean parameter is, and it is even more confusing given that there is also an "actualToks" parameter. If you can, please elaborate a little more on the "tricky" parameters.

  3. When running cross validation we save the documents of the best iteration (for both test and train). If I run several models and all of them use cross validation, then only the datasets of the last model are saved. Is that right?

  4. Does the HTML not support cross validation? The json file needed for the HTML is not generated when cross validation is chosen. I can see in the "consolidation" script, line 14, that if there are no results then no json is created. Would it be possible for cross validation to generate the json as well, so that we could see its results in the HTML?
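To make request 1 more concrete, here is a minimal sketch of the "boolean flag plus path" logic applied in the same way to both sections. It assumes an INI-style layout read with Python's standard `configparser`, which may not match how the project actually loads its config. The flag name `testUse` and the helper `optional_path` are hypothetical and only serve the illustration; `cvSave`, `cvPath` and `testPath` are the existing parameters.

```python
import configparser

# Hypothetical config excerpt. "testUse" does not exist in the current
# config; it is the kind of boolean flag the client is asking for.
SAMPLE = """
[model]
cvSave = yes
cvPath = Downloads/crossValidation

[data]
testUse = no
testPath =
"""


def optional_path(section, flag, path_key):
    """Return the path only when its boolean flag is enabled, else None."""
    if not section.getboolean(flag, fallback=False):
        return None
    path = section.get(path_key, fallback="").strip()
    if not path:
        # Flag is on but the path is missing: report it instead of guessing.
        raise ValueError(f"{flag} is enabled but {path_key} is empty")
    return path


config = configparser.ConfigParser()
config.optionxform = str  # keep the camelCase parameter names as written
config.read_string(SAMPLE)

# Both sections are now read with exactly the same rule.
cv_path = optional_path(config["model"], "cvSave", "cvPath")
test_path = optional_path(config["data"], "testUse", "testPath")
```

With one rule like this, an empty path never has a special meaning of its own: the flag alone decides whether the path is used.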

    Thanks