Closed celsofranssa closed 1 year ago
It is the bag-of-words (BOW) feature file provided with the dataset.
I must generate this BOW feature for different training/testing splits since I am applying k-fold cross-validation. Therefore, please give me directions on how to generate it.
This file and the raw text file data/Amazon-670K/train_texts.txt correspond to each other, so you can apply the same partition to both files.
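A minimal sketch of that idea, assuming both files hold one sample per line in the same order (any header line in the feature file would need to be handled separately). The helper names `kfold_indices`, `split_by_fold`, and `read_lines` are hypothetical and not part of the repository:

```python
import random


def kfold_indices(n_samples, n_folds, seed=0):
    """Shuffle sample indices once and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::n_folds]) for i in range(n_folds)]


def split_by_fold(lines, test_indices):
    """Return (train_lines, test_lines) for one fold, preserving file order."""
    held_out = set(test_indices)
    train = [line for i, line in enumerate(lines) if i not in held_out]
    test = [line for i, line in enumerate(lines) if i in held_out]
    return train, test


def read_lines(path):
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()
```

For each fold, calling `split_by_fold` with the same `test_indices` on the lines of the feature file and on the lines of train_texts.txt keeps the two partitions aligned, because the same row numbers are held out in both.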
And how could I do the same to the other folds?
I was able to generate this feature file by combining TfidfVectorizer and dumping it in svmlight format. I hope that's correct.
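For reference, the approach described above might look roughly like the sketch below. It uses scikit-learn's `TfidfVectorizer` and `dump_svmlight_file`; the function names `build_fold_bow` and `dump_bow` are hypothetical, and the label column is filled with placeholder zeros, which is an assumption. Whether the resulting file matches the layout the repository expects for train_v1.txt is not confirmed here.

```python
import numpy as np
from sklearn.datasets import dump_svmlight_file
from sklearn.feature_extraction.text import TfidfVectorizer


def build_fold_bow(train_texts, test_texts):
    """Fit TF-IDF on the training fold only, then reuse it for the test fold."""
    vectorizer = TfidfVectorizer()
    x_train = vectorizer.fit_transform(train_texts)
    x_test = vectorizer.transform(test_texts)
    return x_train, x_test


def dump_bow(features, target):
    """Write a sparse feature matrix in svmlight format.

    `target` is a file path or a binary file-like object. The labels are
    placeholder zeros (an assumption); real labels would replace them.
    """
    placeholder_labels = np.zeros(features.shape[0], dtype=int)
    dump_svmlight_file(features, placeholder_labels, target)
```

Fitting the vectorizer on the training fold alone keeps test-fold vocabulary and document frequencies out of the features, which is the usual precaution in cross-validation.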
The Amazon-670k dataset config has an additional parameter,
sparse: data/Amazon-670k/train_v1.txt
which is not generated by the run_preprocess.sh script. What is train_v1.txt, and how can I generate it?