What is the format of the training data? Is there an example?

Pragmatism0220 commented 1 year ago

I used wahaha_tpu for training, but got an error in this block:

class TrainDataset :
    """
     Read train dataset from text file
    """
    def __init__(self, filename) :
        self.filename = filename
        with open(filename, 'r') as file :
            print("[TrainDataSet]--- start setup ---", filename)
            self.content = file.readlines()
            print("[TrainDataSet]--- loading is completed ---", filename)
    def __len__(self) :
        return len(self.content)
    def __getitem__(self, idx) :
        line = self.content[idx]
        inputs, label = line.split(',')  # ValueError: too many values to unpack (expected 2)
        inputs = inputs.split(' ')
        inputs = [int(n) for n in inputs]
        label = int(label)
        return inputs, label
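Reading `__getitem__`, each line appears to be expected in the form `<space-separated integer features>,<integer label>`. This layout is inferred from the parsing code above, not from any format documentation. The standalone sketch below mirrors that parsing (the name `parse_feature_line` is hypothetical, not part of wahaha) and shows why a line containing more than one comma raises the `ValueError`.

```python
def parse_feature_line(line: str) -> tuple[list[int], int]:
    """Hypothetical mirror of TrainDataset.__getitem__ for a single line."""
    inputs, label = line.split(',')  # exactly one comma is allowed per line
    features = [int(n) for n in inputs.split(' ')]  # single spaces between features
    return features, int(label)  # int() ignores the trailing newline


# A line in the inferred shape parses cleanly.
print(parse_feature_line("3 17 42 108,5\n"))  # ([3, 17, 42, 108], 5)

# Commas between the features reproduce the reported error.
try:
    parse_feature_line("3,17,42,108,5\n")
except ValueError as err:
    print(err)  # message starts with "too many values to unpack"
```

So a file whose lines contain several commas (for example, comma-separated feature vectors) cannot be read by this `TrainDataset`.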

The text data was generated with this code:

filename = "test_data/2020010100gm-00b9-0000-7a6863a8.mjson"
output_dir = "result"
mj2vec = Mj2Vec()
records = load_mjai_records(filename)
mj2vec.process_records(records)
mj2vec.save_to_text(output_dir)

The resulting file is obviously not in the format TrainDataset expects. So I want to ask: what is the correct format of the training data? Could you provide a complete data example? Thanks a lot!
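To see which lines of a generated file would fail before starting a training run, a small checker can apply the same parsing rule line by line. This is a sketch under an assumption: `find_bad_lines` is a hypothetical helper, and it assumes the layout inferred from `TrainDataset.__getitem__` (space-separated integers, exactly one comma, integer label).

```python
from typing import Iterable


def find_bad_lines(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Hypothetical checker: (line_number, reason) for lines TrainDataset would reject."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split(',')
        if len(parts) != 2:
            # TrainDataset unpacks exactly two fields: features and label.
            bad.append((lineno, f"expected 1 comma, found {len(parts) - 1}"))
            continue
        try:
            for n in parts[0].split(' '):
                int(n)  # every feature must be an integer (double spaces fail here)
            int(parts[1])  # the label must be an integer too
        except ValueError:
            bad.append((lineno, "non-integer feature or label"))
    return bad


# Usage: pass an open file, e.g. one written by save_to_text (its name is not known here).
# with open(path, 'r') as file:
#     for lineno, reason in find_bad_lines(file):
#         print(f"line {lineno}: {reason}")
```

Taking an iterable of lines rather than a path keeps the check easy to run on a list of strings as well as on an open file.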

yuufyu commented 1 year ago

I'm sorry for the delay in my response.

・You need to generate the training data using the following flow:
mjlog (Tenhou paifu log) → mjson (mjai format*1) → feature text file (input for TrainDataset)
*1: Generating mjson requires mjai (https://github.com/yuugata/mjai/tree/3p), which has been modified for three-player mahjong.

・I'll share the script that I used when I tried it. Please use it as a reference.
preparation flow: https://colab.research.google.com/drive/12USV4XeQggUnxc-UD2mDqavvM9c-RHXj?usp=sharing
script: https://drive.google.com/drive/folders/1LZz1JbbKp9pR9Fqts4T4k6QDNTNMZT1u?usp=sharing
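For the last step of that flow (the feature text file read by TrainDataset), a writer only has to emit one sample per line in the shape that `__getitem__` parses. The sketch below is hypothetical: `write_feature_file`, `read_feature_file`, and the `(features, label)` sample tuples are assumptions for illustration, not the encoding produced by the shared scripts or by Mj2Vec. It only demonstrates the line layout and a round-trip check.

```python
import os
import tempfile


def write_feature_file(samples, path):
    """Hypothetical writer: one '<f1> <f2> ... <fn>,<label>' line per sample."""
    with open(path, 'w') as file:
        for features, label in samples:
            if not features:
                # An empty feature list would make TrainDataset call int('').
                raise ValueError("each sample needs at least one feature")
            file.write(' '.join(str(int(f)) for f in features) + f",{int(label)}\n")


def read_feature_file(path):
    """Parse the file the same way TrainDataset.__getitem__ parses each line."""
    samples = []
    with open(path, 'r') as file:
        for line in file:
            inputs, label = line.split(',')
            samples.append(([int(n) for n in inputs.split(' ')], int(label)))
    return samples


samples = [([0, 5, 12, 33], 2), ([1, 7, 20], 0)]  # made-up values, format demo only
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.txt")
    write_feature_file(samples, path)
    restored = read_feature_file(path)
print(restored == samples)  # True
```

The round trip only confirms that writer and parser agree on the layout; what the integers mean depends on the feature encoding used in the actual pipeline.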

Pragmatism0220 commented 1 year ago

Thank you for your reply. I understand now!

yuufyu / wahaha, issue #2