Error in data pre-processing

deeptibhegde commented 4 months ago

When I run step 3 to generate sentence_corrections.csv, I get the following error:

    [INFO] CORRECT TOKENS True
    [INFO] INITIAL CORRECTIONS AND LOWER CASING ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Text to int, five --> 5 
    [INFO] Replacing ...
    [INFO] Text to int, one --> 1 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Token index 15 corrected perso ----> person 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Text to int, one --> 1 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Text to int, five --> 5 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Token index 47 corrected  ----> i 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] EMPTY TOKEN [[ ]]
    [INFO] token   not found
    [INFO] Token index 61 corrected foward ----> forward 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Text to int, four --> 4 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Token index 69 corrected sim ----> him 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Token index 71 corrected  ----> i 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Text to int, one --> 1 
    [INFO] Text to int, three --> 3 
    [INFO] Replacing ...
    [INFO] Text to int, five --> 5 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Text to int, two --> 2 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Text to int, five --> 5 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Token index 116 corrected counter- ----> counter 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Token index 117 corrected  ----> i 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Text to int, three --> 3 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Text to int, two --> 2 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Token index 134 corrected staight ----> straight 
    [INFO] Token index 134 corrected soemthing ----> something 
    [INFO] Token index 136 corrected ’s ----> is 
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Replacing ...
    [INFO] Token index 151 corrected ... ----> None 
    Traceback (most recent call last):
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/datasets/kit_m2t_dataset.py", line 203, in <module>
        data = dataset_class(path, filter_data=True,min_freq=3)
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/datasets/kit_m2t_dataset.py", line 46, in __init__
        self.lang = vocabulary(self.sentences, correct_tokens=correct_tokens, ask_user=False)
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/datasets/vocabulary.py", line 94, in __init__
        self.token_correction(ask_user)
      File "/home/dhegde/projects/motion-pipe/M2T-Segmentation/datasets/vocabulary.py", line 158, in token_correction
        desc = " ".join(tokens)
    TypeError: sequence item 4: expected str instance, NoneType found

deeptibhegde commented 4 months ago

Resolved by replacing line 158 with:

    try:
        desc = " ".join(tokens)
    except TypeError:
        # join() fails when a token is None; fall back to a blank description
        desc = " "

deeptibhegde commented 4 months ago

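The fix above is wrong: it only hides the error and throws away the description. The real cause is the pyspellchecker version. Newer releases (0.7.0 and later, as far as I can tell) return None from correction() when no candidate is found, which matches the "corrected ... ----> None" line in the log just before the traceback. Make sure the installed version is pyspellchecker==0.6.2.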
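
If pinning the version is not an option, another approach is to keep the original token whenever the spell checker returns None. The snippet below is only a minimal sketch of that idea, not the code in vocabulary.py: `correct_tokens` and `fake_correct` are hypothetical names, and `fake_correct` is a stand-in for a spell checker so the example runs without pyspellchecker installed.

```python
from typing import Callable, List, Optional


def correct_tokens(
    tokens: List[str],
    correct: Callable[[str], Optional[str]],
) -> List[str]:
    """Apply `correct` to each token, keeping the original token
    when the corrector returns None (no candidate found)."""
    fixed = []
    for token in tokens:
        suggestion = correct(token)
        fixed.append(token if suggestion is None else suggestion)
    return fixed


def fake_correct(token: str) -> Optional[str]:
    """Hypothetical stand-in for a spell checker: returns None for
    tokens it cannot correct, mimicking the behaviour seen in the log."""
    if token == "...":
        return None
    table = {"foward": "forward", "staight": "straight"}
    return table.get(token, token)


tokens = ["walks", "foward", "...", "staight"]
desc = " ".join(correct_tokens(tokens, fake_correct))
print(desc)
```

With the real library, `SpellChecker().correction` could presumably be passed in place of `fake_correct`. Unlike the try/except workaround, this keeps the rest of the sentence intact instead of replacing it with a blank string.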

rd20karim / M2T-Segmentation

Error in data pre-processing #1