tsumers / bert-brains

Is it possible to provide './data/stimuli/{}/align.json' for narratives-transcript-processing.py? #3

Open Zhangzhixiang-laugh opened 1 month ago

Zhangzhixiang-laugh commented 1 month ago

I couldn't find './data/stimuli/{}/align.json' when running narratives-transcript-processing.py with 'PHONEMES = True' to prepare the input features (for example, for "black"). Reading through the code, align.json and align.csv both hold stimulus information, but with different structures and properties. Without align.json, the following code cannot run:

if PHONEMES:
    original_json = json.load(open("../data/stimuli/{}/align.json".format(STORY)))
    original_transcript = pd.DataFrame.from_records(original_json['words'])
    original_transcript.rename(axis='columns',
                               mapper={'start': 'start_ts', 'end': 'end_ts', 'word': 'cased', 'alignedWord': 'uncased'},
                               inplace=True)

In particular, this line:

if PHONEMES:
    original_transcript['phones'] = original_transcript['phones'].apply(lambda d: d if isinstance(d, list) else [])

Would it be possible to provide './data/stimuli/{}/align.json' for narratives-transcript-processing.py? Thank you so much!
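
For anyone trying to rebuild the file, the quoted snippets imply a rough shape for align.json: a top-level 'words' list whose records are read for 'start', 'end', 'word', 'alignedWord', and 'phones'. Below is a minimal sketch that counts how many records carry each of those keys. It assumes only what the quoted code reads; the helper name field_coverage and the sample records are hypothetical and not taken from the repository.

```python
import json

# Keys the quoted snippets read from each record in align.json["words"].
READ_KEYS = ("start", "end", "word", "alignedWord", "phones")


def field_coverage(align):
    """Count how many word records carry each key the script reads."""
    words = align["words"]  # a missing "words" key raises KeyError, as in the script
    return {key: sum(1 for rec in words if key in rec) for key in READ_KEYS}


# Hypothetical sample, made up for illustration only.
sample = json.loads("""
{"words": [
  {"word": "Black", "alignedWord": "black", "start": 0.5, "end": 0.9,
   "phones": [{"phone": "b"}]},
  {"word": "so", "alignedWord": "so"}
]}
""")

print(field_coverage(sample))
```

If no record has 'phones' at all, the DataFrame built from the records has no 'phones' column, so the original_transcript['phones'] lookup fails. That is the situation when only align.csv is available.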

Zhangzhixiang-laugh commented 1 month ago

The original_transcript['phones'] column can be created from the 'align.csv' file instead. Reference code:

import numpy as np
import pandas as pd
import pronouncing

original_transcript = pd.read_csv("../data/stimuli/{}/align.csv".format(STORY), header=None,
                                  names=["cased", "uncased", "start_ts", "end_ts"])
PHONEME_LIST_FROZEN = ["ao", "iy", "m", "dh", "ow", "k", "w", "ey", "s", "ch", "sh", "aw", "ay", "l", "jh", "v",
                       "g","r", "oy", "er", "ae", "d", "hh", "th", "ih", "uw", "aa", "z", "zh", "oov", "ng", "p",
                       "f","ah","n", "b", "uh", "y", "t", "eh"]

def get_phonemes(word):
    if not isinstance(word, str) or word.strip() == '':
        return np.nan
    phones = pronouncing.phones_for_word(word)
    if not phones:
        return [{'phone': 'unknown'}]
    # Use the first pronunciation as the standard.
    # Lowercase each phoneme and strip the trailing stress digit (e.g. "AE1" -> "ae").
    normalized = [phoneme.lower().rstrip('0123456789') for phoneme in phones[0].split()]
    # Keep only phonemes that appear in the frozen list
    filtered_phonemes = [phoneme for phoneme in normalized if phoneme in PHONEME_LIST_FROZEN]
    # Join phonemes with "_" and wrap them in a single-entry list of dicts
    return [{'phone': '_'.join(filtered_phonemes)}] if filtered_phonemes else [{'phone': 'unknown'}]

if PHONEMES:
    # Use the get_phonemes function to convert each word
    original_transcript['phones'] = original_transcript['uncased'].apply(get_phonemes)
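
To see what the filtering step produces without installing pronouncing, here is a minimal standalone sketch of the same normalization. The pronunciation string "B L AE1 K" for "black" is hard-coded as an assumption about what the dictionary lookup returns, and the helper name normalize_pronunciation is hypothetical.

```python
# Same phoneme inventory as PHONEME_LIST_FROZEN above, as a frozenset for fast lookup.
ALLOWED = frozenset([
    "ao", "iy", "m", "dh", "ow", "k", "w", "ey", "s", "ch", "sh", "aw", "ay",
    "l", "jh", "v", "g", "r", "oy", "er", "ae", "d", "hh", "th", "ih", "uw",
    "aa", "z", "zh", "oov", "ng", "p", "f", "ah", "n", "b", "uh", "y", "t", "eh",
])


def normalize_pronunciation(pron, allowed=ALLOWED):
    """Lowercase, strip stress digits, filter, and join with '_' as get_phonemes does."""
    stripped = (p.lower().rstrip("0123456789") for p in pron.split())
    kept = [p for p in stripped if p in allowed]
    return [{"phone": "_".join(kept)}] if kept else [{"phone": "unknown"}]


# Assumed dictionary entry for "black"; not looked up at runtime.
print(normalize_pronunciation("B L AE1 K"))  # [{'phone': 'b_l_ae_k'}]
```

Note that this yields one dict per word with all phonemes joined by "_". Whether that matches the layout of 'phones' in the original align.json cannot be confirmed from this thread, so downstream code that iterates over individual phones may need adjusting.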