norahollenstein / zuco-benchmark

ZuCo Reading Task Classification Benchmark using EEG and Eye-Tracking Data

Problem of decoding sentence text from "subject_lnorm.json" #6

Open bukun46 opened 1 month ago

bukun46 commented 1 month ago

Hi there! Thanks for the incredible work on the EEG datasets!

I ran into a problem while trying to figure out what this line does: https://github.com/norahollenstein/zuco-benchmark/blob/5ad276d2d075a30e8a47488cff082df435870ce3/src/extract_features.py#L122

Judging by this line:

https://github.com/norahollenstein/zuco-benchmark/blob/5ad276d2d075a30e8a47488cff082df435870ce3/src/extract_features.py#L120

it seems to split the text string of the corresponding sentence. However, the ContentData loaded from subject_lnorm.json contains only lists of integers. I tried to decode them with char(), but I got only non-printable characters such as "r\x1f(\x11\x15\x1a\x1c\x19\x0f\x1b \t\x19\x05!\x15\n\r\x15\x15\x0e\x14\r\x0b\x11\x0c\"...

Can you kindly help me with this problem? I am looking forward to your reply!!
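A quick way to check whether a list of integers plausibly holds character codes is to measure how many of them map to printable characters. The sketch below is only an illustration: `looks_like_text` and its threshold are hypothetical and are not part of the zuco-benchmark code. The `numeric_values` list is recovered from the first ten escape sequences in the output quoted above, and the example sentence is made up.

```python
def looks_like_text(codes, min_printable=0.9):
    """Return True if most integers in `codes` map to printable characters.

    Hypothetical helper for illustration; the 0.9 threshold is an assumption.
    """
    if not codes:
        return False
    # Reject anything outside the valid Unicode code point range.
    if any(not (0 <= c < 0x110000) for c in codes):
        return False
    printable = sum(chr(c).isprintable() for c in codes)
    return printable / len(codes) >= min_printable


# Made-up sentence, encoded as character codes.
text_codes = [ord(ch) for ch in "He reads the sentence."]

# First ten values behind the quoted output r\x1f(\x11\x15\x1a\x1c\x19\x0f\x1b
numeric_values = [114, 31, 40, 17, 21, 26, 28, 25, 15, 27]

print(looks_like_text(text_codes))      # True
print(looks_like_text(numeric_values))  # False
```

Values that are mostly below 32 decode to control characters, which suggests the list stores numbers rather than encoded text.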

bukun46 commented 1 month ago

Also, I am trying to get the raw EEG signal for the whole sentence while the subject is reading. I did

    rawData = f['rawData']
    raw_eeg = f[rawData[idx][0]][:]

and the shape of raw_eeg for a single record is [length, 105]. I believe 105 is the feature dimension of a single band. How can I get the features from all 8 bands (t1, t2, a1, a2, g1, g2, b1, b2)? Thanks in advance for your help!!

samuki commented 1 month ago

Hi @bukun46, thanks a lot for reaching out!

  1. The contentData is only available for the subjects in the training dataset; otherwise, the lnorm is used to normalize the features.
  2. For the sentence-level EEG means, you could use the code in extract_sentence_features. If you don't want the means, you could extract the data for individual words, as done in the reading-task-classification repository, e.g., here.

I hope this answers your questions.
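To make the idea of a sentence-level mean concrete, here is a minimal sketch that averages a [length, 105] array over its first (time) axis. It is not the repository's extract_sentence_features implementation: `sentence_mean` is a hypothetical helper, and the input is synthetic random data standing in for a real recording.

```python
import numpy as np

N_COLUMNS = 105  # size of the second dimension reported in this thread


def sentence_mean(raw_eeg):
    """Average a [length, columns] array over time, one value per column.

    Hypothetical helper for illustration only.
    """
    raw_eeg = np.asarray(raw_eeg, dtype=float)
    if raw_eeg.ndim != 2:
        raise ValueError("expected a 2-D array of shape [length, columns]")
    return raw_eeg.mean(axis=0)


# Synthetic stand-in for one sentence's raw signal (assumption, not ZuCo data).
rng = np.random.default_rng(0)
fake_raw = rng.normal(size=(500, N_COLUMNS))

features = sentence_mean(fake_raw)
print(features.shape)  # (105,)
```

The result has one value per column regardless of how long the sentence recording is, which is what makes it usable as a fixed-size feature vector.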

bukun46 commented 4 weeks ago

Thanks Samuel, this helped a lot. I just want to confirm one more thing. If I want to extract the raw EEG signal at the sentence level (e.g. without the fixations on words), is this the right way to do it?

    rawData = f['rawData']
    raw_eeg = f[rawData[idx][0]][:]