tianshilu / pMTnet

Deep Learning the T Cell Receptor Binding Specificity of Neoantigen
GNU General Public License v2.0

Strange characters in the testing_data.csv and training_data.csv #7

Closed ddd9898 closed 2 years ago

ddd9898 commented 2 years ago

Hi, I found some strange characters in the datasets you provided for training and testing. For example, in rows 30 and 31 of testing_data.csv, the antigen sequence appears to contain a strange Chinese character.


When I loaded this file with pandas, I found that this character seems to be '\xa0'. So, is this a mistake made when generating the files, or does '\xa0' have some special meaning? Thank you.
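For reference, '\xa0' is how Python displays U+00A0, the non-breaking space. The check described above can be sketched with the standard library alone: scan every cell for that character, then strip it. This is only a minimal illustration. The header names (`CDR3`, `Antigen`, `HLA`), the helper names, and the position of the stray character in the sample are assumptions and are not copied from testing_data.csv.

```python
import csv
import io

NBSP = "\xa0"  # U+00A0 NO-BREAK SPACE


def find_nbsp_cells(rows):
    """Return (row_number, column_name) for every cell containing U+00A0.

    Row numbers are 1-based and count data rows only (header excluded).
    """
    return [
        (row_number, column)
        for row_number, row in enumerate(rows, start=1)
        for column, value in row.items()
        if isinstance(value, str) and NBSP in value
    ]


def strip_nbsp(rows):
    """Return copies of the rows with U+00A0 removed from every string cell."""
    return [
        {
            column: value.replace(NBSP, "") if isinstance(value, str) else value
            for column, value in row.items()
        }
        for row in rows
    ]


# Illustrative data only: the header and the placement of the stray
# character are assumptions made for this sketch.
sample = (
    "CDR3,Antigen,HLA\n"
    "CAWSETGLGMGGWQFG,ELAGIGILTV\xa0,A02:01\n"
    "CAWSETGLGTGELFFG,ELAGIGILTV,A02:01\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))

print(find_nbsp_cells(rows))     # cells that contain the stray character
cleaned = strip_nbsp(rows)
print(find_nbsp_cells(cleaned))  # empty once the character is removed
```

In pandas, the same idea should be expressible with `Series.str.contains("\xa0", regex=False)` to locate the cells and `Series.str.replace("\xa0", "", regex=False)` to clean them.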

wtwt5237 commented 2 years ago

Hi @Miles-DDD

I opened the test data CSV file online, and this is what I see:

CAWSETGLGMGGWQFG | ELAGIGILTV | A02:01
CAWSETGLGTGELFFG | ELAGIGILTV | A02:01

Is it possible that this strange character was introduced when the file was downloaded to your computer?

Tao
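One way to narrow down whether the download introduced the character is to inspect the raw bytes of the local copy rather than the parsed table. The sketch below is a rough diagnostic and not part of pMTnet. The function name `nbsp_report` is made up for illustration, and falling back to Latin-1 when UTF-8 decoding fails is an assumption about how the file might have been saved.

```python
def nbsp_report(raw: bytes):
    """Return (encoding_guess, line_numbers) for lines containing U+00A0.

    Tries UTF-8 first, where U+00A0 is stored as the two bytes C2 A0.
    If that fails, falls back to Latin-1, where it is the single byte A0.
    Line numbers are 1-based and include the header line.
    """
    try:
        text = raw.decode("utf-8")
        encoding = "utf-8"
    except UnicodeDecodeError:
        text = raw.decode("latin-1")  # Latin-1 decoding never raises
        encoding = "latin-1"
    lines = [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if "\xa0" in line
    ]
    return encoding, lines


# Illustrative byte strings only, not the real file contents.
utf8_copy = "a,b\nx\u00a0,y\n".encode("utf-8")
latin1_copy = b"a,b\nx\xa0,y\n"
clean_copy = b"a,b\nx,y\n"

for label, raw in [("utf-8", utf8_copy), ("latin-1", latin1_copy), ("clean", clean_copy)]:
    print(label, nbsp_report(raw))
```

To apply it to a real file, read the bytes with `open(path, "rb").read()`, where `path` points at the local testing_data.csv, and run the same check on a freshly downloaded copy. If the two reports differ, the download or a later re-save is the likely cause.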