oxpig / AbLang

AbLang: A language model for antibodies
BSD 3-Clause "New" or "Revised" License

Data pre-processing pipeline #9

Open amoskalev opened 12 months ago

amoskalev commented 12 months ago

Hi, thanks for your work!

Could you advise me on the data pre-processing pipeline that you used? Which OAS columns did you use to transcribe and translate the DNA sequences into antibody sequences? How did you implement the transcription? In general, the question is how you went from raw OAS data to the data used in the paper. Thanks!

Would it be possible to release the data-processing pipeline?

tsjain commented 10 months ago

I have the same question. The paper mentions that 40% of OAS sequences are missing residues at the N-terminus. Were they filtered out or were the missing amino acids masked as unknown?
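
To make the two options concrete, here is a minimal sketch of what I mean by "filtered out" versus "masked as unknown". It is only an illustration and not the AbLang pipeline: the `(sequence, n_missing)` record layout, the function names, and the `X` placeholder character are all assumptions of mine, and it presumes the number of missing N-terminal residues is already known for each sequence.

```python
# Hypothetical illustration only; this is NOT the pipeline used in the paper.
# Each record is (amino_acid_sequence, n_missing), where n_missing is the
# assumed-known count of residues missing at the N-terminus.

def filter_truncated(records):
    """Option 1: drop every sequence that is missing N-terminal residues."""
    return [seq for seq, n_missing in records if n_missing == 0]


def mask_truncated(records, unknown="X"):
    """Option 2: keep every sequence and pad the missing N-terminal
    positions with a placeholder character (assumed here to be 'X')."""
    return [unknown * n_missing + seq for seq, n_missing in records]


# Toy records with made-up sequences.
records = [("EVQLVESGG", 0), ("LVESGG", 3)]

kept = filter_truncated(records)
masked = mask_truncated(records)
```

With option 1 the truncated sequence never reaches training; with option 2 it does, but the model sees placeholder characters at the missing positions.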