matsengrp / vampire

🧛 Deep generative models for TCR sequences 🧛
Apache License 2.0

Fail when too much filtering attrition #95

Closed matsen closed 5 years ago

matsen commented 5 years ago
(py36) stoat vampire/vampire ‹85-sorts-beta› » python preprocess_adaptive.py /fh/fast/matsen_e/data/adaptive-robins-ratio/Healthy_Subject_14_CD4_Naive.tsv ~/x.csv
Original data: 220838 rows
Restricting to in-frame: 188018 rows
Requiring sane CDR3 bounding AAs: 215 rows
Requiring CDR3 to be <= 30 amino acids: 215 rows
Requiring resolved TCRB genes: 70 rows
Requiring genes that are also present in the OLGA set: 66 rows

(!)
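
One way the requested behavior could look is sketched below. This is a minimal illustration, not the code in `preprocess_adaptive.py`: the names `AttritionError`, `filter_with_attrition_check`, and `max_attrition`, as well as the 0.9 default, are hypothetical, and rows are modeled as a plain list rather than whatever data structure the script actually uses. It applies each named filter in turn, prints counts in the same style as the log above, and raises if the overall fraction of discarded rows exceeds the threshold.

```python
class AttritionError(Exception):
    """Raised when filtering discards too large a share of the input rows."""


def filter_with_attrition_check(rows, filters, max_attrition=0.9):
    """Apply filters in order and fail if too many rows are lost overall.

    rows:          list of records (any type the predicates understand).
    filters:       list of (description, predicate) pairs; a row is kept
                   when predicate(row) is true.
    max_attrition: hypothetical threshold, the largest acceptable fraction
                   of the original rows that may be discarded.
    """
    original_count = len(rows)
    print(f"Original data: {original_count} rows")
    if original_count == 0:
        raise AttritionError("No input rows to filter.")

    for description, predicate in filters:
        rows = [row for row in rows if predicate(row)]
        print(f"{description}: {len(rows)} rows")

    # Fraction of the original rows removed by all filters combined.
    attrition = 1 - len(rows) / original_count
    if attrition > max_attrition:
        raise AttritionError(
            f"Filtering removed {attrition:.2%} of rows "
            f"({len(rows)} of {original_count} kept); "
            f"the limit is {max_attrition:.2%}."
        )
    return rows
```

With the counts in the log above, 66 of 220838 rows survive, so the attrition is roughly 99.97% and a check like this would stop the run instead of silently writing a near-empty output file. Checking only the overall fraction is a design choice for brevity; a per-step check would also point at the filter responsible (here, the CDR3 bounding amino acid step, which drops the count from 188018 to 215).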