matsengrp / vampire

🧛 Deep generative models for TCR sequences 🧛
Apache License 2.0

Ingest naive BCR data #93

Open matsen opened 5 years ago

matsen commented 5 years ago

@krdav is prepping some naive BCR data.

We'll need to be able to handle different germline genes, of course, and for that I think that we should take the path of #12 and allow specification of germline genes with a file.
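As a rough illustration of what "specify germline genes with a file" could look like, here is a minimal Python sketch. It assumes the file is plain FASTA with one germline gene per record and the gene name as the first word of the header (as in IMGT-style downloads). `read_germline_fasta` is a hypothetical helper, not part of vampire, and the format in #12 may end up different.

```python
from typing import Dict, Iterable, List, Optional


def read_germline_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {gene_name: uppercase_sequence}.

    Hypothetical helper: assumes the gene name is the first
    whitespace-delimited token after '>' and ignores blank lines.
    """
    genes: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if name is not None:
                genes[name] = "".join(chunks).upper()
            name = line[1:].split()[0]
            chunks = []
        else:
            if name is None:
                raise ValueError("sequence line found before any FASTA header")
            chunks.append(line)
    if name is not None:
        genes[name] = "".join(chunks).upper()
    return genes
```

Usage would be something like `with open(path) as f: genes = read_germline_fasta(f)`; sorting `genes` by name would then give a stable gene vocabulary for encoding V and J calls.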

Is there anything else we should be thinking about? K, how do you anticipate dealing with SHM? Even though these are naive sorts, we're sure to get some leak-through. Shall we just throw out sequences with clear mutations in the V- and J-encoded sections?

krdav commented 5 years ago

We can definitely toss the ones with mutations deep inside V/J, but in the junction region things are a little trickier because of annotation uncertainty. Let me return once I get a better look at the data.
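A minimal Python sketch of the filter described above: drop a read if it mismatches its germline anywhere in V or J *except* near the junction, where annotation is uncertain. It assumes each read is already annotated and its V and J segments are aligned gap-free to the assigned germline genes. `has_clear_shm` and `junction_buffer` are hypothetical names, and the default buffer width is arbitrary, not a tuned value.

```python
def _core_mismatches(query: str, germline: str, start: int, stop: int) -> int:
    """Count mismatches in [start, stop) between equal-length aligned strings."""
    if len(query) != len(germline):
        raise ValueError("query and germline must be aligned to equal length")
    return sum(q != g for q, g in zip(query[start:stop], germline[start:stop]))


def has_clear_shm(
    query_v: str,
    germline_v: str,
    query_j: str,
    germline_j: str,
    junction_buffer: int = 10,
) -> bool:
    """Return True if the read mismatches its germline away from the junction.

    The 3' end of V and the 5' end of J border the junction, so
    `junction_buffer` positions on each side are ignored
    (hypothetical default, not a tuned value).
    """
    v_stop = max(len(query_v) - junction_buffer, 0)
    v_mut = _core_mismatches(query_v, germline_v, 0, v_stop)
    j_mut = _core_mismatches(query_j, germline_j, junction_buffer, len(query_j))
    return v_mut + j_mut > 0
```

Filtering is then `kept = [r for r in reads if not has_clear_shm(*r)]`, where each `r` is a `(query_v, germline_v, query_j, germline_j)` tuple. A fixed-width buffer is the simplest stand-in for "annotation uncertainty"; per-read uncertainty from the annotation tool would be a more principled cutoff.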