matsen closed 5 years ago
More importantly, our comparisons should account for the fact that OLGA can't emit certain sequences due to gene name conversion issues.
So, probably we should just toss any sequences from any program or data that aren't in the intersection of both sets.
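A rough sketch of what that filtering could look like, assuming each source (program output or data) is a list of `(v_gene, j_gene, sequence)` tuples and that "the intersection" means the genes seen in every source. The function name and the tuple layout are hypothetical, not taken from our scripts:

```python
def restrict_to_shared_genes(datasets):
    """Keep only rows whose V and J genes occur in every dataset.

    datasets: dict mapping a source name to a list of
    (v_gene, j_gene, sequence) tuples (hypothetical layout).
    """
    # Genes that appear in every source.
    shared_v = set.intersection(*({v for v, _, _ in rows} for rows in datasets.values()))
    shared_j = set.intersection(*({j for _, j, _ in rows} for rows in datasets.values()))
    # Drop any row that uses a gene outside the shared sets.
    return {
        name: [row for row in rows if row[0] in shared_v and row[1] in shared_j]
        for name, rows in datasets.items()
    }
```

Applying the same filter to every source keeps the comparison symmetric: no program is penalized for genes that another one cannot produce.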
We are restricting gene usage in the preprocessing script, and generating sequences using Ppost rejection sampling, so this is no longer a problem.
When we generate sequences using OLGA, we ask for nseqs sequences. After gene name conversion we get about 10% fewer than that.
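One way to end up with exactly nseqs sequences would be to keep generating until enough survive conversion. This is only a sketch: `generate` and `convert` are hypothetical callables standing in for the OLGA call and the gene name conversion step (with `convert` assumed to return `None` for sequences it can't convert), and the oversampling factor is a guess.

```python
def generate_exactly(nseqs, generate, convert, oversample=1.2, max_rounds=100):
    """Top up generation until nseqs sequences survive conversion.

    generate(n) -> list of n raw sequences (hypothetical stand-in).
    convert(seq) -> converted sequence, or None if it can't be converted
    (hypothetical stand-in).
    """
    kept = []
    for _ in range(max_rounds):
        needed = nseqs - len(kept)
        if needed <= 0:
            break
        # Ask for a bit more than needed to cover conversion losses.
        batch = generate(max(1, int(needed * oversample)))
        for seq in batch:
            converted = convert(seq)
            if converted is not None:
                kept.append(converted)
    if len(kept) < nseqs:
        raise RuntimeError("could not reach nseqs within max_rounds")
    return kept[:nseqs]
```

The `max_rounds` guard is there so the loop can't spin forever if conversion rejects everything.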