Api2 deduplicate

This pull request adds a warning message to the TCRrep.deduplicate function that tells the user if any of the seq/cell entries have null values in the group-by index columns. Such entries can cause cells not to be counted as part of clones because some piece of information, such as a gene name, is unavailable.

The PR also adds the function TCRrep.show_incomplete(), which lets the user see the seq/cell entries that would be dropped.
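As a rough illustration of the idea (not the code in this PR), the sketch below flags rows that have a null in any group-by column and emits a warning. The helper names `find_incomplete_rows` and `warn_if_incomplete`, the column names, and the toy data are all hypothetical; only the pandas calls are real.

```python
import warnings

import numpy as np
import pandas as pd


def find_incomplete_rows(df, index_cols):
    """Return the rows with a null in any of index_cols (hypothetical helper)."""
    mask = df[index_cols].isnull().any(axis=1)
    return df[mask]


def warn_if_incomplete(df, index_cols):
    """Warn when some rows would be dropped by a default groupby on index_cols."""
    incomplete = find_incomplete_rows(df, index_cols)
    if len(incomplete) > 0:
        warnings.warn(
            f"{len(incomplete)} row(s) have null values in {index_cols} "
            "and would be dropped when grouping."
        )
    return incomplete


# Toy data; the column names are illustrative assumptions.
example_cells = pd.DataFrame({
    "cdr3_b_aa": ["CASSF", "CASSL", "CASSL"],
    "v_b_gene": ["TRBV7-9", "TRBV5-1", np.nan],
})
example_incomplete = find_incomplete_rows(example_cells, ["cdr3_b_aa", "v_b_gene"])
```

Here `example_incomplete` holds the single row whose gene name is missing, which is the kind of entry a user would want to inspect before deduplicating.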

Three basic tests are added.

For more on this issue, see: https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#na-values-in-groupby

I am noticing that when pandas groups by columns that contain NA values, any row with a NaN in one of those columns is dropped from the result.
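A minimal reproduction of that behavior, using toy data with made-up column names rather than the tcrdist2 schema. It also shows the `dropna=False` option of `groupby`, available in pandas 1.1 and later, which keeps NaN keys as their own group; whether that option suits this PR is a separate question.

```python
import numpy as np
import pandas as pd

# Four cells; the last one is missing its gene name.
cells = pd.DataFrame({
    "cdr3_b_aa": ["CASSF", "CASSF", "CASSL", "CASSL"],
    "v_b_gene": ["TRBV7-9", "TRBV7-9", "TRBV5-1", np.nan],
    "count": [1, 1, 1, 1],
})
index_cols = ["cdr3_b_aa", "v_b_gene"]

# Default behavior: the row with a NaN group key is silently excluded.
default_counts = cells.groupby(index_cols)["count"].sum().reset_index()

# dropna=False (pandas >= 1.1) keeps the NaN key as a separate group.
kept_counts = (
    cells.groupby(index_cols, dropna=False)["count"].sum().reset_index()
)

print(default_counts["count"].sum(), "cells counted by default")
print(kept_counts["count"].sum(), "cells counted with dropna=False")
```

With the default settings only three of the four cells are counted, which is the silent loss the warning is meant to surface.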

kmayerb / tcrdist2

Api2 deduplicate #28