svalkiers / clusTCR

CDR3 clustering module providing a new method for fast and accurate clustering of large data sets of CDR3 amino acid sequences, and offering functionalities for downstream analysis of clustering results.

How to import file output from MiXCR? #21

Closed Ming-Lian closed 3 years ago

Ming-Lian commented 3 years ago

The MiXCR output format is said to follow the AIRR standard. However, when I use the following code to import my dataset in MiXCR format:

data = read_cdr3('mixcr_out/2018-R-KF-DTCR461.clonotypes.TRB.txt', data_format='airr')

an error was reported:

sys:1: DtypeWarning: Columns (8,12) have mixed types.Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
  File "/mnt/data/lianm/software/Miniconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'productive'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/data/lianm/software/clusTCR/clustcr/input/datasets.py", line 39, in read_cdr3
    return parse_airr(file)
  File "/mnt/data/lianm/software/clusTCR/clustcr/input/airr.py", line 5, in parse_airr
    data = data[data["productive"]==True]
  File "/mnt/data/lianm/software/Miniconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/mnt/data/lianm/software/Miniconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'productive'

Could you please give me some advice? Thanks a lot!

svalkiers commented 3 years ago

Hi,

Thank you for reporting this issue. I have pushed some code that should solve the problem. To use the updated version of the package, uninstall your current clusTCR installation and then reinstall the package:

conda uninstall clustcr

followed by

conda install clustcr -c svalkiers -c bioconda -c pytorch -c conda-forge

Let me know if it works or not.

Best regards,
Sebastiaan
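
For anyone hitting the same KeyError: the traceback shows that parse_airr filters on a column named "productive", so the error means the input file has no such column. Below is a minimal sketch for checking a file's header before importing it. It uses only pandas. The helper name missing_columns and the sample column names (cloneCount, aaSeqCDR3) are assumptions for illustration and are not part of clusTCR.

```python
import io

import pandas as pd


def missing_columns(source, required=("productive",), sep="\t"):
    """Return the required column names that are absent from a delimited file.

    Hypothetical helper for illustration; it is not part of clusTCR.
    """
    # nrows=0 reads only the header row, so large files are not loaded.
    header = pd.read_csv(source, sep=sep, nrows=0)
    return [col for col in required if col not in header.columns]


# Illustrative stand-in for a real export; the column names are assumed.
sample = io.StringIO("cloneCount\taaSeqCDR3\n10\tCASSLGQAYEQYF\n")
print(missing_columns(sample))  # ['productive']
```

A file path can be passed in place of the StringIO buffer. If the returned list is not empty, the file lacks columns that the AIRR parser expects, which would explain the error above.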