Open guillemsanchezsanchez1996 opened 1 year ago
Hi Guillem,
Thanks for using ClusTCR. We are very sorry about this inconvenience. I believe your assumption is correct: the `parse_airr` function returns a `numpy.array` instead of a `pandas.Series`. I will try to resolve the issue as soon as possible.
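Until a fix is released, one possible workaround is to wrap each parsed array in a `pandas.Series` before concatenating. The sketch below is not the official ClusTCR fix: `fake_parse_airr` and `concat_as_series` are hypothetical helpers introduced only for illustration, and the sketch assumes the parser returns a one-dimensional object array of CDR3 strings, as in the example output reported in this issue.

```python
import numpy as np
import pandas as pd


def fake_parse_airr(cdr3s):
    """Hypothetical stand-in for a parser that returns a numpy array."""
    return np.array(cdr3s, dtype=object)


def concat_as_series(arrays):
    """Wrap each array in a pandas.Series so that pd.concat accepts it."""
    series = [pd.Series(a, dtype=object) for a in arrays]
    return pd.concat(series, ignore_index=True)


# Two toy "files", using CDR3 sequences quoted in this issue.
files = [
    ["CASSQGFGTQYF", "CASSQSQYAEQFF"],
    ["CASSRGAADTLYF", "SASSSYEQHF"],
]

meta = concat_as_series(fake_parse_airr(f) for f in files)
print(type(meta).__name__, len(meta))
```

`ignore_index=True` gives the combined Series a fresh 0..n-1 index, which avoids duplicate index labels when sequences from several files are pooled.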
Thanks a lot, Sebastiaan, for your fast answer!
Looking forward to your help in solving this issue :)
Guillem
Oops, sorry, I closed the issue by mistake!
Hello Sebastiaan, have you by any chance had time to look into this issue?
Thanks again for your help,
Guillem
Hello Sebastiaan and co.,
Thanks a lot for designing this nice package for studying TCR repertoires and potential expansions. My goal with my current AIRR data is to compare clusters between different subjects, and I think your batch approach could be really useful for this.
I have been following some sections of your documentation, but unfortunately I am stuck on the demo for clustering a set of repertoires simultaneously. The main issue is with the metarepertoire function. Here is the error:
In [14]: training_sample_size = round(1000 * (total_cdr3s / 5000))
    ...: training_sample = metarepertoire(directory=datadir,
    ...:                                  data_format='airr',
    ...:                                  n_sequences=training_sample_size)
    ...:

TypeError                                 Traceback (most recent call last)
Cell In[14], line 2
      1 training_sample_size = round(1000 * (total_cdr3s / 5000))
----> 2 training_sample = metarepertoire(directory=datadir,
      3                                  data_format='airr',
      4                                  n_sequences=training_sample_size)

File ~/miniconda3/envs/clustcr/lib/python3.9/site-packages/clustcr/input/datasets.py:65, in metarepertoire(directory, data_format, out_format, n_sequences)
     63     meta = pd.concat([meta, parse_immuneaccess(file, out_format=out_format)])
     64 elif data_format.lower()=='airr':
---> 65     meta = pd.concat([meta, parse_airr(file)])
     66 elif data_format.lower()=='tcrex':
     67     meta = pd.concat([meta, parse_tcrex(file)])

File ~/miniconda3/envs/clustcr/lib/python3.9/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

File ~/miniconda3/envs/clustcr/lib/python3.9/site-packages/pandas/core/reshape/concat.py:368, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    146 @deprecate_nonkeyword_arguments(version=None, allowed_args=["objs"])
    147 def concat(
    148     objs: Iterable[NDFrame] | Mapping[HashableT, NDFrame],
   (...)
    157     copy: bool = True,
    158 ) -> DataFrame | Series:
    159     """
    160     Concatenate pandas objects along a particular axis.
    161
   (...)
    366     1   3   4
    367     """
--> 368     op = _Concatenator(
    369         objs,
    370         axis=axis,
    371         ignore_index=ignore_index,
    372         join=join,
    373         keys=keys,
    374         levels=levels,
    375         names=names,
    376         verify_integrity=verify_integrity,
    377         copy=copy,
    378         sort=sort,
    379     )
    381     return op.get_result()

File ~/miniconda3/envs/clustcr/lib/python3.9/site-packages/pandas/core/reshape/concat.py:458, in _Concatenator.__init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    453 if not isinstance(obj, (ABCSeries, ABCDataFrame)):
    454     msg = (
    455         f"cannot concatenate object of type '{type(obj)}'; "
    456         "only Series and DataFrame objs are valid"
    457     )
--> 458     raise TypeError(msg)
    460 ndims.add(obj.ndim)
    462 # get the sample
    463 # want the highest ndim that we have, and must be non-empty
    464 # unless all objs are empty

TypeError: cannot concatenate object of type '<class 'numpy.ndarray'>'; only Series and DataFrame objs are valid
I think the main issue is that AIRR files are not loaded as a pandas DataFrame or Series. See this code as an example:
data = read_cdr3('/mnt/c/Users/usuari/Desktop/mixcr-4.1.2/clustcr/output_TRB_SP_135.tsv', data_format='airr')

In [25]: data
Out[25]:
array(['CASSQGFGTQYF', 'CASSQSQYAEQFF', 'CASSRGAADTLYF', ...,
       'SASSLGQNNSPLHF', 'SASSSYEQHF', 'RGHTGQLYF'], dtype=object)
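The type mismatch can be reproduced without ClusTCR or any input files. The following minimal sketch passes an object array like the one above to `pd.concat` and captures the resulting error. The empty DataFrame named `meta` is only an assumed stand-in for whatever accumulator metarepertoire uses internally.

```python
import numpy as np
import pandas as pd

# An object array of CDR3 strings, like the read_cdr3 output shown above.
cdr3s = np.array(["CASSQGFGTQYF", "CASSQSQYAEQFF"], dtype=object)

# Assumed stand-in for the accumulator used inside metarepertoire.
meta = pd.DataFrame()

try:
    pd.concat([meta, cdr3s])
    error_message = None
except TypeError as exc:
    # pd.concat only accepts Series and DataFrame objects.
    error_message = str(exc)

print(type(cdr3s).__name__)
print(error_message)
```

If `error_message` is not `None`, the array itself is the cause, independent of the contents of the AIRR file.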
Do you have any idea what the problem might be?
All my best,
Guillem Sanchez