Hi Dr. Bo,
This is Chai. First of all, thank you for the GIANA tool; it's a fascinating idea to use the body's immune response as a diagnostic tool.
I am writing to ask for help with querying my set of sequences against a reference. I have attached a section of my input file; it was clustered successfully by the clustering command. Next, I tried to query it against the reference provided with the tool. Before doing this, I clustered hc10s10.txt and put the resulting rotation file in the same directory, as described on the GitHub page. The command I ran was:

python GIANA4.py -q input_giana.tsv -r hc10s10.txt -S 3.3 -o tmp/

Here is the error I got:
Processing tmp_query.txt
Total time elapsed: 0.290075
Maximum memory usage: 0.196432 MB
Build query clustering file. Elapsed 18.401398
Now mering with reference cluster
Traceback (most recent call last):
File "GIANA4.py", line 1207, in <module>
main()
File "GIANA4.py", line 1151, in main
MergeExist(refClusterFile, OutDir+'/'+outFile)
File "/gpfs/scratch/cs5359/Projects/Weberlab_GIANA/GIANA/query.py", line 173, in MergeExist
queryT=pd.read_table(queryClusterFile, skiprows=2, delimiter='\t', header=None)
File "/gpfs/home/cs5359/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1242, in read_table
return _read(filepath_or_buffer, kwds)
File "/gpfs/home/cs5359/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 583, in _read
return parser.read(nrows)
File "/gpfs/home/cs5359/.local/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1704, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/gpfs/home/cs5359/.local/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 814, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas/_libs/parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 850, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 861, in pandas._libs.parsers.TextReader._check_tokenize_status
File "pandas/_libs/parsers.pyx", line 2029, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 4879, saw 7
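For context, this is roughly how I checked the field counts per line (a quick throwaway script of mine; the sample rows below are made up to mimic the 4-column cluster layout, not taken from my real files):

```python
from collections import Counter

def field_counts(lines, expected=4):
    """Tally tab-separated field counts, reporting lines that deviate."""
    counts = Counter()
    for lineno, line in enumerate(lines, start=1):
        n = len(line.rstrip("\n").split("\t"))
        counts[n] += 1
        if n != expected:
            print(f"line {lineno}: {n} fields")
    return counts

# Two well-formed 4-field rows and one 7-field row, mimicking what
# the pandas C parser complained about on line 4879.
rows = [
    "CASSLGTDTQYF\tTRBV7-9\t1\tSample1",
    "CASSPGQGNYGYTF\tTRBV5-1\t1\tSample1",
    "CASSLG\tTRBV7\t1\tSample1\tx\ty\tz",
]
print(field_counts(rows))  # Counter({4: 2, 7: 1})
```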
I checked both files, and there is nothing unusual on line 4879. I did notice that, unlike my input_giana.tsv, the example input file on GitHub (TestReal-ADIRP0000023_TCRB.tsv) has 3 additional columns alongside the CDR3 and gene information: frequencyCount, RANK, and info. Are these columns mandatory, and if so, how do I create them for my data?
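If the three columns are required, would filling them in along these lines be acceptable? The column names frequencyCount, RANK, and info are copied from the example file, but their semantics (frequencyCount as each clone's share of total reads, RANK as the order by frequency) and the aminoAcid/vGene input column names are only my guesses:

```python
import pandas as pd

# Toy rows standing in for my real data; only the CDR3 sequence,
# V gene, and read count are values I actually have.
df = pd.DataFrame({
    "aminoAcid": ["CASSLGTDTQYF", "CASSPGQGNYGYTF", "CASRTGESGYTF"],
    "vGene": ["TRBV7-9", "TRBV5-1", "TRBV27"],
    "count": [10, 5, 10],
})

# Guess 1: frequencyCount = fraction of total reads for each clone.
df["frequencyCount"] = df["count"] / df["count"].sum()
# Guess 2: RANK = dense rank of clones by descending frequency.
df["RANK"] = df["frequencyCount"].rank(ascending=False, method="dense").astype(int)
# Guess 3: info -- I do not know what belongs here, so leave it blank.
df["info"] = ""

df.to_csv("input_giana.tsv", sep="\t", index=False)
```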
Thanks in advance,
Chai Sree