titu1994 / PyCTakesParser

Utilities to parse the output of cTAKES
MIT License

error in parsing #6

Open doctorkermit opened 2 years ago

doctorkermit commented 2 years ago

This program is super helpful, but it is crashing on a seemingly random file. Any help is greatly appreciated.

Traceback (most recent call last):
  File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'true_text'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3751, in _set_item_mgr
    loc = self._info_axis.get_loc(key)
  File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'true_text'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    parser.parse_dir(in_directory_path='output/', out_directory_path='outputcsv/')
  File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\ctakes_parser\ctakes_parser.py", line 40, in parse_dir
    df = parse_file(file_path)
  File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\ctakes_parser\ctakes_parser.py", line 127, in parse_file
    results['true_text'] = results.apply(_positional_search, axis=1)
  File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3602, in __setitem__
    self._set_item_frame_value(key, value)
  File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3742, in _set_item_frame_value
    self._set_item_mgr(key, arraylike)
  File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3754, in _set_item_mgr
    self._mgr.insert(len(self._info_axis), key, value)
  File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\managers.py", line 1162, in insert
    block = new_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
  File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\blocks.py", line 1937, in new_block
    check_ndim(values, placement, ndim)
  File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\blocks.py", line 1979, in check_ndim
    raise ValueError(
ValueError: Wrong number of items passed 0, placement implies 1

Jachenice commented 2 years ago

I have a similar error. I tested the directory parsing function on my own data and got "KeyError: 'pos_start'".

I also took a look at test_ctakes_parser.py and deleted the "assert_dataframe(df)" line, after which the test_output_df() function worked. The directory parser still did not work, though.

Well, it is clear that you need to change the expected counts in [test_ctakes_parser.py](https://github.com/titu1994/PyCTakesParser/blob/master/tests/test_ctakes_parser.py) to fit your own data:

def assert_dataframe(df):
    assert len(df) == 157
    assert len(df[df['scheme'] == 'SNOMEDCT_US']) == 149
    assert len(df[df['scheme'] == 'RXNORM']) == 8
    assert len(df[df['refsem'] == 'UmlsConcept']) == 558

But I am still trying to figure out the error message I get when running the directory parser.
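
Since the traceback above shows parse_dir calling parse_file once per file, one way to narrow this down might be to run the per-file parser yourself and record which inputs raise. The helper below is only a sketch: find_failing_files is a hypothetical name that is not part of PyCTakesParser, and it accepts any callable that takes a file path. A possible cause, not confirmed anywhere in this thread, is a cTAKES output file with no annotations, which would be consistent with "Wrong number of items passed 0".

```python
import os


def find_failing_files(in_directory_path, parse_fn):
    """Call parse_fn on each file in a directory and collect the ones that raise.

    parse_fn is any callable taking a file path (hypothetical helper,
    not part of PyCTakesParser). Returns {file_path: "ErrorType: message"}.
    """
    failures = {}
    for name in sorted(os.listdir(in_directory_path)):
        file_path = os.path.join(in_directory_path, name)
        if not os.path.isfile(file_path):
            continue  # skip sub-directories
        try:
            parse_fn(file_path)
        except Exception as exc:  # deliberately broad: the goal is to locate bad inputs
            failures[file_path] = f"{type(exc).__name__}: {exc}"
    return failures
```

Assuming `parser` is the same module used for `parser.parse_dir` above and that it exposes `parse_file` (the traceback suggests it does), something like `find_failing_files('output/', parser.parse_file)` should list the files that trigger the KeyError or ValueError, which can then be inspected or excluded.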