Open doctorkermit opened 2 years ago
I have a similar error. I tested the directory parser function on my own data and got "KeyError: 'pos_start'".
I then looked at test_ctakes_parser.py and deleted the "assert_dataframe(df)" line, after which the function test_output_df() worked. The directory one still did not work.
Well, it turns out you need to change the numbers in [test_ctakes_parser.py](https://github.com/titu1994/PyCTakesParser/blob/master/tests/test_ctakes_parser.py) to fit your data:
def assert_dataframe(df):
    assert len(df) == 157
    assert len(df[df['scheme'] == 'SNOMEDCT_US']) == 149
    assert len(df[df['scheme'] == 'RXNORM']) == 8
    assert len(df[df['refsem'] == 'UmlsConcept']) == 558
But I am still trying to figure out the error message I got when running the directory parser.
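For the hard-coded counts in assert_dataframe above, one option is to pass the expected numbers in rather than editing the test each time the data changes. This is only a sketch: the extra parameters (expected_total, expected_by_scheme) and the toy frame are made up for illustration and are not part of the PyCTakesParser tests.

```python
import pandas as pd


def assert_dataframe(df, expected_total, expected_by_scheme):
    """Check row counts against values supplied by the caller.

    expected_total and expected_by_scheme are hypothetical parameters
    added for this sketch; the original test hard-codes the numbers.
    """
    assert len(df) == expected_total
    for scheme, count in expected_by_scheme.items():
        assert len(df[df['scheme'] == scheme]) == count


# Toy frame standing in for parser output; the values are invented.
toy = pd.DataFrame({'scheme': ['SNOMEDCT_US', 'SNOMEDCT_US', 'RXNORM']})
assert_dataframe(
    toy,
    expected_total=3,
    expected_by_scheme={'SNOMEDCT_US': 2, 'RXNORM': 1},
)
```

To find the right numbers for your own data, printing df['scheme'].value_counts() once is a quick way to see what the counts actually are.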
This program is super helpful, but it is crashing on a random file. Any help is greatly appreciated.
Traceback (most recent call last):
File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'true_text'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3751, in _set_item_mgr
loc = self._info_axis.get_loc(key)
File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'true_text'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<pyshell#5>", line 1, in <module>
parser.parse_dir(in_directory_path='output/', out_directory_path='outputcsv/')
File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\ctakes_parser\ctakes_parser.py", line 40, in parse_dir
df = parse_file(file_path)
File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\ctakes_parser\ctakes_parser.py", line 127, in parse_file
results['true_text'] = results.apply(_positional_search, axis=1)
File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3602, in __setitem__
self._set_item_frame_value(key, value)
File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3742, in _set_item_frame_value
self._set_item_mgr(key, arraylike)
File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3754, in _set_item_mgr
self._mgr.insert(len(self._info_axis), key, value)
File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\managers.py", line 1162, in insert
block = new_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\blocks.py", line 1937, in new_block
check_ndim(values, placement, ndim)
File "C:\Users\dbknox1\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\blocks.py", line 1979, in check_ndim
raise ValueError(
ValueError: Wrong number of items passed 0, placement implies 1
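Since the crash happens on one file somewhere in the directory, it may help to run the single-file parser in a loop and record which inputs fail, instead of letting parse_dir stop at the first error. The helper below is a sketch, not part of ctakes_parser: find_failing_files and its parse_one argument are hypothetical names, and the parser is passed in as a plain callable so the loop itself needs only the standard library.

```python
import traceback
from pathlib import Path


def find_failing_files(in_directory_path, parse_one):
    """Run parse_one on every file in a directory and collect failures.

    parse_one is any callable that takes a file path string. It is a
    parameter of this sketch, not a specific library function.
    Returns a dict mapping file name -> formatted traceback text.
    """
    failures = {}
    for path in sorted(Path(in_directory_path).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        try:
            parse_one(str(path))
        except Exception:
            # Keep going so that every problematic file is reported.
            failures[path.name] = traceback.format_exc()
    return failures
```

With the module from the traceback, this could be called as find_failing_files('output/', parser.parse_file), assuming parse_file is reachable on the same parser object as parse_dir. The final ValueError says 0 items were passed, which would be consistent with the results frame being empty for that file (for example, a note with no annotations), though that is only a guess from the traceback.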