Open lopierra opened 2 weeks ago
It seems the `map` function is executing on my local machine, but `DataFrame.map` only exists in pandas 2.1.0 and later; on older versions, `map` cannot be applied directly to a DataFrame. To ensure consistency and correctness, we should update it to use `applymap` or `apply`, which are the methods for applying functions to DataFrame elements in those versions. I will be preparing a PR to make this adjustment.
Hi Pierrette,
I wanted to bring to your attention that the `applymap` function has been deprecated since pandas 2.1.0 in favor of `DataFrame.map`. You can find more details in the pandas documentation here. `map` was working for me because my pandas version is 2.2.0.
We could switch to using `applymap`, as in earlier versions of pandas. However, please note that it might stop working with future pandas updates.
Could you please try updating your pandas version? This should resolve the issue.
Thank you!
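One way to stay compatible with both older and newer pandas would be to pick the method at runtime. This is a minimal sketch, not the validator's code: `elementwise` is a hypothetical helper, and it relies only on `DataFrame.map` (pandas 2.1.0 and later) and `DataFrame.applymap` (earlier versions).

```python
def elementwise(df, func):
    """Apply func to every cell of df, whichever pandas version is installed.

    Hypothetical helper for illustration; not part of the validator.
    """
    # Look the method up on the class, not the instance, so that a column
    # that happens to be named "map" cannot shadow it on older pandas.
    if hasattr(type(df), "map"):
        return df.map(func)       # pandas >= 2.1.0
    return df.applymap(func)      # pandas < 2.1.0
```

With a DataFrame `df`, a call such as `elementwise(df, lambda v: v.strip() if isinstance(v, str) else v)` would then behave the same on either side of the 2.1.0 change.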
@madanucd I updated pandas and got a bit further with the validator. I ran it on the same file that I sent you before (ABC-DS.csv) and got the expected validation errors, but also got a TypeError. Is this expected? (Maybe due to ABC-DS having IDs that are integers instead of strings?)
validate-data -o ./errorlogs ./ABC-DS.csv participant
Validating participant data from file: ./ABC-DS.csv
Traceback (most recent call last):
File "C:\Users\lopi\OneDrive - The University of Colorado Denver\Documents\R_linkml\src\data_validation\validate_participant.py", line 7, in validate_participant_entry
instance = Participant(
^^^^^^^^^^^^
File "C:\Users\lopi\AppData\Local\pypoetry\Cache\virtualenvs\src-8-5-hlTp-py3.12\Lib\site-packages\pydantic\main.py", line 192, in __init__
self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 7 validation errors for Participant
participantExternalId
Input should be a valid string [type=string_type, input_value=10001, input_type=int]
For further information visit https://errors.pydantic.dev/2.8/v/string_type
familyId
Input should be a valid string [type=string_type, input_value=nan, input_type=float]
For further information visit https://errors.pydantic.dev/2.8/v/string_type
fatherId
Input should be a valid string [type=string_type, input_value=nan, input_type=float]
For further information visit https://errors.pydantic.dev/2.8/v/string_type
motherId
Input should be a valid string [type=string_type, input_value=nan, input_type=float]
For further information visit https://errors.pydantic.dev/2.8/v/string_type
siblingId
Input should be a valid string [type=string_type, input_value=nan, input_type=float]
For further information visit https://errors.pydantic.dev/2.8/v/string_type
otherFamilyMemberId
Input should be a valid string [type=string_type, input_value=nan, input_type=float]
For further information visit https://errors.pydantic.dev/2.8/v/string_type
ageAtLastVitalStatus
Input should be a finite number [type=finite_number, input_value=nan, input_type=float]
For further information visit https://errors.pydantic.dev/2.8/v/finite_number
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\lopi\AppData\Local\pypoetry\Cache\virtualenvs\src-8-5-hlTp-py3.12\Scripts\validate-data", line 6, in <module>
sys.exit(main())
^^^^^^
File "C:\Users\lopi\OneDrive - The University of Colorado Denver\Documents\R_linkml\src\data_validation\cli.py", line 36, in main
validation_function(args.input_file, args.output)
File "C:\Users\lopi\OneDrive - The University of Colorado Denver\Documents\R_linkml\src\data_validation\validation.py", line 20, in validate_participant
return validate_data(file_path, string_columns, validate_participant_entry, output_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lopi\OneDrive - The University of Colorado Denver\Documents\R_linkml\src\data_validation\validation_utils.py", line 55, in validate_data
valid_count, invalid_count = validate_dataframe(df, validation_function, input_file_name=file_name,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lopi\OneDrive - The University of Colorado Denver\Documents\R_linkml\src\data_validation\validation_utils.py", line 20, in validate_dataframe
validation_results = df.apply(entry_validator, axis=1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lopi\AppData\Local\pypoetry\Cache\virtualenvs\src-8-5-hlTp-py3.12\Lib\site-packages\pandas\core\frame.py", line 10374, in apply
return op.apply().__finalize__(self, method="apply")
^^^^^^^^^^
File "C:\Users\lopi\AppData\Local\pypoetry\Cache\virtualenvs\src-8-5-hlTp-py3.12\Lib\site-packages\pandas\core\apply.py", line 916, in apply
return self.apply_standard()
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lopi\AppData\Local\pypoetry\Cache\virtualenvs\src-8-5-hlTp-py3.12\Lib\site-packages\pandas\core\apply.py", line 1063, in apply_standard
results, res_index = self.apply_series_generator()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lopi\AppData\Local\pypoetry\Cache\virtualenvs\src-8-5-hlTp-py3.12\Lib\site-packages\pandas\core\apply.py", line 1081, in apply_series_generator
results[i] = self.func(v, *self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lopi\OneDrive - The University of Colorado Denver\Documents\R_linkml\src\data_validation\validate_participant.py", line 31, in validate_participant_entry
error_details = (row['Study Code'] + "-" + row['Participant External ID'], e)
~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: can only concatenate str (not "int") to str
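The traceback appears to show two separate things. First, pydantic rejects the integer ID (`input_value=10001`) and the `nan` values, which presumably come from empty cells. Second, while handling that error, building `error_details` fails because `row['Study Code'] + "-" + row['Participant External ID']` concatenates a str with an int. A minimal sketch of one possible fix follows. `clean_row` and `participant_label` are hypothetical helpers that do not exist in the validator, and the sketch assumes the `Participant` model accepts `None` for empty optional fields.

```python
import math


def clean_row(row, string_columns):
    """Return a copy of row with NaN replaced by None and string columns cast to str.

    Hypothetical helper; `row` is any mapping of column name to value.
    """
    cleaned = {}
    for key, value in row.items():
        if isinstance(value, float) and math.isnan(value):
            # Empty cells are read as float NaN; treat them as missing.
            cleaned[key] = None
        elif key in string_columns:
            # IDs such as 10001 are parsed as int; the model expects str.
            cleaned[key] = str(value)
        else:
            cleaned[key] = value
    return cleaned


def participant_label(row):
    """Build the label used in error_details without assuming str inputs."""
    # An f-string formats any type, so an integer ID no longer raises
    # "TypeError: can only concatenate str (not "int") to str".
    return f"{row['Study Code']}-{row['Participant External ID']}"
```

An alternative, if it suits the validator, would be to read the ID columns as strings in the first place, for example with the `dtype` argument of `pandas.read_csv`.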
Hi @madanucd - I'm attempting to run the validator on a test dataset:
but I get the following error message:
Am I doing something wrong, or is it an issue with the validator?
Not urgent - we can discuss next Tuesday at the Data Modeling meeting. Thanks!