georgiosn opened 1 month ago
Working on this. In the meantime, is autocsv alright with the path below? (It doesn't need the rest of the arguments.)
https://archive.ics.uci.edu/static/public/222/bank+marketing.zip/bank/bank.csv
You may also encounter #20 next, so let's prioritize that.
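The path in the question above points at a member inside a .zip archive (`bank/bank.csv` inside `bank+marketing.zip`). This thread does not say whether the UCI server serves such member paths directly. If it does not, a loader would have to download the archive and extract the member itself. The sketch below assumes exactly that. `split_archive_url`, `read_member_bytes` and `fetch_member` are hypothetical helpers, not part of autocsv, fairbench or KFP, and the sketch assumes the archive really contains `bank/bank.csv` at that path (otherwise `ZipFile.read` raises `KeyError`).

```python
import io
import zipfile
from urllib.request import urlopen


def split_archive_url(url: str, suffix: str = ".zip/") -> tuple[str, str]:
    """Split '<archive>.zip/<member>' into ('<archive>.zip', '<member>').

    Hypothetical helper: assumes the URL contains exactly one '.zip/' segment.
    """
    idx = url.find(suffix)
    if idx == -1:
        raise ValueError(f"no {suffix!r} segment in {url!r}")
    cut = idx + len(suffix) - 1  # keep '.zip', drop the '/' that follows it
    return url[:cut], url[cut + 1:]


def read_member_bytes(archive: bytes, member: str) -> bytes:
    """Return the raw bytes of `member` from an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(member)  # raises KeyError if the member is missing


def fetch_member(url: str) -> bytes:
    """Download the archive part of `url` and extract the member it names."""
    archive_url, member = split_archive_url(url)
    with urlopen(archive_url) as resp:
        return read_member_bytes(resp.read(), member)
```

The extracted bytes could then be handed to pandas, e.g. `pd.read_csv(io.BytesIO(fetch_member(url)), sep=";")`.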
An update here as well.
After fixing the pickle issue, autocsv works.
With input: {"path":"http://host.k3d.internal:5000/bank/bank.csv","delimiter":";","numeric":["age","duration","campaign","pdays","previous"],"categorical":["job","marital","education","default","housing","loan","contact","poutcome"],"label":"y"}
custom csv fails with the following error:
time="2024-10-22T12:02:04.957Z" level=info msg="capturing logs" argo=true
time="2024-10-22T12:02:05.063Z" level=info msg="capturing logs" argo=true
I1022 12:02:05.119151 32 launcher_v2.go:90] input ComponentSpec:{
  "inputDefinitions": {
    "parameters": {
      "data_custom_csv__params": {
        "parameterType": "STRUCT",
        "defaultValue": {
          "categorical": "None",
          "delimiter": ",",
          "label": "None",
          "numeric": "None",
          "path": "",
          "skip_invalid_lines": true
        },
        "isOptional": true
      }
    }
  },
  "outputDefinitions": {
    "artifacts": {
      "output": {
        "artifactType": {
          "schemaTitle": "system.Dataset",
          "schemaVersion": "0.0.1"
        }
      }
    },
    "parameters": {
      "Output": {
        "parameterType": "STRING"
      }
    }
  },
  "executorLabel": "exec-data-custom-csv"
}
I1022 12:02:05.121093 32 cache.go:116] Connecting to cache endpoint 10.43.199.76:8887
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[KFP Executor 2024-10-22 12:02:10,393 INFO]: --component_module_path is not specified. Looking for component `data_custom_csv` in config file `kfp_config.ini` instead
[KFP Executor 2024-10-22 12:02:10,395 INFO]: Loading KFP component "data_custom_csv" from catalogue/dataset_loaders/custom_csv.py (directory "catalogue/dataset_loaders" and module name "custom_csv")
[KFP Executor 2024-10-22 12:02:13,942 INFO]: Got executor_input:
{
    "inputs": {
        "parameterValues": {
            "data_custom_csv__params": {
                "categorical": [
                    "job",
                    "marital",
                    "education",
                    "default",
                    "housing",
                    "loan",
                    "contact",
                    "poutcome"
                ],
                "delimiter": ";",
                "label": "y",
                "numeric": [
                    "age",
                    "duration",
                    "campaign",
                    "pdays",
                    "previous"
                ],
                "path": "http://host.k3d.internal:5000/bank/bank.csv"
            }
        }
    },
    "outputs": {
        "parameters": {
            "Output": {
                "outputFile": "/tmp/kfp/outputs/Output"
            }
        },
        "artifacts": {
            "output": {
                "artifacts": [
                    {
                        "type": {
                            "schemaTitle": "system.Dataset",
                            "schemaVersion": "0.0.1"
                        },
                        "uri": "minio://mlpipeline/v2/artifacts/tabular4/420c40fc-05e7-45b0-a1e3-afdede73b29e/data-custom-csv/output"
                    }
                ]
            }
        },
        "outputFile": "/tmp/kfp_outputs/output_metadata.json"
    }
}
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/site-packages/kfp/dsl/executor_main.py", line 109, in <module>
    executor_main()
  File "/usr/local/lib/python3.11/site-packages/kfp/dsl/executor_main.py", line 101, in executor_main
    output_file = executor.execute()
                  ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kfp/dsl/executor.py", line 361, in execute
    result = self.func(**func_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 17, in kfp_method
  File "/usr/local/src/kfp/components/catalogue/dataset_loaders/custom_csv.py", line 43, in data_custom_csv
    raw_data = fb.bench.loader.read_csv(
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fairbench/bench/loader.py", line 78, in read_csv
    return pd.read_csv(path, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1880, in _make_engine
    self.handles = get_handle(
                   ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/io/common.py", line 873, in get_handle
    handle = open(
             ^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'data//host.k3d.internal:5000/bank/bank.csv'
Error downloading file: <urlopen error [Errno -2] Name or service not known>
I1022 12:02:14.465002 32 launcher_v2.go:151] publish success.
F1022 12:02:14.465164 32 main.go:49] failed to execute component: exit status 1
time="2024-10-22T12:02:15.070Z" level=info msg="sub-process exited" argo=true error="<nil>"
Error: exit status 1
time="2024-10-22T12:02:15.963Z" level=info msg="sub-process exited" argo=true error="<nil>"
Error: exit status 1
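The last two error lines of the log point to a name-resolution failure: `urlopen` reports `[Errno -2] Name or service not known`, i.e. `host.k3d.internal` could not be resolved from inside the component's pod, and `pd.read_csv` ends up being given the local path `data//host.k3d.internal:5000/bank/bank.csv`, which does not exist. One way to confirm the first part from inside the pod is to check whether the URL's host resolves before reading. `url_host_resolves` is a hypothetical helper using only the standard library (`urllib.parse.urlparse`, `socket.getaddrinfo`); it checks name resolution only, not whether the HTTP server on port 5000 is reachable.

```python
import socket
from urllib.parse import urlparse


def url_host_resolves(url: str) -> bool:
    """Return True if the host in `url` resolves from where this code runs.

    Hypothetical pre-flight check: it tests name resolution only, not
    whether anything is listening on the port.
    """
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        # No network location, e.g. a plain local path such as "bank/bank.csv".
        return False
    # Fall back to the scheme's default port when the URL does not give one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        # urlopen wraps this exception in the URLError seen in the log.
        return False
    return True
```

Run inside the component container, `url_host_resolves("http://host.k3d.internal:5000/bank/bank.csv")` would return False under the failure shown above, independently of what the loader does afterwards.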
The latest data custom csv does not work in the toolkit.
Input: {"path":"https://archive.ics.uci.edu/static/public/222/bank+marketing.zip/bank/bank.csv","delimiter":";","numeric":["age","duration","campaign","pdays","previous"],"categorical":["job","marital","education","default","housing","loan","contact","poutcome"],"label":"y"}
Log from KFP, with input: {"path":"http://host.k3d.internal:5000/bank/bank.csv","delimiter":";","numeric":["age","duration","campaign","pdays","previous"],"categorical":["job","marital","education","default","housing","loan","contact","poutcome"],"label":"y"}
KFP produced a different log this time; see the attached output_kfp.txt.
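Both inputs in this thread use the keys of the `data_custom_csv__params` struct from the ComponentSpec in the log, where `categorical`, `label` and `numeric` default to the string "None", and `skip_invalid_lines` is absent from the executor input. The traceback shows `fairbench.bench.loader.read_csv` forwarding `*args, **kwargs` to `pd.read_csv`, so a minimal sketch of turning such params into `read_csv` arguments might look like this. `read_csv_kwargs`, `DEFAULTS` and the column handling are hypothetical; the actual `custom_csv.py` may map the params differently.

```python
from typing import Any

# Defaults copied from the ComponentSpec in the log (note the "None" strings).
DEFAULTS: dict[str, Any] = {
    "categorical": "None",
    "delimiter": ",",
    "label": "None",
    "numeric": "None",
    "path": "",
    "skip_invalid_lines": True,
}


def _none(value: Any) -> Any:
    """Map the string "None" used as a default to a real None."""
    return None if value == "None" else value


def read_csv_kwargs(params: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Merge user params over the defaults; return (path, pandas kwargs)."""
    merged = {**DEFAULTS, **params}  # fills in e.g. a missing skip_invalid_lines
    numeric = _none(merged["numeric"]) or []
    categorical = _none(merged["categorical"]) or []
    label = _none(merged["label"])
    kwargs: dict[str, Any] = {
        "sep": merged["delimiter"],
        # pandas >= 1.3 accepts "error", "warn" or "skip" here.
        "on_bad_lines": "skip" if merged["skip_invalid_lines"] else "error",
    }
    columns = [*numeric, *categorical] + ([label] if label else [])
    if columns:
        # Only restrict columns when the caller actually listed some.
        kwargs["usecols"] = columns
        kwargs["dtype"] = {c: "category" for c in categorical}
    return merged["path"], kwargs
```

Usage would then be `path, kwargs = read_csv_kwargs(params)` followed by `pd.read_csv(path, **kwargs)`; normalizing the "None" strings up front avoids passing a literal "None" as a column name.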