Analect closed this issue 11 months ago
You can ignore that warning when running in Colab or locally. I'll fix it in the next iteration. :) Sorry for the confusion.
Thanks for getting AutoTrain working on Colab! I have successfully gotten as far as opening the UI. However, I got the following error when training tabular data.
> INFO hardware: Local
> INFO Running jobs: []
> INFO Dataset: yi56-5z8c-rh4m (tabular_multi_class_classification)
Train data: [<tempfile.SpooledTemporaryFile object at 0x7cabb82542e0>]
Valid data: []
Column mapping: {'id': 'id', 'label': ['target']}
Pushing dataset shards to the dataset hub: 100% 1/1 [00:00<00:00, 16131.94it/s]
Downloading metadata: 100% 716/716 [00:00<00:00, 4.11MB/s]
INFO: 240d:1a:68c:f700:acc0:7e52:d7c5:7d09:0 - "POST /create_project HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app( # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/local/lib/python3.10/dist-packages/autotrain/app.py", line 398, in handle_form
    dset.prepare()
  File "/usr/local/lib/python3.10/dist-packages/autotrain/dataset.py", line 357, in prepare
    preprocessor.prepare()
  File "/usr/local/lib/python3.10/dist-packages/autotrain/preprocessor/tabular.py", line 80, in prepare
    train_df.push_to_hub(
  File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 5498, in push_to_hub
    repo_info.splits[split] = SplitInfo(
  File "/usr/local/lib/python3.10/dist-packages/datasets/splits.py", line 541, in __setitem__
    raise ValueError(f"Split {key} already present")
ValueError: Split train already present
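For readers following along: the last two frames of the traceback suggest that the container holding the split metadata refuses to overwrite a key that is already set. The toy class below is only a stand-in to illustrate that behavior. `StrictSplits` is a hypothetical name and this is not the `datasets` implementation; only the error message is copied from the traceback.

```python
class StrictSplits(dict):
    """Toy stand-in (hypothetical): a dict that refuses to overwrite a key."""

    def __setitem__(self, key, value):
        # Mirror the check visible in the traceback's final frame.
        if key in self:
            raise ValueError(f"Split {key} already present")
        super().__setitem__(key, value)


splits = StrictSplits()
splits["train"] = {"num_examples": 3}  # first write succeeds

try:
    splits["train"] = {"num_examples": 3}  # second write to the same key fails
except ValueError as err:
    message = str(err)
    print(message)
```

If the real container behaves this way, the error would be expected whenever split metadata for `train` already exists at the point where `push_to_hub` tries to record it.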
The data I used is the sample data from https://huggingface.co/docs/autotrain/tabular, saved as a CSV file, and the parameters I set are as follows:
{
  "seed": 42,
  "categorical_columns": ["category1", "catogory2"],
  "numerical_columns": ["feature1"],
  "num_trials": 10,
  "time_limit": 600,
  "categorical_imputer": "most_frequent",
  "numerical_imputer": "median",
  "numeric_scaler": "robust"
}
Adding the optional validation data did not solve the problem. I feel very close to getting it right, so please let me know the solution! I also got other errors when I left the default parameters as none. If you could provide a minimal tutorial to get it working, many people would enjoy it. Thank you in advance.
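One quick sanity check before submitting a job is to confirm that every column named in the parameters actually exists in the CSV header. The sketch below uses only the standard library. The helper name `missing_columns` and the sample header are assumptions for illustration, not part of AutoTrain. The parameters above list `catogory2`, which may be a typo for `category2`; nothing in this thread establishes whether that is related to the error.

```python
import csv
import io


def missing_columns(csv_text, params):
    """Return configured column names that do not appear in the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    configured = params.get("categorical_columns", []) + params.get(
        "numerical_columns", []
    )
    return [name for name in configured if name not in header]


# Assumed header for illustration; check it against your own file.
sample_csv = "id,category1,category2,feature1,target\n1,A,X,0.5,1\n"

params = {
    "categorical_columns": ["category1", "catogory2"],
    "numerical_columns": ["feature1"],
}

print(missing_columns(sample_csv, params))  # → ['catogory2']
```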
You need a unique project name (one that does not match an existing repo in your HF account).
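A simple way to avoid reusing a name is to append a random suffix. This is a minimal sketch with the standard library; `unique_project_name` and the `my-tabular-project` prefix are hypothetical, and the code does not check your HF account for collisions.

```python
import uuid


def unique_project_name(prefix):
    """Append a short random hex suffix so repeated runs get different names."""
    suffix = uuid.uuid4().hex[:8]  # 8 hex characters from a random UUID
    return f"{prefix}-{suffix}"


name = unique_project_name("my-tabular-project")  # hypothetical prefix
print(name)
```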
Thank you for your suggestion! I changed the project name, but the same "ValueError: Split train already present" error still occurs.
Okay. I'll take a look!
The issue should be fixed now.
I have a question. Following the steps in this notebook (adding an HF token and an ngrok token), I'm running this on a free Colab instance with a T4 GPU.
https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/colabs/AutoTrain.ipynb
When it comes to training, it then warns that AutoTrain is a paid offering. Is this not relying on the free Colab compute rather than any HF GPU? What exactly is being charged here? Thanks.