huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0

Clarification ref. running AutoTrain on Colab free #399

Closed Analect closed 11 months ago

Analect commented 11 months ago

I followed the steps in this notebook (adding an HF token and an NGROK token) and am running it on a free Colab instance with a T4 GPU.

https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/colabs/AutoTrain.ipynb

When I start training, it warns that AutoTrain is a paid offering. Isn't this relying on the free Colab compute rather than any HF GPU? What exactly is being charged here? Thanks.


abhishekkrthakur commented 11 months ago

You can ignore that warning when running in Colab or locally. I'll fix it in the next iteration. :) Sorry for the confusion.

fronori commented 11 months ago

Thanks for getting AutoTrain working on Colab! I have successfully gotten as far as opening the UI. However, I got the following error when training tabular data.

> INFO    hardware: Local
> INFO    Running jobs: []
> INFO    Dataset: yi56-5z8c-rh4m (tabular_multi_class_classification)
Train data: [<tempfile.SpooledTemporaryFile object at 0x7cabb82542e0>]
Valid data: []
Column mapping: {'id': 'id', 'label': ['target']}

Pushing dataset shards to the dataset hub: 100% 1/1 [00:00<00:00, 16131.94it/s]
Downloading metadata: 100% 716/716 [00:00<00:00, 4.11MB/s]
INFO:     240d:1a:68c:f700:acc0:7e52:d7c5:7d09:0 - "POST /create_project HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/local/lib/python3.10/dist-packages/autotrain/app.py", line 398, in handle_form
    dset.prepare()
  File "/usr/local/lib/python3.10/dist-packages/autotrain/dataset.py", line 357, in prepare
    preprocessor.prepare()
  File "/usr/local/lib/python3.10/dist-packages/autotrain/preprocessor/tabular.py", line 80, in prepare
    train_df.push_to_hub(
  File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 5498, in push_to_hub
    repo_info.splits[split] = SplitInfo(
  File "/usr/local/lib/python3.10/dist-packages/datasets/splits.py", line 541, in __setitem__
    raise ValueError(f"Split {key} already present")
ValueError: Split train already present
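
The last two frames of the traceback show where the error comes from: the split mapping in `datasets` refuses to register a split name that is already present. A minimal stand-in for that behaviour is sketched below. It assumes nothing about the real class beyond the `raise` line visible in the traceback, and the names `SplitRegistry` and `register_split` are hypothetical.

```python
class SplitRegistry(dict):
    """Hypothetical stand-in for a split mapping that rejects duplicate keys."""

    def __setitem__(self, key, value):
        # Mirrors the check seen in the traceback: no overwriting of a split.
        if key in self:
            raise ValueError(f"Split {key} already present")
        super().__setitem__(key, value)


def register_split(registry, name, info):
    """Try to register a split; return the error message, or None on success."""
    try:
        registry[name] = info
    except ValueError as exc:
        return str(exc)
    return None


splits = SplitRegistry()
first = register_split(splits, "train", {"num_examples": 3})
second = register_split(splits, "train", {"num_examples": 3})
print(first)
print(second)
```

The second registration fails with the same message as in the traceback, which would be consistent with metadata for a `train` split already existing when `push_to_hub` tried to add it.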

The data I used is the sample data from https://huggingface.co/docs/autotrain/tabular saved as CSV. The parameters I set are as follows:

{
  "seed": 42,
  "categorical_columns": ["category1", "catogory2"],
  "numerical_columns": ["feature1"],
  "num_trials": 10,
  "time_limit": 600,
  "categorical_imputer": "most_frequent",
  "numerical_imputer": "median",
  "numeric_scaler": "robust"
}
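
One thing worth checking in parameters like these is that every listed column actually appears in the CSV header; `catogory2` above may be a misspelling of `category2`. Below is a small sketch of such a check. The header `id,category1,category2,feature1,target` is an assumption about the sample file rather than something stated in this thread, and `missing_columns` is a hypothetical helper, not part of AutoTrain.

```python
import csv
import io


def missing_columns(csv_text, params):
    """Return configured column names that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    configured = params.get("categorical_columns", []) + params.get("numerical_columns", [])
    return [name for name in configured if name not in header]


# Assumed header and a single made-up row, for illustration only.
sample_csv = "id,category1,category2,feature1,target\n1,A,X,0.34,1\n"
params = {
    "categorical_columns": ["category1", "catogory2"],
    "numerical_columns": ["feature1"],
}
print(missing_columns(sample_csv, params))
```

Whether a mismatch like this relates to the errors reported here is not established; the check only rules out one possible cause.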

Adding the optional validation data did not solve the problem. I feel very close to getting it right, so please let me know the solution! I also got other errors when I left the default parameters as none. If you could provide a minimal tutorial to get it working, many people would enjoy it. Thank you in advance.

abhishekkrthakur commented 11 months ago

You need a unique project name (one that does not match an existing repo in your HF account).
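
One way to avoid a name collision is to append a short random suffix when the chosen name is already taken. This is a minimal sketch, assuming you already have the set of repo names in your account (how you obtain it is left out here). `unique_project_name` is a hypothetical helper, not part of AutoTrain.

```python
import secrets


def unique_project_name(base, existing_names, max_tries=100):
    """Return `base` if unused, else `base` plus a short random hex suffix."""
    taken = set(existing_names)
    if base not in taken:
        return base
    for _ in range(max_tries):
        candidate = f"{base}-{secrets.token_hex(3)}"
        if candidate not in taken:
            return candidate
    raise RuntimeError("could not find an unused project name")


# Made-up repo names, for illustration only.
existing = {"my-tabular-project", "demo"}
print(unique_project_name("my-tabular-project", existing))
print(unique_project_name("fresh-project", existing))
```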

fronori commented 11 months ago

Thank you for your suggestion! I changed the project name, but the same "ValueError: Split train already present" error still occurs.

abhishekkrthakur commented 11 months ago

Okay, I'll take a look!

abhishekkrthakur commented 11 months ago

The issue should be fixed now.