huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0
3.65k stars 442 forks

[BUG] Tabular Data Classification #589

Closed knackerbrot closed 2 months ago

knackerbrot commented 3 months ago

Prerequisites

Backend

Local

Interface Used

UI

CLI Command

No response

UI Screenshots & Parameters

Column mapping expanded: { "id": "id", "features": ["amount", "description", "day", "month", "year", "account"], "label": "category" }

CSV looks like this:

amount,description,day,month,year,account,category,id
-95,random description ,6,4,2023,15,48,1
-160,random description ,6,4,2023,15,72,2
-4.05,random description ,6,4,2023,15,28,3
-20,random description ,5,4,2023,3,44,4
-16,random description ,5,4,2023,3,16,5
-30,random description ,5,4,2023,15,29,6
-480,random description ,5,4,2023,15,28,7

Error Logs

INFO | 2024-04-21 18:30:41 | autotrain.app:handle_form:539 - Column mapping: {'id': 'id', 'features': ['amount', 'description', 'day', 'month', 'year', 'account'], 'label': 'category'}
ERROR: Exception in ASGI application
ValueError: c not in train data
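
One possible reading of the `c not in train data` message, not confirmed anywhere in this thread: `c` is the first character of `category`, which is what a per-column check would report if the label arrived as a plain string and was iterated character by character. The sketch below reproduces that failure mode with a hypothetical `check_columns` helper. It is an assumption for illustration, not AutoTrain's code.

```python
def check_columns(target_columns, train_columns):
    """Hypothetical validator (not AutoTrain's): raise if a target column is missing."""
    for col in target_columns:  # iterating a str yields single characters
        if col not in train_columns:
            raise ValueError(f"{col} not in train data")


# Header taken from the CSV in this issue.
train_columns = ["amount", "description", "day", "month", "year",
                 "account", "category", "id"]

# Label given as a list: every name is found, nothing is raised.
check_columns(["category"], train_columns)

# Label given as a bare string: the loop sees 'c', 'a', 't', ...
try:
    check_columns("category", train_columns)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

If this reading is right, the check fails on the very first character of the label name, before any feature column is examined.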

Additional Information

I haven't been able to find out how to properly set up the column mapping and parameters for Tabular Data Classification.

abhishekkrthakur commented 3 months ago

All you need to do is specify the id and target column(s). You cannot use free-text columns in tabular tasks. The data format is documented here: https://huggingface.co/docs/autotrain/tabular
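
To see which columns would count as free text, a quick check like the following can help before uploading. It is a minimal sketch using only the Python standard library. The helper names `is_number` and `non_numeric_columns` are hypothetical, and the sample is the first three rows of the CSV from this issue.

```python
import csv
import io

# First rows of the CSV posted in this issue.
CSV_TEXT = """amount,description,day,month,year,account,category,id
-95,random description ,6,4,2023,15,48,1
-160,random description ,6,4,2023,15,72,2
-4.05,random description ,6,4,2023,15,28,3
"""


def is_number(value):
    """Return True if the cell parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def non_numeric_columns(csv_text):
    """List the columns that contain at least one non-numeric cell."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return [name for name in rows[0]
            if not all(is_number(row[name]) for row in rows)]


text_columns = non_numeric_columns(CSV_TEXT)
print(text_columns)
```

On this sample only `description` is flagged, which matches the column the reply above singles out as a problem.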

knackerbrot commented 2 months ago

Ok, thanks. I'm surprised that AutoTrain is less capable and functional than it was 2 years ago.

https://www.youtube.com/watch?v=OH_e0wOkpZc

abhishekkrthakur commented 2 months ago

It's the same backend. You can have text columns, but those will be treated as categories. There is no change in how we handle columns 🙂
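
As a rough illustration of what "treated as categories" means in general, each distinct string can be mapped to an integer code. This is a generic sketch, not AutoTrain's actual encoder. The `encode_as_categories` helper and the sample values are hypothetical.

```python
def encode_as_categories(values):
    """Map each distinct string to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)  # next unused integer
        encoded.append(codes[value])
    return encoded, codes


# Hypothetical description values, not taken from the issue's data.
descriptions = ["rent", "groceries", "rent", "coffee"]
encoded, codes = encode_as_categories(descriptions)
print(encoded)
print(codes)
```

Under this kind of encoding, a column where almost every description is unique gets almost as many categories as rows, so it carries little signal a tabular model can reuse.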