huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0

JSONDecodeError when uploading a valid CSV #93

Closed · frankiejarrett closed this issue 3 years ago

frankiejarrett commented 3 years ago

When attempting to upload a CSV training set for my model, I receive a JSONDecodeError. I tried uploading my smaller validation set too, but it also failed. I'm not entirely sure why a JSON decoder is even being run against a CSV file.

At first I thought maybe the CSV was invalid, but it checks out. I am not sure how to debug this problem.

Any help is greatly appreciated! Thank you.

Valid CSV

$ csvclean ~/training_set.csv
No errors.

Example CSV data

col_one,col_two
TRUE,"Lorem ipsum dolor sit amet, consectetur adipiscing elit"
FALSE,"Ut id ex luctus ""with quoted text inside"" vitae tincidunt nibh"
TRUE,"Nam ligula nibh, dapibus eget justo vitae"
FALSE,"Cras sed molestie enim. Etiam facilisis erat id bibendum"

Upload attempt

$ autonlp upload --project my_project \
    --split train \
    --col_mapping col_one:target,col_two:text \
    --files ~/training_set.csv

> INFO    Uploading files for project: my_project
> INFO    🗝 Retrieving credentials from config...
> INFO    ☁ Retrieving project 'my_project' from AutoNLP...
> INFO    🔄 Refreshing project status...
> INFO    🔄 Refreshing uploaded files information...
> INFO    🔄 Refreshing models information...
> INFO    🔄 Refreshing cost information...
> INFO    ✅ Successfully loaded project: 'my_project'!
> INFO    Mapping: {'col_one': 'target', 'col_two': 'text'}
Traceback (most recent call last):
  File "/usr/local/bin/autonlp", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/autonlp/cli/autonlp.py", line 57, in main
    details = err.response.json().get("detail")
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 900, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
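
(Side note on the traceback: the crash is not in the upload itself but in the CLI's error handler, which assumes the server's error response body is JSON and calls err.response.json() on it. A more defensive version of that pattern would fall back to the raw body; a hypothetical sketch, not AutoNLP's actual code:)

import json

def error_detail(response):
    # Tolerate non-JSON bodies (HTML error pages, plain text, empty)
    try:
        body = response.json()
    except json.JSONDecodeError:
        return response.text
    return body.get("detail", response.text) if isinstance(body, dict) else response.text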

Environment

$ autonlp --version
0.3.1

$ python -V
Python 3.9.5

$ pip -V
pip 21.1.3
SBrandeis commented 3 years ago

Hello @fjarrett!

Thanks for reporting this issue.

The error is most likely raised by the huggingface_hub dependency that we use to interact with Hugging Face's models and datasets hub. More precisely, I suspect the error comes from this function in AutoNLP, which clones the project's dataset repo onto your machine. Indeed, cloning dataset repos is broken in the latest huggingface_hub release (see this issue).

We just released AutoNLP 0.3.2, which pins an earlier version of the huggingface_hub package (namely, 0.0.12). Would you mind updating AutoNLP and trying again to see if that solves your issue?

pip install -U autonlp
autonlp upload ...
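
After upgrading, you can confirm that pip actually pulled the pinned dependency (the version shown assumes the 0.0.12 pin mentioned above):

$ pip show huggingface_hub | grep Version
Version: 0.0.12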
frankiejarrett commented 3 years ago

@SBrandeis Your hunch was right: I upgraded to 0.3.2 and the uploads worked. I am training our project models now. Thank you for the assist!

frankiejarrett commented 3 years ago

@SBrandeis it looks like our training failed. I don't see an error code or message, so I'm not sure how to debug it. Any chance someone can take a look and see what the issue could be? We have used this exact dataset on Amazon Comprehend in the past.

📁 training_set_out.csv (id # 385)
   • Split:             train
   • Processing status: ✅ Success!
   • Last update:       2021-06-30 13:58 Z
📁 validation_set_out.csv (id # 386)
   • Split:             valid
   • Processing status: ✅ Success!
   • Last update:       2021-06-30 13:59 Z

~~~~~~~~~~~~ Models ~~~~~~~~~~~

+----+--------+--------+--------------------+--------------------+
|    |   ID   | Status |   Creation date    |    Last update     |
+----+--------+--------+--------------------+--------------------+
| ❌ | 302893 | failed | 2021-06-30 14:03 Z | 2021-06-30 14:12 Z |
| ❌ | 302894 | failed | 2021-06-30 14:03 Z | 2021-06-30 14:12 Z |
| ❌ | 302895 | failed | 2021-06-30 14:03 Z | 2021-06-30 14:12 Z |
| ❌ | 302896 | failed | 2021-06-30 14:03 Z | 2021-06-30 14:12 Z |
| ❌ | 302897 | failed | 2021-06-30 14:03 Z | 2021-06-30 14:12 Z |
| ❌ | 302898 | failed | 2021-06-30 14:03 Z | 2021-06-30 14:12 Z |
| ❌ | 302899 | failed | 2021-06-30 14:03 Z | 2021-06-30 14:12 Z |
| ❌ | 302900 | failed | 2021-06-30 14:03 Z | 2021-06-30 14:12 Z |
| ❌ | 302901 | failed | 2021-06-30 14:03 Z | 2021-06-30 14:12 Z |
| ❌ | 302902 | failed | 2021-06-30 14:03 Z | 2021-06-30 14:12 Z |
+----+--------+--------+--------------------+--------------------+
SBrandeis commented 3 years ago

@fjarrett looking into this now!

frankiejarrett commented 3 years ago

@SBrandeis thank you! and sorry for the head fake 😸

SBrandeis commented 3 years ago

I had a look at the parameters of your project: the problem is that you're trying to train a model that isn't compatible with 🤗 Transformers' TextClassification pipeline.

You can browse the TextClassification-compatible models here: https://huggingface.co/models?pipeline_tag=text-classification

If you want to try out that specific model on your use case, you might want to have a look at 🤗's Inference API if you haven't already: https://huggingface.co/inference-api
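
For example, here is a minimal way to poke at an MNLI-style model through the Inference API before committing to a training run (a sketch; assumes an API token in the HF_TOKEN environment variable and uses facebook/bart-large-mnli, which is served as a zero-shot classifier):

import os
import requests

API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-mnli"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

# Zero-shot classification: candidate labels stand in for a fine-tuned head
payload = {
    "inputs": "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
    "parameters": {"candidate_labels": ["TRUE", "FALSE"]},
}
print(requests.post(API_URL, headers=headers, json=payload).json())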

Let me know if it helps!

frankiejarrett commented 3 years ago

@SBrandeis I tried again, this time using --hub_model textattack/facebook-bart-large-MNLI since that model was tagged for TextClassification, but the training failed again.

frankiejarrett commented 3 years ago

@SBrandeis trying again with --hub_model roberta-large-mnli this time 🤞

abhishekkrthakur commented 3 years ago

That might not work either. For text classification, it's best to select models that are not fine-tuned on a downstream task. Try roberta-large :)
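
One quick way to tell a base checkpoint from a downstream fine-tune (a sketch, assuming transformers is installed; the printed values reflect the hub configs for these two checkpoints):

from transformers import AutoConfig

# A base checkpoint exposes only a pretraining head...
print(AutoConfig.from_pretrained("roberta-large").architectures)
# ['RobertaForMaskedLM']

# ...while an MNLI fine-tune already carries a classification head
print(AutoConfig.from_pretrained("roberta-large-mnli").architectures)
# ['RobertaForSequenceClassification']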

frankiejarrett commented 3 years ago

@SBrandeis ok trying that now

frankiejarrett commented 3 years ago

@abhishekkrthakur @SBrandeis I tried roberta-large but that also failed. I finally resorted to training without any --hub_model specified and that did work. Could it be that fine-tuning existing hub models is broken right now?

FAILED facebook/bart-large-mnli
FAILED textattack/facebook-bart-large-MNLI
FAILED roberta-large-mnli
FAILED roberta-large
frankiejarrett commented 3 years ago

Oh, it seems I was wrong. The roberta-large run did, in fact, succeed. I must have been looking at the wrong project. Thank you both for your help! I learned a lot. 🙌