huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0
134.76k stars 26.95k forks source link

HfArgumentParser does not match exact arguments #28701

Closed ahmedkooli closed 8 months ago

ahmedkooli commented 9 months ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

While running examples/pytorch/text-classification/run_classification.py, I noticed that the argument parser does not match the exact keywords, but rather a substring of the expected keyword. For example, running:

python run_classification.py \
    --model_name_or_path  bert-base-uncased \
    --dataset_name glue \
    --dataset_config_name mrpc \
    --shuffle_train_dataset \
    --max_train_samples 20 \
    --max_eval_samples 20 \
    --metric_name accuracy \
    --text_column_na "sentence1,sentence2" \
    --do_train \
    --do_eval \
    --do_predict \
    --max_seq_length 512 \
    --per_device_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 1 \
    --output_dir /tmp/glue_mrpc/ \
    --overwrite_output_dir \

works, whereas the argument text_column_na doesn't exist, and it replaces text_column_names. Is this meant to be? I think this can lead to unexpected behaviours. Thanks in advance.

Expected behavior

I expected an error due to a non existing keyword, such as:

raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--text_column_na', 'sentence1,sentence2']
amyeroberts commented 9 months ago

Hi @ahmedkooli, thanks for raising this issue!

Digging into this, it seems this is coming from argparser itself, and is due to this line. As the option text_column_names starts with text_column_na and it's selected.

For example - just using argparser:

>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', default=None)
>>> parser.parse_args(['--fo', 'a'])
Namespace(foo='a')
ahmedkooli commented 9 months ago

Thanks for the answer :) Would you suggest trying to enforce it in the transformers library or should I take the discussion up in the python repo directly?

amyeroberts commented 9 months ago

@ahmedkooli As it's not something that we've encountered causing issues for our users, enforcing this isn't a feature we'd add at the moment. If it becomes something that is requested by many people (I'll measure as 👍 on this comment) or is a common pain point then we can revisit.

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.