The `to_tf_dataset` call is missing the `collate_fn` argument (nlp-with-transformers/notebooks #42)

lvwerra commented 2 years ago

Information

The problem arises in chapter:

[ ] Introduction
[x] Text Classification
[ ] Transformer Anatomy
[ ] Multilingual Named Entity Recognition
[ ] Text Generation
[ ] Summarization
[ ] Question Answering
[ ] Making Transformers Efficient in Production
[ ] Dealing with Few to No Labels
[ ] Training Transformers from Scratch
[ ] Future Directions

Describe the bug

Calling `to_tf_dataset` throws `TypeError: to_tf_dataset() missing 1 required positional argument: 'collate_fn'`. This could be a version issue.
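The message is Python's standard error when a parameter without a default value is left out of a call. A minimal sketch to reproduce it, using a hypothetical stand-in function (not the real `datasets` method) whose signature assumes that `collate_fn`, like `columns`, `batch_size` and `shuffle`, had no default in the installed version:

```python
def to_tf_dataset(columns, batch_size, shuffle, collate_fn, label_cols=None):
    """Hypothetical stand-in: collate_fn has no default, so it must be passed."""
    return {"columns": columns, "batch_size": batch_size, "shuffle": shuffle,
            "collate_fn": collate_fn, "label_cols": label_cols}


error_message = ""
try:
    # Keyword arguments only, with collate_fn forgotten.
    to_tf_dataset(columns=["input_ids", "attention_mask"], label_cols=["label"],
                  shuffle=True, batch_size=64)
except TypeError as err:
    error_message = str(err)

print(error_message)
```

Python reports the parameter as a "required positional argument" even though the other arguments were passed by keyword, because `collate_fn` is a positional-or-keyword parameter without a default.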

cc @lewtun @cakiki

cakiki commented 2 years ago

For clarification: I was not using the repo (and therefore not using the pinned requirements); I just had the book open and was typing along in an existing environment with the following setup:

- `transformers` version: 4.18.0
- Platform: Linux-4.15.0-176-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.5.1
- PyTorch version (GPU?): not installed (NA)
- Tensorflow version (GPU?): 2.9.0-rc0 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no

I think `to_tf_dataset` is currently being revamped, so this is definitely a version issue :smiley_cat: (https://github.com/huggingface/datasets/pull/4170)

Rocketknight1 commented 2 years ago

I don't think this is a version issue! We are revamping that function, but I believe the `DataCollator` should be passed to the `collate_fn` argument either way.
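To illustrate what a collator passed as `collate_fn` does, here is a plain-Python sketch of a padding collator: it receives a list of tokenized examples and pads them into one rectangular batch. `pad_collate`, the token IDs, and `pad_token_id=0` are hypothetical choices for illustration; this is not the `transformers` `DataCollatorWithPadding` implementation, which returns tensors rather than lists.

```python
from typing import Dict, List


def pad_collate(features: List[Dict[str, list]], pad_token_id: int = 0) -> Dict[str, list]:
    """Pad each example's input_ids to the longest one in the batch (hypothetical sketch)."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "label": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        # 1 marks real tokens, 0 marks padding the model should ignore.
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        batch["label"].append(f["label"])
    return batch


# Made-up token IDs of different lengths.
examples = [
    {"input_ids": [101, 2023, 102], "label": 0},
    {"input_ids": [101, 2307, 2154, 999, 102], "label": 1},
]
batch = pad_collate(examples)
print(batch["input_ids"])       # [[101, 2023, 102, 0, 0], [101, 2307, 2154, 999, 102]]
print(batch["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

With the real libraries, the equivalent is to build a collator such as `DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")` and pass it as `collate_fn=` in the `to_tf_dataset(...)` call, alongside `columns`, `label_cols`, `shuffle` and `batch_size`.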

Rocketknight1 commented 2 years ago

That said, the code might work without a collator after the current PR is merged, because it includes a minimal default collator.
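A minimal default collator would not need to pad if every example already has the same length, for instance when the whole dataset was tokenized with padding in a single batch (an assumption here, not something stated in this thread). The hypothetical `stack_collate` below sketches that idea in plain Python by only grouping each field across the batch; it is not the collator added in the PR.

```python
from typing import Dict, List


def stack_collate(features: List[Dict[str, object]]) -> Dict[str, list]:
    """Group each field across the batch without padding (hypothetical sketch)."""
    for key, value in features[0].items():
        if isinstance(value, list):
            lengths = {len(f[key]) for f in features}
            if len(lengths) != 1:
                # Without padding, ragged examples cannot form a rectangular batch.
                raise ValueError(f"field {key!r} has different lengths: {sorted(lengths)}")
    return {key: [f[key] for f in features] for key in features[0]}


# Made-up examples that were already padded to a common length.
already_padded = [
    {"input_ids": [101, 2023, 102, 0], "attention_mask": [1, 1, 1, 0], "label": 0},
    {"input_ids": [101, 2307, 999, 102], "attention_mask": [1, 1, 1, 1], "label": 1},
]
batch = stack_collate(already_padded)
print(batch["label"])  # [0, 1]
```

The length check is the design point: stacking works only for pre-padded data, so for variable-length examples a padding collator is still needed.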
