nlp-with-transformers / notebooks

Jupyter notebooks for the Natural Language Processing with Transformers book
https://transformersbook.com/
Apache License 2.0

The `to_tf_dataset` call is missing the `collate_fn` argument #42

Open lvwerra opened 2 years ago

lvwerra commented 2 years ago

Information

The problem arises in chapter:

Describe the bug

Calling `to_tf_dataset` throws `TypeError: to_tf_dataset() missing 1 required positional argument: 'collate_fn'`. This could be a version issue.

cc @lewtun @cakiki

cakiki commented 2 years ago

For clarification: I was not using the repo (and therefore not the pinned requirements). I just had the book open and was typing along in an existing environment with the following setup:

- `transformers` version: 4.18.0
- Platform: Linux-4.15.0-176-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.5.1
- PyTorch version (GPU?): not installed (NA)
- Tensorflow version (GPU?): 2.9.0-rc0 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no

I think `to_tf_dataset` is currently being revamped, so this is definitely a version issue :smiley_cat: (https://github.com/huggingface/datasets/pull/4170)

Rocketknight1 commented 2 years ago

I don't think this is a version issue! We are revamping that function, but I believe the `DataCollator` should be passed to the `collate_fn` argument either way.
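To make the suggested fix concrete: a `collate_fn` is a callable that receives a list of per-example feature dicts and returns a single batched dict. The following is a minimal, dependency-free sketch of a padding collator. The name `pad_collate_fn`, the `pad_token_id=0` default, and the feature layout are hypothetical illustrations, not the `transformers` or `datasets` implementation.

```python
from typing import Dict, List


def pad_collate_fn(
    features: List[Dict[str, object]], pad_token_id: int = 0
) -> Dict[str, List[object]]:
    """Hypothetical collate_fn: pads input_ids and builds an attention_mask."""
    # Pad every sequence to the longest one in this batch.
    max_len = max(len(f["input_ids"]) for f in features)
    batch: Dict[str, List[object]] = {"input_ids": [], "attention_mask": []}
    for f in features:
        ids = list(f["input_ids"])
        n_pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * n_pad)
        # 1 marks real tokens, 0 marks padding positions.
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
    # Labels are scalars, so they are simply gathered into a list.
    if "label" in features[0]:
        batch["label"] = [f["label"] for f in features]
    return batch


features = [
    {"input_ids": [101, 2023, 102], "label": 0},
    {"input_ids": [101, 102], "label": 1},
]
batch = pad_collate_fn(features)
print(batch["input_ids"])       # [[101, 2023, 102], [101, 102, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```

In practice one would pass a real collator instead, for example `collate_fn=DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")` from `transformers`; which other arguments `to_tf_dataset` requires depends on the installed `datasets` version.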

Rocketknight1 commented 2 years ago

That said, the code might work without a collator after the current PR is merged, because it includes a minimal default collator.
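A rough sketch of why a collator-free call could still work: assuming the default collator only stacks each column without padding (an assumption, not a description of the PR's actual code), it succeeds when every example already has the same length and fails on ragged sequences. `minimal_collate_fn` is a hypothetical name.

```python
from typing import Any, Dict, List


def minimal_collate_fn(features: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical minimal default collator: groups each column, never pads."""
    batch: Dict[str, List[Any]] = {}
    for key in features[0]:
        values = [f[key] for f in features]
        # Sequence columns can only be stacked into a rectangular batch
        # if every example has the same length.
        lengths = {len(v) for v in values if isinstance(v, list)}
        if len(lengths) > 1:
            raise ValueError(
                f"column {key!r} has ragged lengths {sorted(lengths)}; pad first"
            )
        batch[key] = values
    return batch


# Already padded to a common length: stacking works.
padded = [
    {"input_ids": [101, 2023, 102], "label": 0},
    {"input_ids": [101, 102, 0], "label": 1},
]
batch = minimal_collate_fn(padded)
print(batch["input_ids"])  # [[101, 2023, 102], [101, 102, 0]]

# Ragged lengths: stacking fails without a padding collator.
ragged = [{"input_ids": [101, 2023, 102]}, {"input_ids": [101, 102]}]
try:
    minimal_collate_fn(ragged)
except ValueError as err:
    print(err)  # column 'input_ids' has ragged lengths [2, 3]; pad first
```

Under this assumption, relying on a default collator only works if the examples were padded to a common length beforehand, for instance during tokenization.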