pytorch / data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
BSD 3-Clause "New" or "Revised" License

NameError: name 'portalocker' is not defined #1093

Closed afurkank closed 1 year ago

afurkank commented 1 year ago

🐛 Describe the bug

When I try to create an iterator from the Multi30k dataset with the following code:

!pip install torch==2.0.0 torchtext==0.15.1
!pip install torchdata==0.6.0
import torch
import torchtext
from torchtext.datasets import Multi30k

SRC_LANGUAGE = 'de'
TGT_LANGUAGE = 'en'

train_iter = Multi30k(split='train', language_pair=(SRC_LANGUAGE, TGT_LANGUAGE))

I get the following error:

ModuleNotFoundError: Package 'portalocker' is required to be installed to use this datapipe. Please use 'pip install 'portalocker>=2.0.0'' or 'conda install -c conda-forge 'portalocker>=2/0.0' to install the package

I then installed the package with !pip install 'portalocker>=2.0.0', which resolved that error. But when I tried to iterate over train_iter:

for i in train_iter:
    print("hello world")

I got this error:

NameError: name 'portalocker' is not defined
This exception is thrown by __iter__ of _MemoryCellIterDataPipe(remember_elements=1000, source_datapipe=_ChildDataPipe)

None of this happens with version 0.5.1 of torchdata.

Versions

Collecting environment information...
PyTorch version: 2.0.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.22.6
Libc version: glibc-2.31

Python version: 3.9.16 (main, Dec 7 2022, 01:11:51) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.10.147+-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.7.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7B12
Stepping: 0
CPU MHz: 2250.000
BogoMIPS: 4500.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32 KiB
L1i cache: 32 KiB
L2 cache: 512 KiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0,1
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save umip rdpid

Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] torch==2.0.0
[pip3] torchaudio==0.13.1+cu116
[pip3] torchdata==0.6.0
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.15.1
[pip3] torchvision==0.14.1+cu116
[pip3] triton==2.0.0
[conda] Could not collect

ejguan commented 1 year ago

What I found: a typo in the registry at https://github.com/pytorch/data/blob/7d9747588173f38e3ce0d5c58ba3746dacb4438f/torchdata/datapipes/iter/util/cacheholder.py#L22

But I still cannot reproduce this problem. Could you please verify that portalocker is installed properly?
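One way to run that check from inside the same notebook or interpreter is sketched below. It uses only the standard library; the helper name describe_package is made up for illustration and is not part of torchdata or portalocker.

```python
import importlib.util
from importlib import metadata


def describe_package(name):
    """Report whether `name` is importable by this interpreter, and its version.

    Hypothetical helper for illustration only.
    """
    # find_spec returns None when the current interpreter cannot locate the module.
    if importlib.util.find_spec(name) is None:
        return f"{name}: not importable in this interpreter"
    try:
        # Reads the version from the installed distribution's metadata.
        return f"{name}: installed, version {metadata.version(name)}"
    except metadata.PackageNotFoundError:
        return f"{name}: importable, but no distribution metadata found"


print(describe_package("portalocker"))
```

If this reports that the package is importable but iterating the datapipe still raises NameError, the package was probably installed after torchdata had already been imported in the running process.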

afurkank commented 1 year ago

I'm working on Google Colab, that is where I got the error. I'm linking the notebook:

https://colab.research.google.com/drive/1ZXEoDtGYOToHU5KBXmfkxs7Bcsvyhfm0?usp=sharing

ejguan commented 1 year ago

@afurkank On Colab, you need to restart the runtime after installing portalocker for the install to take effect. Put simply, you can run pip install portalocker in the first cell, before anything else. The rest of the code should then work properly.
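A minimal sketch of why a restart can be needed, assuming the library guards an optional import and swallows the failure. This is not torchdata's actual code: the module name hypothetical_missing_locker is invented so the import fails everywhere, and it only stands in for portalocker.

```python
# Hypothetical stand-in for an optional dependency such as portalocker.
# The module name is made up so that the import fails on any machine.
try:
    import hypothetical_missing_locker as locker
except ImportError:
    pass  # failure swallowed: the global name `locker` is never bound


def acquire():
    # The global `locker` is looked up when the function is called,
    # not when the module is imported.
    return locker.__name__


try:
    acquire()
    outcome = "ok"
except NameError as exc:
    outcome = f"NameError: {exc}"

print(outcome)
```

Installing the package afterwards does not re-run the guarded import in a module that is already loaded, so under this assumption the name stays unbound until the interpreter (or Colab runtime) is restarted.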

ejguan commented 1 year ago

I am closing this issue now. Let me know if it doesn't work.

enchyisle commented 1 year ago

Hello, I can also reproduce this error with torch 2.0, torchtext 0.15.1 and torchdata 0.6.0 on Python 3.8.10. I was just trying to run the example from https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html when the third line failed: train_iter = iter(AG_NEWS(split='train'))

I also tried to explicitly pip install portalocker, but then yet another error popped up. So I switched back to torchdata 0.5.1, where everything worked fine.

Could you help me understand if there's anything I'm doing wrong? Thanks.

ejguan commented 1 year ago

@enchyisle Could you please try the nightly release? Installation instructions for the nightly release are at https://pytorch.org/get-started/locally/. We recently landed a change that might fix the problem.

Hqscesz commented 1 year ago

I'm working on Google Colab, that is where I got the error. I'm linking the notebook: https://colab.research.google.com/drive/1ZXEoDtGYOToHU5KBXmfkxs7Bcsvyhfm0?usp=sharing

@afurkank For colab, you need to restart the runtime to achieve so. Simply speaking, you can do pip install portalocker in the first cell. Then, it should work properly with the rest of code.

very useful idea thanks a lot

Aisuko commented 1 year ago

The nightly release is working for me in my Kaggle environment:

pip install portalocker
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118