Open fuhailin opened 5 years ago
I suspect this is a version issue. Verify that you have installed the correct version of PyTorch and are using Python 2 (I can see you are using Python 3 here).
I was able to reproduce the results. Here is my conda environment.yml file, @fuhailin.
Create the conda env with:
conda env create -f environment.yml
name: fashion-compat
channels:
- pytorch
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- blas=1.0=mkl
- ca-certificates=2019.5.15=1
- certifi=2019.6.16=py27_1
- cffi=1.12.3=py27h2e261b9_0
- cuda80=1.0=h205658b_0
- cudatoolkit=8.0=3
- cudnn=6.0.21=cuda8.0_0
- freetype=2.9.1=h8a8886c_1
- intel-openmp=2019.4=243
- jpeg=9b=h024ee3a_2
- libedit=3.1.20181209=hc058e9b_0
- libffi=3.2.1=hd88cf55_4
- libgcc=7.2.0=h69d50b8_2
- libgcc-ng=9.1.0=hdf63c60_0
- libgfortran=3.0.0=1
- libgfortran-ng=7.3.0=hdf63c60_0
- libpng=1.6.37=hbc83047_0
- libstdcxx-ng=9.1.0=hdf63c60_0
- libtiff=4.0.10=h2733197_2
- mkl=2017.0.4=h4c4d0af_0
- nccl=1.3.4=cuda8.0_1
- ncurses=6.1=he6710b0_1
- numpy=1.13.3=py27ha266831_3
- olefile=0.46=py27_0
- openssl=1.1.1c=h7b6447c_1
- pillow=6.1.0=py27h34e0f95_0
- pip=19.2.2=py27_0
- pycparser=2.19=py27_0
- python=2.7.16=h8b3fad2_4
- pytorch=0.1.12=py27cuda8.0cudnn6.0_1
- readline=7.0=h7b6447c_5
- scikit-learn=0.18.2=np113py27_0
- scipy=0.19.1=np113py27_0
- setuptools=41.0.1=py27_0
- six=1.12.0=py27_0
- sqlite=3.29.0=h7b6447c_0
- tk=8.6.8=hbc83047_0
- torchvision=0.1.8=py27_0
- wheel=0.33.4=py27_0
- xz=5.2.4=h14c3975_4
- zlib=1.2.11=h7b6447c_3
- zstd=1.3.7=h0b5b093_0
prefix: /home/chammika/.conda/envs/fashion-compat
Actually, I could only run inference with the trained model. When training, I get NaNs.
@BryanPlummer could you please share the packages/versions of the environment used to train the model, with pip freeze or conda env export? Thank you.
Train Epoch: 1 [0/686851] Loss: 0.3000 (0.3000) Acc: 0.00% (0.00%) Emb_Norm: 0.75 (0.75)
Train Epoch: 1 [64000/686851] Loss: 0.0000 (0.0012) Acc: 0.00% (0.00%) Emb_Norm: nan (nan)
Train Epoch: 1 [128000/686851] Loss: 0.0000 (0.0006) Acc: 0.00% (0.00%) Emb_Norm: nan (nan)
Train Epoch: 1 [192000/686851] Loss: 0.0000 (0.0004) Acc: 0.00% (0.00%) Emb_Norm: nan (nan)
Train Epoch: 1 [256000/686851] Loss: 0.0000 (0.0003) Acc: 0.00% (0.00%) Emb_Norm: nan (nan)
Train Epoch: 1 [320000/686851] Loss: 0.0000 (0.0002) Acc: 0.00% (0.00%) Emb_Norm: nan (nan)
Train Epoch: 1 [384000/686851] Loss: 0.0000 (0.0002) Acc: 0.00% (0.00%) Emb_Norm: nan (nan)
Train Epoch: 1 [448000/686851] Loss: 0.0000 (0.0002) Acc: 0.00% (0.00%) Emb_Norm: nan (nan)
Train Epoch: 1 [512000/686851] Loss: 0.0000 (0.0001) Acc: 0.00% (0.00%) Emb_Norm: nan (nan)
Train Epoch: 1 [576000/686851] Loss: 0.0000 (0.0001) Acc: 0.00% (0.00%) Emb_Norm: nan (nan)
Train Epoch: 1 [640000/686851] Loss: 0.0000 (0.0001) Acc: 0.00% (0.00%) Emb_Norm: nan (nan)
Some packages are not required for this repo, but this should do it.
backports-abc==0.5
backports.functools-lru-cache==1.4
certifi==2017.7.27.1
cffi==1.10.0
chardet==3.0.4
conda==4.3.16
cycler==0.10.0
Cython==0.27
easydict==1.7
enum34==1.1.6
h5py==2.7.1
idna==2.5
matplotlib==2.1.1
mpmath==0.19
nltk==3.2.5
numpy==1.13.1
olefile==0.44
opencv-python==3.3.0.10
Pillow==4.2.1
pkg-resources==0.0.0
pycocotools==2.0
pycosat==0.6.1
pycparser==2.18
pyparsing==2.2.0
python-dateutil==2.6.1
pytz==2017.3
PyYAML==3.12
pyzmq==16.0.2
requests==2.18.3
ruamel.ordereddict==0.4.13
ruamel.yaml==0.15.34
scikit-learn==0.19.0
scipy==0.19.1
singledispatch==3.4.0.3
six==1.10.0
sklearn==0.0
subprocess32==3.2.7
sympy==1.1.1
torch==0.1.12.post2
torchvision==0.1.9
tornado==4.5.1
urllib3==1.22
visdom==0.1.5
Just pass encoding='latin1' to torch.load if you are using torch>0.4 on Python 3.
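For anyone wondering why the encoding argument helps: the checkpoint was pickled under Python 2, where plain strings are byte strings, and Python 3's unpickler tries to decode them as ASCII by default. Below is a minimal sketch of that mechanism using only the standard pickle module. The payload is a hand-written stand-in for a Python 2 pickle, not data from the actual checkpoint, and load_py2_checkpoint is a hypothetical helper name. It assumes torch.load forwards extra keyword arguments to the unpickler, which is what the **pickle_load_args in the traceback from this issue suggests.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string '\xbe\xef\x01'.
# This is an illustrative stand-in, not bytes taken from the real checkpoint.
PY2_PICKLE = b"\x80\x02U\x03\xbe\xef\x01q\x00."

# Default behaviour on Python 3: Python 2 str objects are decoded as ASCII,
# which fails on any byte >= 0x80.
try:
    pickle.loads(PY2_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# latin1 maps every byte 0x00-0xff to a code point, so decoding cannot fail.
decoded = pickle.loads(PY2_PICKLE, encoding="latin1")


def load_py2_checkpoint(path):
    """Hypothetical helper: load a checkpoint that was saved under Python 2."""
    import torch  # imported lazily so the pickle demo above runs without PyTorch

    # Assumption: extra keyword arguments are passed through to the unpickler.
    return torch.load(path, encoding="latin1")
```

In main.py this would mean replacing torch.load(args.resume) with torch.load(args.resume, encoding='latin1'). Note that this only addresses loading the checkpoint; it does not explain the NaNs seen when training.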
I tried to use the pre-trained model, but I get the following error:
=> loading checkpoint '/home/fuhailin/runs/nondisjoint_l2norm/model_best.pth.tar'
Traceback (most recent call last):
  File "main.py", line 312, in <module>
    main()
  File "main.py", line 138, in main
    checkpoint = torch.load(args.resume)
  File "/home/fuhailin/apps/anaconda3/envs/py36/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/fuhailin/apps/anaconda3/envs/py36/lib/python3.6/site-packages/torch/serialization.py", line 574, in _load
    result = unpickler.load()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbe in position 2: ordinal not in range(128)