Open Violonur-PavelBI opened 2 years ago
@Violonur-PavelBI
Please provide the version of Python and PyTorch.
Python 3.6.9 PyTorch 1.3.0a0+24ae9b5
@Violonur-PavelBI
We found out why the accuracy is so low: only a very small amount of data is used in the fully_train phase. We are training with the following configuration:

```yaml
fully_train:
  pipe_step:
    type: TrainPipeStep
  dataset:
    ref: pba.dataset
    common:
      train_portion: 1.0  # Use the full training data.
    train:
      shuffle: True  # Shuffle during training.
...
```
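For context on what `train_portion` does, here is a hedged sketch (the function name `split_train` and the dataset sizes are illustrative, not the actual Vega implementation): it keeps only a leading fraction of the training indices, so a small value left over from the search phase would starve fully_train of data unless it is reset to `1.0` as above.

```python
# Hypothetical sketch of a train_portion-style split; not the real Vega code.

def split_train(indices, train_portion):
    """Keep the leading fraction of the training indices."""
    cut = int(len(indices) * train_portion)
    return indices[:cut]

full = list(range(50000))                 # e.g. CIFAR-10 training-set size
search_subset = split_train(full, 0.08)   # small subset for the search phase
fully_train_set = split_train(full, 1.0)  # full data, as set in the config

print(len(search_subset), len(fully_train_set))  # 4000 50000
```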
@zhangjiajin I relaunched it with that correction. Do I understand correctly that such low accuracy is expected during the PBA stage itself?
@Violonur-PavelBI
Yes. The PBA phase only needs relative comparisons between candidates, so it uses less data and the accuracy is low.
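To illustrate that point with a toy example (the worker ids and accuracies below are made up, not taken from any run): the search phase only has to rank candidates against each other on reduced data, so low absolute accuracies do not change which candidate wins.

```python
# Illustrative only: proxy accuracies of hypothetical search-phase workers.
candidates = {
    6: 0.274,
    15: 0.269,
    3: 0.241,
}

# The winner is whichever candidate ranks highest; how low the absolute
# numbers are does not matter for the selection.
best_worker = max(candidates, key=candidates.get)
print(best_worker)  # 6
```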
@zhangjiajin It didn't help:

```
INFO:root:flops: 0.5578890240000001 , params:11173.962
INFO:root:Finished the unified trainer successfully.
INFO:root:Update Success. step_name=pba, worker_id=15
INFO:root:Best values: [{'worker_id': 6, 'performance': {'flops': 0.5578890240000001, 'params': 11173.962, 'accuracy': 0.27411400139664804, 'accuracy_top1': 0.27411400139664804, 'accuracy_top5': 0.739830656424581}}]
INFO:root:Clean worker folder /workspace/proj/vega/vega/tasks/0628.084138.423/workers/pba.
INFO:root:------------------------------------------------
INFO:root:  Step: fully_train
INFO:root:------------------------------------------------
INFO:vega.core.pipeline.train_pipe_step:init TrainPipeStep...
INFO:vega.core.pipeline.train_pipe_step:TrainPipeStep started...
INFO:root:Model was created.
INFO:root:load model weights from file, weights file=/workspace/proj/vega/vega/tasks/0628.084138.423/output/pba/model_6.pth
INFO:root:flops: 0.5578890240000001 , params:11173.962
INFO:root:worker id [6], epoch [1/400], train step [ 0/195], loss [ 1.344, 1.344], lr [ 0.1000000], time pre batch [0.519s] , total mean time per batch [0.519s]
...
INFO:root:worker id [6], epoch [400/400], current valid perfs [accuracy: 0.132, accuracy_top1: 0.132, accuracy_top5: 0.505], best valid perfs [accuracy: 0.401, accuracy_top1: 0.401, accuracy_top5: 0.835]
INFO:root:flops: 0.5578890240000001 , params:11173.962
INFO:root:Finished the unified trainer successfully.
INFO:root:start evaluate process
INFO:root:Model was created.
INFO:root:load model weights from file, weights file=/workspace/proj/vega/vega/tasks/0628.084138.423/workers/fully_train/6/model_6.pth
INFO:root:step [1/39], valid metric [[[tensor(0.3945, device='cuda:0'), tensor(0.8242, device='cuda:0')]]]
INFO:root:step [11/39], valid metric [[[tensor(0.3945, device='cuda:0'), tensor(0.8398, device='cuda:0')]]]
INFO:root:step [21/39], valid metric [[[tensor(0.4102, device='cuda:0'), tensor(0.8555, device='cuda:0')]]]
INFO:root:step [31/39], valid metric [[[tensor(0.3984, device='cuda:0'), tensor(0.8086, device='cuda:0')]]]
INFO:root:evaluator latency [4.54831620445475]
INFO:root:evaluate performance: {'accuracy': 0.4011418269230769, 'accuracy_top1': 0.4011418269230769, 'accuracy_top5': 0.835136217948718, 'latency': 4.54831620445475}
INFO:root:finished host evaluation, id: 6, performance: {'accuracy': 0.4011418269230769, 'accuracy_top1': 0.4011418269230769, 'accuracy_top5': 0.835136217948718, 'latency': 4.54831620445475}
INFO:root:------------------------------------------------
INFO:root:   Pipeline end.
INFO:root:
INFO:root:   task id: 0628.084138.423
INFO:root:   output folder: /workspace/proj/vega/vega/tasks/0628.084138.423/output
INFO:root:
INFO:root:   running time:
INFO:root:       pba:         3:07:25 [2022-06-28 08:41:40.687438 - 2022-06-29 11:49:06.298678]
INFO:root:       fully_train: 3:30:13 [2022-06-29 11:49:06.388008 - 2022-06-29 15:19:19.973391]
INFO:root:
INFO:root:   result:
INFO:root:       6: {'flops': 0.5578890240000001, 'params': 11173.962, 'accuracy': 0.4011418269230769, 'accuracy_top1': 0.4011418269230769, 'accuracy_top5': 0.835136217948718, 'latency': 4.54831620445475}
INFO:root:------------------------------------------------
```
@Violonur-PavelBI
Copy that. I'll find the cause of the issue.
Copy what exactly?
I mean that after modifying the configuration file, the accuracy is still not good, and I will look for the cause of the issue. We've found some clues: after the data augmentation method changes during training, the previous model may not be loaded correctly to continue training.
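A minimal sketch of that suspected failure mode (plain dicts stand in for real checkpoints, and the helper names `save_checkpoint`/`resume_latest` are hypothetical, not Vega's API): when the augmentation policy changes mid-training, the trainer should resume from the latest checkpoint; if it silently reinitializes instead, all progress before the switch is lost and the final accuracy stays low.

```python
import copy

def save_checkpoint(store, epoch, weights):
    """Record a deep copy of the weights for the given epoch."""
    store[epoch] = copy.deepcopy(weights)

def resume_latest(store):
    """Return the most recent checkpoint's weights, or None if empty."""
    return store[max(store)] if store else None

ckpts = {}
save_checkpoint(ckpts, 10, {"conv1.weight": [0.1, 0.2]})

# The augmentation policy changes here; training must continue from
# epoch 10's weights, not restart from a freshly initialized model.
weights = resume_latest(ckpts)
print(weights)  # {'conv1.weight': [0.1, 0.2]}
```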
Thank you very much
Thanks for your impressive work. As a result of running PBA, I got a very low result:

```
INFO:root:   result:
INFO:root:       15: {'flops': 0.556660224, 'params': 11173.962, 'accuracy': 0.4115953947368421, 'accuracy_top1': 0.4115953947368421, 'accuracy_top5': 0.8470394736842105, 'latency': 6.360976584255695}
```
Can you help me solve this problem?
```
aiohttp 3.8.1 aiosignal 1.2.0 albumentations 1.1.0 alembic 1.7.7 anyio 3.5.0 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 async-timeout 4.0.2 asynctest 0.13.0 attrs 21.4.0 Babel 2.9.1 backcall 0.2.0 beautifulsoup4 4.10.0 bleach 4.1.0 bokeh 2.4.2 brotlipy 0.7.0 certifi 2021.10.8 cffi 1.14.6 chardet 4.0.0 charset-normalizer 2.0.12 click 8.0.4 cloudpickle 2.0.0 conda 4.10.3 conda-build 3.21.5 conda-package-handling 1.7.3 cryptography 3.4.8 cycler 0.11.0 dask 2022.2.0 databricks-cli 0.16.4 debugpy 1.5.1 decorator 5.1.0 defusedxml 0.7.1 dill 0.3.5.1 distributed 2022.2.0 dnspython 2.1.0 docker 5.0.3 entrypoints 0.4 filelock 3.0.12 Flask 2.0.3 fonttools 4.30.0 frozenlist 1.3.0 fsspec 2022.2.0 future 0.18.2 gitdb 4.0.9 GitPython 3.1.27 glob2 0.7 googledrivedownloader 0.4 greenlet 1.1.2 gunicorn 20.1.0 HeapDict 1.0.1 idna 2.10 imageio 2.16.1 importlib-metadata 4.11.3 importlib-resources 5.4.0 ipykernel 6.9.2 ipython 7.27.0 ipython-genutils 0.2.0 itsdangerous 2.1.1 jedi 0.18.0 Jinja2 3.0.3 joblib 1.1.0 json5 0.9.6 jsonschema 4.4.0 jupyter-client 7.1.2 jupyter-core 4.9.2 jupyter-server 1.15.6 jupyter-server-proxy 3.2.1 jupyterlab 3.3.2 jupyterlab-pygments 0.1.2 jupyterlab-server 2.11.1 kiwisolver 1.4.0 libarchive-c 2.9 locket 0.2.1 Mako 1.2.0 MarkupSafe 2.0.1 matplotlib 3.5.1 matplotlib-inline 0.1.2 mistune 0.8.4 mkl-fft 1.3.1 mkl-random 1.2.2 mkl-service 2.4.0 mlflow 1.24.0 msgpack 1.0.3 multidict 6.0.2 nbclassic 0.3.7 nbclient 0.5.13 nbconvert 6.4.4 nbformat 5.2.0 nest-asyncio 1.5.4 networkx 2.6.3 noah-vega 1.8.4 notebook 6.4.10 notebook-shim 0.1.0 numpy 1.21.2 ofa 0.1.0.post202111231444 olefile 0.46 onnx 1.11.0 opencv-contrib-python 4.5.5.64 opencv-python 4.5.5.64 opencv-python-headless 4.5.5.64 packaging 21.3 pandas 1.3.5 pandocfilters 1.5.0 parso 0.8.2 partd 1.2.0 pexpect 4.8.0 pickleshare 0.7.5 Pillow 8.4.0 pip 22.1.2 pkginfo 1.7.1 prometheus-client 0.13.1 prometheus-flask-exporter 0.19.0 prompt-toolkit 3.0.20 protobuf 3.19.4 psutil 5.8.0 ptyprocess 0.7.0 pycosat 0.6.3 pycparser 2.20 Pygments 2.10.0 pyOpenSSL 20.0.1 pyparsing 3.0.7 pyrsistent 0.18.1 PySocks 1.7.1 python-dateutil 2.8.2 python-etcd 0.4.5 pytz 2021.3 PyWavelets 1.3.0 PyYAML 5.4.1 pyzmq 22.3.0 qudida 0.0.4 querystring-parser 1.2.4 requests 2.25.1 ruamel-yaml-conda 0.15.100 scikit-image 0.19.2 scikit-learn 1.0.2 scipy 1.7.3 seaborn 0.11.2 Send2Trash 1.8.0 setuptools 58.0.4 simpervisor 0.4 six 1.16.0 smmap 5.0.0 sniffio 1.2.0 sortedcontainers 2.4.0 soupsieve 2.2.1 SQLAlchemy 1.4.32 sqlparse 0.4.2 tabulate 0.8.9 tblib 1.7.0 tensorboardX 2.5 terminado 0.13.3 testpath 0.6.0 thop 0.1.0.post2206102148 threadpoolctl 3.1.0 tifffile 2021.11.2 toolz 0.11.2 torch 1.4.0 torchtext 0.11.0 torchvision 0.5.0 tornado 6.1 tqdm 4.61.2 traitlets 5.1.0 typing-extensions 3.10.0.2 urllib3 1.26.6 wcwidth 0.2.5 webencodings 0.5.1 websocket-client 1.3.1 Werkzeug 2.0.3 wheel 0.36.2 yarl 1.7.2 zict 2.1.0 zipp 3.7.0
```