nilmtk / nilmtk-contrib

Apache License 2.0

Bert not working properly #68

Open Rohitkr1997 opened 2 years ago

Rohitkr1997 commented 2 years ago

Can anyone upload an environment.yml, or list the versions of keras, tensorflow, nilmtk, and nilmtk-contrib they use? BERT requires keras.layers.multi_head_attention, which does not work properly with the Keras version that gets installed when nilmtk and nilmtk-contrib are installed through conda. Upgrading keras and tensorflow causes conflicts, after which nilmtk can no longer be used.

Rohitkr1997 commented 2 years ago

Or, if anyone has a working version of BERT, could you please upload the output of conda list?

paulfrank1997 commented 2 years ago

You need TensorFlow 2.5.0 or higher. Since Keras is already bundled inside TensorFlow 2.5.0, you don't need to install Keras separately.
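
A minimal sketch for checking an environment against the 2.5.0 minimum mentioned above. The helper names (`version_tuple`, `meets_minimum`, `installed_version`) are hypothetical and not part of nilmtk or TensorFlow, and the script assumes Python 3.8 or newer for `importlib.metadata`.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '2.5.0' or '2.6.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report what the current interpreter can actually see.
    for name in ("tensorflow", "keras", "h5py"):
        print(name, installed_version(name))
    tf_version = installed_version("tensorflow")
    if tf_version is not None:
        print("tensorflow >= 2.5.0:", meets_minimum(tf_version, "2.5.0"))
```

Running it inside the activated conda environment shows which of the three packages that interpreter resolves, which may differ from what another environment reports.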

Rohitkr1997 commented 2 years ago

I have tried TensorFlow 2.6.0, but the resulting environment has conflicts that cause problems. Could you please upload your environment.yml file or share the output of conda list, so that I have a working environment?

Rohitkr1997 commented 2 years ago

If you share all the different packages you're using then I could just use your anaconda environment and avoid all the different conflicts that are in my environment.

paulfrank1997 commented 2 years ago

All you need to do is uninstall Keras from the original environment, install TensorFlow 2.5.0, and upgrade h5py to the latest version. Then you have to change how the modules are imported. For example, change "import keras.XXX" to "import tensorflow.keras.XXX", and change "import keras.layers.XXX" to "import tensorflow.keras.layers".
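
The import change described here can be sketched as a small text rewrite. This is an illustration only: `rewrite_keras_imports` is a hypothetical helper, it handles only imports written at the start of a line, and its output should be reviewed by hand before replacing any file in nilmtk-contrib.

```python
import re

# "from keras.layers import X" or "from keras import X"
_FROM_KERAS = re.compile(r"^([ \t]*)from[ \t]+keras(\.|[ \t])", re.MULTILINE)
# "import keras.layers" or "import keras.backend as K"
_IMPORT_DOTTED = re.compile(r"^([ \t]*)import[ \t]+keras\.", re.MULTILINE)
# a bare "import keras"
_IMPORT_BARE = re.compile(r"^([ \t]*)import[ \t]+keras[ \t]*$", re.MULTILINE)


def rewrite_keras_imports(source):
    """Point standalone-Keras imports at the Keras bundled with TensorFlow."""
    source = _FROM_KERAS.sub(r"\1from tensorflow.keras\2", source)
    source = _IMPORT_DOTTED.sub(r"\1import tensorflow.keras.", source)
    # Keeps the local name "keras" so the rest of the file is unaffected.
    source = _IMPORT_BARE.sub(r"\1from tensorflow import keras", source)
    return source


if __name__ == "__main__":
    example = "from keras.layers import Dense\nimport keras.backend as K\n"
    print(rewrite_keras_imports(example))
```

Lines that already import from `tensorflow.keras`, and packages whose names merely start with `keras`, are left unchanged.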


xuuurq commented 2 years ago

@paulfrank1997 Hello, I installed TensorFlow 2.5.0, and h5py is currently updated to the latest version, 3.7.0, but running the script fails with the error "ImportError: save_model requires h5py". Which version of h5py did you install? Thank you. The output of the run is below.

`D:\anaconda3\envs\nilmxu\python.exe D:/mywork/nilmtkcontribxu/nilmtk_contrib/disaggregate/fuhefenjie.py
2022-08-09 15:23:30.591863: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-08-09 15:23:30.591978: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-08-09 15:23:32.978920: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
Started training for BERT
Joint training for BERT
............... Loading Data for training ...................
Loading data for redd dataset
2022-08-09 15:23:32.999158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3050 Laptop GPU computeCapability: 8.6
coreClock: 1.5GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2022-08-09 15:23:33.000126: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-08-09 15:23:33.000876: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2022-08-09 15:23:33.001617: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2022-08-09 15:23:33.002352: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2022-08-09 15:23:33.003067: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2022-08-09 15:23:33.003796: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusolver64_11.dll'; dlerror: cusolver64_11.dll not found
2022-08-09 15:23:33.004519: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2022-08-09 15:23:33.005261: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2022-08-09 15:23:33.005365: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Loading building ... 2
Loading data for meter ElecMeterID(instance=2, building=2, dataset='REDD')
Done loading data all meters for this chunk.
Dropping missing values
...............BERT partial_fit running...............
First model training for fridge
2022-08-09 15:23:35.090127: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-09 15:23:35.090573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-08-09 15:23:35.090662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]
Model: "sequential"

Layer (type)                  Output Shape              Param #
conv1d (Conv1D)               (None, 99, 16)            80
l_ppool (LPpool)              (None, 50, 16)            0
token_and_position_embedding  (None, 50, 16, 32)        643168
transformer_block (Transform  (None, 50, 16, 32)        10656
flatten (Flatten)             (None, 25600)             0
dropout_2 (Dropout)           (None, 25600)             0
dense_2 (Dense)               (None, 99)                2534499
dropout_3 (Dropout)           (None, 99)                0

Total params: 3,188,403
Trainable params: 3,188,403
Non-trainable params: 0


Epoch 1/50
2022-08-09 15:23:46.743583: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
WARNING:tensorflow:Gradients do not exist for variables ['conv1d/kernel:0', 'conv1d/bias:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['conv1d/kernel:0', 'conv1d/bias:0'] when minimizing the loss.
526/526 [==============================] - 259s 472ms/step - loss: 12.1465 - mse: 12.1465 - val_loss: 0.6670 - val_mse: 0.6670

Epoch 00001: val_loss improved from inf to 0.66698, saving model to BERT-temp-weights-74894.h5
Traceback (most recent call last):
  File "D:/mywork/nilmtkcontribxu/nilmtk_contrib/disaggregate/fuhefenjie.py", line 54, in <module>
    api_res = API(experiment)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\nilmtk\api.py", line 46, in __init__
    self.experiment()
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\nilmtk\api.py", line 91, in experiment
    self.train_jointly(clf,d)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\nilmtk\api.py", line 240, in train_jointly
    clf.partial_fit(self.train_mains,self.train_submeters)
  File "D:\mywork\nilmtkcontribxu\nilmtk_contrib\disaggregate\bert.py", line 161, in partial_fit
    model.fit(train_x,train_y,validation_data=(v_x,v_y),epochs=self.n_epochs,callbacks=[checkpoint],batch_size=self.batch_size)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\engine\training.py", line 1204, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\callbacks.py", line 410, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\callbacks.py", line 1376, in on_epoch_end
    self._save_model(epoch=epoch, logs=logs)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\callbacks.py", line 1428, in _save_model
    self.model.save(filepath, overwrite=True, options=self._options)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\engine\training.py", line 2087, in save
    signatures, options, save_traces)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\saving\save.py", line 147, in save_model
    model, filepath, overwrite, include_optimizer)
  File "D:\anaconda3\envs\nilmxu\lib\site-packages\keras\saving\hdf5_format.py", line 79, in save_model_to_hdf5
    raise ImportError('save_model requires h5py.')
ImportError: save_model requires h5py.
Closing remaining open files:C:\Users\xrq\AppData\Local\Temp\nilmtk-meg927ux.h5...done
D:/works/nilmtkcontrib/nilmtk_contrib/redd_low.hdf5...done `

paulfrank1997 commented 2 years ago

@xuuurq I ran into the same problem: "ImportError: save_model requires h5py". After I upgraded h5py to the latest version with "pip install --upgrade h5py", the problem was solved. The version of h5py I use is 3.6.0, and now everything works fine.
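
A minimal diagnostic sketch for this error. One possible cause, stated here as an assumption rather than a confirmed diagnosis, is that h5py is installed but cannot be imported by the interpreter that runs the training script. The helper `probe_import` is hypothetical; it only reports whether a module imports and which version it declares.

```python
import importlib
import sys


def probe_import(module_name):
    """Try to import a module; return (ok, detail).

    detail is the module's __version__ (or "unknown") on success,
    and the exception type and message on failure.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # surface any import-time failure, not just ImportError
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # Confirm which interpreter is running, then probe h5py from it.
    print("interpreter:", sys.executable)
    ok, detail = probe_import("h5py")
    print("h5py importable:", ok, "|", detail)
```

Running it with the same python.exe that launches the script shows whether h5py is visible there and, if not, the underlying import error.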

xuuurq commented 2 years ago

@paulfrank1997 Sorry to bother you again. I have a few more questions:

  1. Are you using tensorflow-gpu version 2.5.0?
  2. The BERT model in nilmtk-contrib differs from the code in the BERT4NILM paper, specifically in the loss function and the mask processing. Does the BERT model in nilmtk-contrib omit mask processing?
  3. In addition, what do you think of how well the BERT model in nilmtk-contrib performs? Thank you very much for your answer.

paulfrank1997 commented 2 years ago

All you need to do is uninstall Keras from the original environment, install TensorFlow 2.5.0, and upgrade h5py to the latest version. Then you have to change how the modules are imported. For example, change "import keras.XXX" to "import tensorflow.keras.XXX", and change "import keras.layers.XXX" to "import tensorflow.keras.layers".
