google / patents-public-data

Patent analysis using the Google Patents Public Datasets on BigQuery
https://bigquery.cloud.google.com/dataset/patents-public-data:patents
Apache License 2.0

ResourceExhaustedError while running Document_representation_from_BERT #51

Open MahajanTarun opened 3 years ago

MahajanTarun commented 3 years ago

I have been trying to run Document_representation_from_BERT on my local machine, which has enough memory (8 GB of RAM). All the other TF functions in the other notebooks run on my local machine without this error.

But the error appears when I load the Patent_BERT model:

```python
model = tf.compat.v2.saved_model.load(export_dir=MODEL_DIR, tags=['serve'])
model = model.signatures['serving_default']
```

It also gives a similar error at:

```python
docs_embeddings = []
for _, row in df.iterrows():
    inputs = get_bert_token_input(row['claims'])
    response = model(**inputs)
    avg_embeddings = pooling(
        tf.reshape(response['encoder_layer'], shape=[1, -1, 1024]))
    docs_embeddings.append(avg_embeddings.numpy()[0])
```
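If the ResourceExhaustedError comes from running out of RAM or GPU memory, here is a minimal sketch of a lower-memory variant of the same loop (not verified on this model; it assumes the memory-growth call runs before the model is loaded, and it reuses `df`, `get_bert_token_input`, `pooling`, and `model` from the notebook):

```python
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory on demand instead of all at once.
# This is a no-op on CPU-only machines, where list_physical_devices('GPU') is empty,
# and it must be called before any GPU has been initialized (i.e. before loading the model).
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

docs_embeddings = []
for _, row in df.iterrows():
    inputs = get_bert_token_input(row['claims'])
    response = model(**inputs)
    avg_embeddings = pooling(
        tf.reshape(response['encoder_layer'], shape=[1, -1, 1024]))
    docs_embeddings.append(avg_embeddings.numpy()[0])
    # Drop references to the large intermediate tensors before the next iteration.
    del inputs, response, avg_embeddings
```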

Please help me get this working. I have already spent a lot of time trying to solve this issue, but to no avail.

twin9458 commented 2 years ago

I have the same issue. Specifically, when I run `response = model(inputs)` there is an error: `NameError: name 'response' is not defined`.

My code is:

```python
model = tf.compat.v2.saved_model.load(export_dir=MODEL_DIR, tags=['serve'])
model = model.signatures['serving_default']
inputs = get_bert_token_input(example_sent)
response = model(inputs)
```
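For comparison, the loop in the first comment calls the serving signature with keyword arguments (`model(**inputs)`) rather than a single positional argument. A minimal sketch of that form, assuming `get_bert_token_input` returns a dict of input tensors and `MODEL_DIR`, `example_sent`, and `pooling` are defined as in the notebook:

```python
import tensorflow as tf

model = tf.compat.v2.saved_model.load(export_dir=MODEL_DIR, tags=['serve'])
model = model.signatures['serving_default']

inputs = get_bert_token_input(example_sent)  # assumed to return a dict of input tensors
# Unpack the dict so each tensor is passed as a named argument to the serving signature.
response = model(**inputs)
avg_embedding = pooling(
    tf.reshape(response['encoder_layer'], shape=[1, -1, 1024]))
```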

Is there any other solution? I have to solve this, but I can't.

twin9458 commented 2 years ago


Hi, I don't know if I can help, but if you have the same problem as me, try this.

The PatentBERT notebook uses average pooling, but I had misunderstood what was being averaged. In my case, I needed a sentence vector for each input sentence, so I changed the output to a list with one pooled vector per sentence; the change is around line 68 in the screenshot below. I will also attach information about the data I entered.
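A rough sketch of that per-sentence change (assuming, as in the notebook code quoted above, that `pooling` averages over the token axis and `response['encoder_layer']` has shape `[1, seq_len, 1024]`; the `sentences` list here is illustrative):

```python
import tensorflow as tf

sentence_embeddings = []  # one pooled 1024-d vector per input sentence
for sent in sentences:  # `sentences` is an illustrative list of input strings
    inputs = get_bert_token_input(sent)
    response = model(**inputs)
    # Average over the token axis to get a single vector for this sentence.
    pooled = pooling(
        tf.reshape(response['encoder_layer'], shape=[1, -1, 1024]))
    sentence_embeddings.append(pooled.numpy()[0])
```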

I hope this helps 😥 Good day!

(Screenshot 1: the modified pooling code)

(Screenshot 2: the input data)