Open sthorpe11 opened 3 years ago
The model to generate embedding_v1 has not been released, and we also haven't released pre-embedded patents with the BERT model in BigQuery.
You could experiment with learning a mapping from BERT to embedding_v1 with a linear layer - they should match up well because they're both based on text. embedding_v1 is a set-of-words unigram model.
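A minimal sketch of the linear-layer idea above, assuming you have already collected paired vectors for the same patents: 1024-element BERT for Patents outputs and the matching 64-element embedding_v1 rows from BigQuery. The function names (`fit_linear_map`, `apply_linear_map`), the ridge term, and the synthetic data are illustrative assumptions and not part of this repo. It fits the mapping in closed form with regularized least squares, which is equivalent to a single linear layer trained with a squared-error loss.

```python
import numpy as np


def fit_linear_map(bert_vecs, v1_vecs, ridge=1e-3):
    """Fit W (1024x64) and b (64,) so that bert_vecs @ W + b approximates v1_vecs.

    Solves the ridge-regularized normal equations. For simplicity the
    small ridge penalty is applied to the bias row as well.
    """
    X = np.asarray(bert_vecs, dtype=np.float64)
    Y = np.asarray(v1_vecs, dtype=np.float64)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    Wb = np.linalg.solve(A, Xb.T @ Y)
    return Wb[:-1], Wb[-1]


def apply_linear_map(bert_vecs, W, b):
    """Project BERT vectors into the 64-element space."""
    return np.asarray(bert_vecs, dtype=np.float64) @ W + b


# Synthetic stand-in data (NOT real patent embeddings), only to show shapes.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 1024))             # pretend BERT outputs
true_W = rng.normal(size=(1024, 64)) / 32.0   # hidden "true" mapping
Y = X @ true_W + 0.01 * rng.normal(size=(2000, 64))  # pretend embedding_v1

W, b = fit_linear_map(X, Y)
pred = apply_linear_map(X[:5], W, b)
print(pred.shape)
```

With real data, hold out some patents and compare the mapped vectors against the true embedding_v1 rows (for example by cosine similarity) before relying on the result; how well the two spaces actually line up is something to measure, not assume.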
Can you give some insight into how you dealt with the limited window size for BERT? E.g., did you choose between abstract/patent/etc.? Pool things? Something else?
Hi Saurabh,
We limited the window to claim 1.
Scott
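For anyone wanting to reproduce the "claim 1 only" input, here is a rough sketch. It assumes the claims text numbers each claim at the start of a line ("1.", "2.", ...), which may not hold for every publication; `first_claim` is a hypothetical helper, not something from this repo. The tokenizer would still truncate the result if claim 1 alone exceeds the model's maximum sequence length.

```python
import re


def first_claim(claims_text):
    """Return the text of claim 1.

    Assumption: claims are numbered at line starts, so claim 1 ends
    where a line beginning with "2." (or "2)") starts. If no such line
    is found, the whole text is returned.
    """
    match = re.search(r"^\s*2\s*[.)]\s", claims_text, flags=re.MULTILINE)
    head = claims_text[:match.start()] if match else claims_text
    # Drop the leading "1." label, if present.
    return re.sub(r"^\s*1\s*[.)]\s*", "", head.strip())


sample = (
    "1. A widget comprising a gear.\n"
    "2. The widget of claim 1, wherein the gear is steel.\n"
)
claim_one = first_claim(sample)
print(claim_one)
```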
From: Saurabh Bhatnagar
Sent: Thursday, December 2, 2021 1:43 PM
To: google/patents-public-data
Cc: sthorpe11; Author
Subject: Re: [google/patents-public-data] BERT for Patents yields 1024 element array, but embedding_v1 is 64 element (#49)
Thanks for that quick response. This repo is a great resource.
This repo is great. Thank you! Any plans to release the model that generated embedding_v1 or the BERT pre-embedded patents?
How should I generate an embedding equivalent to embedding_v1? BERT for Patents generates a 1024-element embedding, but embedding_v1 is a 64-element embedding.