simonw / sqlite-comprehend

Tools for running data in a SQLite database through AWS Comprehend
Apache License 2.0

`sqlite-comprehend languages` command #6

Open simonw opened 2 years ago

simonw commented 2 years ago

Uses the Comprehend `BatchDetectDominantLanguage` API to detect the likely language (or languages) of each row:

comprehend.batch_detect_dominant_language(TextList=["Hello", "hola", "."])

{
    'ResultList': [
        {'Index': 0, 'Languages': [{'LanguageCode': 'en', 'Score': 0.9982954263687134}]},
        {'Index': 1, 'Languages': [{'LanguageCode': 'es', 'Score': 0.983553409576416}]},
        {'Index': 2, 'Languages': [{'LanguageCode': 'en', 'Score': 0.009999999776482582}]}
    ],
    'ErrorList': [],
    'ResponseMetadata': {
        'RequestId': 'fdffbc8d-89f5-400d-a283-ac7adfbe2e2d',
        'HTTPStatusCode': 200,
        'HTTPHeaders': {
            'x-amzn-requestid': 'fdffbc8d-89f5-400d-a283-ac7adfbe2e2d',
            'content-type': 'application/x-amz-json-1.1',
            'content-length': '257',
            'date': 'Fri, 08 Jul 2022 18:13:10 GMT'
        },
        'RetryAttempts': 0
    }
}
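
Each entry in `ResultList` carries an `Index` pointing back at its position in `TextList`, and may list more than one language. A minimal sketch of how that response could be flattened into one record per (row, language) pair follows. The function name `flatten_languages` and the `row_ids` argument are hypothetical, used here for illustration only, and the sample is a trimmed copy of the response shown above.

```python
def flatten_languages(response, row_ids):
    """Flatten a batch_detect_dominant_language response into row dicts.

    row_ids[i] is assumed to be the primary key of the text that was
    sent as TextList[i] (a hypothetical convention for this sketch).
    """
    rows = []
    for result in response["ResultList"]:
        # "Index" is the position of the document in the submitted TextList
        row_id = row_ids[result["Index"]]
        for language in result["Languages"]:
            rows.append(
                {
                    "id": row_id,
                    "language": language["LanguageCode"],
                    "score": language["Score"],
                }
            )
    return rows


# Trimmed copy of the response shown above (ResponseMetadata omitted)
sample = {
    "ResultList": [
        {"Index": 0, "Languages": [{"LanguageCode": "en", "Score": 0.9982954263687134}]},
        {"Index": 1, "Languages": [{"LanguageCode": "es", "Score": 0.983553409576416}]},
        {"Index": 2, "Languages": [{"LanguageCode": "en", "Score": 0.009999999776482582}]},
    ],
    "ErrorList": [],
}

# Hypothetical primary keys for the three submitted texts
rows = flatten_languages(sample, row_ids=[10, 11, 12])
```

Note that the `"."` input still comes back as `en`, but with a score of about 0.01, so a real command would probably want to store the score rather than only the language code.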

simonw commented 2 years ago

Schema design:

CREATE TABLE [pages_comprehend_languages] (
   [id] INTEGER REFERENCES [pages]([id]),
   [language] TEXT,
   [score] FLOAT
);
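
To check how this schema behaves, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `pages` table definition (an `id` plus a `text` column) is an assumption made for the example, and the scores are copied from the sample response earlier in this issue.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Hypothetical source table; only [id] is implied by the schema above
    CREATE TABLE [pages] ([id] INTEGER PRIMARY KEY, [text] TEXT);
    CREATE TABLE [pages_comprehend_languages] (
       [id] INTEGER REFERENCES [pages]([id]),
       [language] TEXT,
       [score] FLOAT
    );
    """
)
db.executemany(
    "INSERT INTO pages (id, text) VALUES (?, ?)",
    [(1, "Hello"), (2, "hola")],
)
# One row per detected language; [id] is not a primary key here,
# so a single page can have several language rows.
db.executemany(
    "INSERT INTO pages_comprehend_languages (id, language, score) VALUES (?, ?, ?)",
    [(1, "en", 0.9982954263687134), (2, "es", 0.983553409576416)],
)
results = db.execute(
    """
    SELECT pages.text, l.language, l.score
    FROM pages
    JOIN pages_comprehend_languages AS l ON l.id = pages.id
    ORDER BY pages.id
    """
).fetchall()
```

Because `[id]` is a plain foreign key rather than a primary key, the table can hold several rows for the same page, which matches the "language (or languages)" wording above.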