simonw opened 2 years ago
Uses the Amazon Comprehend batch detect dominant language API to calculate the likely language (or languages) of each row:
comprehend.batch_detect_dominant_language(TextList=["Hello", "hola", "."])
{'ResultList': [{'Index': 0, 'Languages': [{'LanguageCode': 'en', 'Score': 0.9982954263687134}]},
                {'Index': 1, 'Languages': [{'LanguageCode': 'es', 'Score': 0.983553409576416}]},
                {'Index': 2, 'Languages': [{'LanguageCode': 'en', 'Score': 0.009999999776482582}]}],
 'ErrorList': [],
 'ResponseMetadata': {'RequestId': 'fdffbc8d-89f5-400d-a283-ac7adfbe2e2d',
                      'HTTPStatusCode': 200,
                      'HTTPHeaders': {'x-amzn-requestid': 'fdffbc8d-89f5-400d-a283-ac7adfbe2e2d',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'content-length': '257',
                                      'date': 'Fri, 08 Jul 2022 18:13:10 GMT'},
                      'RetryAttempts': 0}}
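A minimal sketch of turning many rows into these calls and flattening the response into one (row id, language, score) tuple per detected language. Two things here are assumptions, not part of the proposal: the batch size of 25 documents per request (check the current Comprehend limits), and the hypothetical helper names `chunks` and `detect_languages`. Note that `Index` in each result is the position within that batch, so it has to be mapped back to the row id.

```python
from typing import Any, Iterator, List, Sequence, Tuple

# Assumed per-request document limit for batch_detect_dominant_language;
# confirm against the current Amazon Comprehend documentation.
BATCH_SIZE = 25


def chunks(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def detect_languages(
    comprehend: Any, rows: Sequence[Tuple[int, str]]
) -> List[Tuple[int, str, float]]:
    """Return (row_id, language_code, score) tuples, one per detected language.

    `rows` is a sequence of (id, text) pairs. `comprehend` is expected to be a
    boto3 Comprehend client, or any object with the same
    batch_detect_dominant_language(TextList=...) method.
    """
    results = []
    for batch in chunks(rows, BATCH_SIZE):
        response = comprehend.batch_detect_dominant_language(
            TextList=[text for _, text in batch]
        )
        for item in response["ResultList"]:
            # "Index" is relative to this batch, so map it back to the row id
            row_id = batch[item["Index"]][0]
            for language in item["Languages"]:
                results.append((row_id, language["LanguageCode"], language["Score"]))
        # Failed documents are reported in response["ErrorList"]; this sketch skips them
    return results
```

With boto3 installed and AWS credentials configured, the first argument would be `boto3.client("comprehend")`. Taking the client as a parameter also makes the batching logic easy to exercise without calling AWS.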
Schema design:
CREATE TABLE [pages_comprehend_languages] (
   [id] INTEGER REFERENCES [pages]([id]),
   [language] TEXT,
   [score] FLOAT
);
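A sketch of storing detected languages in that table using Python's built-in `sqlite3` module. It assumes a `pages` table with an integer `id` already exists; `save_languages` is a hypothetical helper name. There is deliberately no primary key, because the API can return more than one language per document, so a page may get several rows.

```python
import sqlite3
from typing import Iterable, Tuple


def save_languages(
    db: sqlite3.Connection, rows: Iterable[Tuple[int, str, float]]
) -> None:
    """Store (page_id, language, score) tuples in pages_comprehend_languages."""
    db.execute(
        """
        CREATE TABLE IF NOT EXISTS [pages_comprehend_languages] (
            [id] INTEGER REFERENCES [pages]([id]),
            [language] TEXT,
            [score] FLOAT
        )
        """
    )
    db.executemany(
        "INSERT INTO [pages_comprehend_languages] ([id], [language], [score]) "
        "VALUES (?, ?, ?)",
        rows,
    )
    db.commit()
```

Joining back to `pages` on `id` then gives each page's candidate languages; ordering by `score` descending puts the most likely one first.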