ngrams-dev / general

NGRAMS is a search engine for the Google Books Ngram Dataset. This repository contains documentation, discussions, announcements, and issues.
https://ngrams.dev

Add endpoint to fetch total counts #3

Open: opened by mtrenkmann 1 year ago

mtrenkmann commented 1 year ago

Each corpus in the dataset includes so-called totalcounts files (for example, the first download link listed for a corpus on the dataset's download page). These files contain the absolute total match counts per ngram length and year. NGRAMS uses this data internally to assign a relative total match count to each ngram. The data should also be useful to API users who need to derive other probability measures.
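As an illustration of the kind of derivation an API user might do, here is a minimal sketch that parses a totalcounts file and divides an ngram's match count in a given year by the total match count for its ngram length in that year. It assumes the tab-separated `year,match_count,page_count,volume_count` records used by the Google Books totalcounts files; the helper names are hypothetical, and this is not necessarily how NGRAMS computes its own relative total match count.

```python
from typing import Dict


def parse_totalcounts(text: str) -> Dict[int, int]:
    """Map year -> absolute total match count.

    Assumes whitespace/tab-separated records of the form
    'year,match_count,page_count,volume_count'.
    """
    totals: Dict[int, int] = {}
    for record in text.split():  # splits on tabs and newlines alike
        year, match_count, _page_count, _volume_count = record.split(",")
        totals[int(year)] = int(match_count)
    return totals


def relative_match_count(ngram_match_count: int, year: int,
                         totals: Dict[int, int]) -> float:
    """Hypothetical helper: per-year relative frequency of an ngram."""
    return ngram_match_count / totals[year]


# Made-up sample data, for illustration only.
sample = "\t1900,1000,50,5\t1901,2000,80,7"
totals = parse_totalcounts(sample)
print(relative_match_count(25, 1901, totals))  # 0.0125
```

Note that the totals must come from the file for the same ngram length as the ngram being normalized, since match counts are only comparable within one length.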

This feature request proposes making the totalcounts available via the REST API by introducing new endpoints such as:

GET {base_url}/{corpus}/totalcounts/{ngram_length}
GET {base_url}/{corpus}/totalcounts/{ngram_length}/{year}
GET {base_url}/{corpus}/totalcounts/{ngram_length}/allyears
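The three proposed routes differ only in their last path segment. A minimal client-side sketch for building them might look like this; the endpoints do not exist yet, and the function name, the example base URL `https://api.ngrams.dev`, and the corpus id `eng` are assumptions for illustration. The 1 to 5 range reflects the ngram lengths in the Google Books dataset.

```python
from typing import Optional, Union


def totalcounts_url(base_url: str, corpus: str, ngram_length: int,
                    year: Optional[Union[int, str]] = None) -> str:
    """Build a URL for one of the proposed (hypothetical) totalcounts routes.

    year=None     -> {base_url}/{corpus}/totalcounts/{ngram_length}
    year=1999     -> .../{ngram_length}/1999
    year="allyears" -> .../{ngram_length}/allyears
    """
    if not 1 <= ngram_length <= 5:
        raise ValueError("ngram_length must be between 1 and 5")
    if isinstance(year, str) and year != "allyears":
        raise ValueError("year must be an int or 'allyears'")
    url = f"{base_url.rstrip('/')}/{corpus}/totalcounts/{ngram_length}"
    if year is not None:
        url += f"/{year}"
    return url


print(totalcounts_url("https://api.ngrams.dev", "eng", 3, "allyears"))
# https://api.ngrams.dev/eng/totalcounts/3/allyears
```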

Because the data is static, clients should cache it after fetching it once.
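Since the totalcounts never change, a client can memoize responses by URL without any expiry logic. A minimal sketch, assuming the endpoints return JSON (the response format is not specified in this issue); the class and function names are hypothetical, and the fetch function is injectable so it can be replaced in tests:

```python
import json
import urllib.request
from typing import Any, Callable, Dict


def http_get_json(url: str) -> Any:
    """Fetch a URL and decode its body as JSON (assumed response format)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


class TotalCountsCache:
    """Memoizes responses by URL; no invalidation since the data is static."""

    def __init__(self, fetch: Callable[[str], Any] = http_get_json) -> None:
        self._fetch = fetch
        self._cache: Dict[str, Any] = {}

    def get(self, url: str) -> Any:
        # Only hit the network on the first request for a given URL.
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]


# Usage with a fake fetcher that records how often it is called.
calls = []

def fake_fetch(url: str) -> Any:
    calls.append(url)
    return {"url": url}

cache = TotalCountsCache(fetch=fake_fetch)
cache.get("https://api.ngrams.dev/eng/totalcounts/1/allyears")
cache.get("https://api.ngrams.dev/eng/totalcounts/1/allyears")
print(len(calls))  # 1
```

A persistent variant could write the same mapping to disk so the cache survives restarts.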