google / corpuscrawler

Crawler for linguistic corpora

Use available sentences corpora for Wikipedia (290+ languages) #92

Open hugolpz opened 6 months ago

hugolpz commented 6 months ago

There are ready-to-download, open-licence Wikipedia corpora available.

| Project introduction | Type | Languages (2024) | Portal all | Language specific | Download link | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| Wortschatz by Leipzig | Sentences, monolingual | 290+ | - | bre | bre 100k sentences (2021) | List of sentence corpora: API reference > https://api.wortschatz-leipzig.de/ws/corpora |
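
To check which corpora are listed, the endpoint above could be queried along these lines. This is only a rough sketch: it assumes the endpoint returns a JSON array of objects, and because the exact field names are not confirmed here, the filter simply searches every string field. `fetch_corpora` and `filter_corpora` are hypothetical helper names, not part of corpuscrawler or of the Wortschatz API.

```python
import json
from urllib.request import urlopen

# URL taken from the table above.
CORPORA_URL = "https://api.wortschatz-leipzig.de/ws/corpora"


def fetch_corpora(url=CORPORA_URL, timeout=30):
    """Download the corpora listing (assumed to be a JSON array)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def filter_corpora(corpora, needle):
    """Keep entries where any string field contains `needle`.

    The response schema is an assumption, so no specific
    field name is relied upon here.
    """
    needle = needle.lower()
    return [
        entry
        for entry in corpora
        if isinstance(entry, dict)
        and any(isinstance(v, str) and needle in v.lower() for v in entry.values())
    ]


if __name__ == "__main__":
    # Requires network access; prints the Wikipedia-derived entries.
    for entry in filter_corpora(fetch_corpora(), "wikipedia"):
        print(entry)
```

Matching on every string field is deliberately loose so that the sketch keeps working whichever key holds the corpus name.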