rug-compling / alpinocorpus

Library for handling Alpino corpora
GNU Lesser General Public License v2.1

Provide an interface for discovering corpora #28

Closed: danieldk closed this issue 11 years ago

danieldk commented 12 years ago

For non-local corpora, there is currently no interface for discovering which corpora are available. For RemoteCorpusReader, this is done in Dact. It would be better to have a general interface for remote corpora.

jelmervdl commented 12 years ago

Currently, the workflow for opening a remote corpus differs from the workflow for opening a corpus from the web. The latter has to be downloaded and saved first, which is a one-time action. After that, it has to be opened directly, not through the discovery interface.

One way to build the discovery interface would be to move it entirely to the web: use URLs to let Dact open remote corpora, and otherwise have the user download the .dact file. But the workflow would still be different the second time the user wants to open the corpus.

We could also let Dact act more iTunes-like, i.e. let it manage the downloaded corpora for the user. For example: the user tries to open the Wikipedia corpus through the discovery interface, but it isn't "installed" yet. Dact downloads the corpus and stores it somewhere hidden from the user. As soon as downloading has finished, the corpus is opened. The next time the user opens Dact and selects the Wikipedia corpus, Dact finds it is already installed and opens it directly. For the remote corpora the workflow is the same: the user selects a corpus, and (after some waiting) Dact opens (or connects to) the corpus.
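To make the "installed or not" logic concrete, here is a minimal sketch in Python (not Dact's actual code, and not part of alpinocorpus). The `CorpusManager` class, its `fetch` callback, and the `.part` temporary suffix are all hypothetical names chosen for illustration; only the local download case is covered, and connecting to a remote corpus is left out.

```python
import tempfile
from pathlib import Path
from typing import Callable


class CorpusManager:
    """Hypothetical manager that hides where downloaded corpora are stored."""

    def __init__(self, store_dir: Path, fetch: Callable[[str, Path], None]):
        # `fetch(name, destination)` is an assumed callback that downloads
        # the corpus called `name` and writes it to `destination`.
        self.store_dir = store_dir
        self.fetch = fetch
        self.store_dir.mkdir(parents=True, exist_ok=True)

    def local_path(self, name: str) -> Path:
        return self.store_dir / f"{name}.dact"

    def is_installed(self, name: str) -> bool:
        return self.local_path(name).is_file()

    def open(self, name: str) -> Path:
        """Return the local corpus file, downloading it on first use."""
        target = self.local_path(name)
        if not target.is_file():
            # Download to a temporary name first, so an interrupted
            # download is never mistaken for an installed corpus.
            partial = target.with_suffix(".part")
            self.fetch(name, partial)
            partial.replace(target)
        return target


def demo() -> int:
    """Open the same corpus twice and return how often it was fetched."""
    calls = []

    def fake_fetch(name: str, destination: Path) -> None:
        # Stand-in for a real download; writes placeholder bytes.
        calls.append(name)
        destination.write_bytes(b"placeholder corpus data")

    with tempfile.TemporaryDirectory() as tmp:
        manager = CorpusManager(Path(tmp) / "corpora", fake_fetch)
        first = manager.open("wikipedia")   # not installed yet: downloads
        second = manager.open("wikipedia")  # already installed: no download
        assert first == second
    return len(calls)
```

With a manager like this, the user-facing workflow is the same on the first and on every later open: select a corpus by name, wait if needed, and get an opened corpus back.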

Regarding the discovery interface: I think a website is nicer than a dialog inside Dact itself. From the website we can offer Dact for download to users who do not have it installed yet, and people can link to (the webpage of) a corpus.

Edit: I just realized this is more related to Dact than to alpinocorpus. Still, I do think the discovery should happen in Dact, not inside alpinocorpus.