langcog / wordbank

open repository of children's vocabulary data
http://wordbank.stanford.edu
GNU General Public License v2.0

New DB instance for BiLExpo #325

Open alvinwmtan opened 2 months ago

alvinwmtan commented 2 months ago

Hi @HenryMehta, we're starting up a new project involving bilingual data, and we are planning to host the data in a way that project members can access it. However, some of the data are under embargo, and thus we cannot use the public GitHub issues or the public version of the database to host these data.

Thus, our proposed solution is as follows:

  1. Construct a new instance of the Wordbank database with the same structure, but without the current Wordbank data.
  2. Import the new bilingual data using the same process as the current imports; we will need a different way of getting these data to you for import.
  3. Create a new reader account for this database (with different credentials from the current accounts for Wordbank) so that members can access the data; these account details will only be shared among project members.

Does this make sense as a solution? How easy would it be to set this up? (The earliest we will need this is around end-October.) Thanks!
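
For concreteness, step 3 could look roughly like the sketch below. It assumes the new instance is a MySQL database, and the names `bilexpo` and `bilexpo_reader` are hypothetical placeholders, not agreed-upon values. The function only builds the SQL statements for a read-only account; it does not connect to any server.

```python
import secrets


def reader_account_sql(database: str, user: str, password: str, host: str = "%") -> list[str]:
    """Build MySQL statements for a read-only account scoped to one database."""
    for name in (database, user):
        # Identifiers cannot be bound as query parameters, so only allow
        # letters, digits and underscores here.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unsafe identifier: {name!r}")
    # Reject characters that would break out of the quoted password literal.
    if "'" in password or "\\" in password:
        raise ValueError("password must not contain quotes or backslashes")
    return [
        f"CREATE USER '{user}'@'{host}' IDENTIFIED BY '{password}';",
        # SELECT only: project members can read the data but not modify it.
        f"GRANT SELECT ON `{database}`.* TO '{user}'@'{host}';",
    ]


if __name__ == "__main__":
    # Hypothetical names; the password is generated fresh so it differs
    # from the existing Wordbank credentials.
    password = secrets.token_urlsafe(24)
    for statement in reader_account_sql("bilexpo", "bilexpo_reader", password):
        print(statement)
```

Granting only `SELECT` on the one database keeps the embargoed data readable by project members without giving them access to anything else on the server.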

HenryMehta commented 2 months ago

@alvinwmtan

I can create a new instance of Wordbank (we will need a URL, although we could use the AWS-generated one) and load only the new data. I can also set credentials that differ from the current ones.

How does the Shiny side work?

alvinwmtan commented 2 months ago

Great! I don't think we'll need Shiny for this—we won't be making visualisation applications for it!

One more thing to flag—the new data will eventually make its way into Wordbank, just after the embargo period is over. So we will need to reimport the data into the main database, but that shouldn't be too difficult.

HenryMehta commented 2 months ago

@alvinwmtan How are you going to pass me the datasets?

HenryMehta commented 2 months ago

Also, is it just the database you want or do you want the website?

alvinwmtan commented 2 months ago

Just the database, no need for the website.

For the data, we plan to have it on Google Drive once we've actually processed the data—perhaps I could send you the link in an email when we have the data ready?

HenryMehta commented 2 months ago

that works - great

HenryMehta commented 1 month ago

@alvinwmtan Any update on this? I ask because you originally said by November, which only leaves 2 weeks.

alvinwmtan commented 1 month ago

@HenryMehta thanks for the bump; we haven't processed the data for import yet so it will still be a while.