humlab-speech / visible-speech-deployment

The add recording procedure is very slow for larger databases #166

Open FredrikKarlssonSpeech opened 1 year ago

FredrikKarlssonSpeech commented 1 year ago

When the user wants to add a recording to the database, the entire database is checked out into the container before the session is added. That process currently takes a very long time for larger databases.

FredrikKarlssonSpeech commented 1 year ago

Some solution ideas:

Maybe use the git clone --filter=blob:none or --filter=blob:limit= options so that the signal files are not checked out. They are not referenced or needed in any way in the checked-out database, so I suspect that the following will work:

  1. partial checkout
  2. add a new session to the database (directory)
  3. add & commit the files in the new session directory
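A rough sketch of those three steps, not tested against the real deployment. It uses a throwaway local repository as a stand-in for the real remote, and the session and bundle names (0001_ses, 0002_ses, rec1_bndl, rec2_bndl) and the demo user are made up for illustration. One caveat: --filter=blob:none on its own only delays the download, since a normal checkout fetches the blobs it needs on demand. The sketch therefore combines the filter with a cone-mode sparse checkout so the existing signal files never land in the working tree.

```shell
#!/bin/sh
set -eu

# Throwaway stand-in for the real remote (hypothetical content).
work=$(mktemp -d)
git init -q "$work/origin"
git -C "$work/origin" config user.name "Demo User"
git -C "$work/origin" config user.email "demo@example.org"
# The serving side must allow filtered (partial) clones and on-demand blob fetches.
git -C "$work/origin" config uploadpack.allowFilter true
git -C "$work/origin" config uploadpack.allowAnySHA1InWant true
mkdir -p "$work/origin/VISPDB_emuDB/0001_ses/rec1_bndl"
printf '{}' > "$work/origin/VISPDB_emuDB/VISPDB_DBconfig.json"
printf 'cache' > "$work/origin/VISPDB_emuDB/VISPDB_emuDBcache.sqlite"
printf 'fake audio' > "$work/origin/VISPDB_emuDB/0001_ses/rec1_bndl/rec1.wav"
git -C "$work/origin" add .
git -C "$work/origin" commit -q -m "Initial database"

# 1. Partial checkout: no blobs up front, only top-level files in the work tree.
git clone -q --filter=blob:none --sparse "file://$work/origin" "$work/clone"
cd "$work/clone"
git config user.name "Demo User"
git config user.email "demo@example.org"
# Cone mode: files directly inside VISPDB_emuDB/ plus the (new) session directory.
git sparse-checkout init --cone
git sparse-checkout set VISPDB_emuDB/0002_ses

# 2. Add a new session directory (placeholder files instead of a real import).
mkdir -p VISPDB_emuDB/0002_ses/rec2_bndl
printf 'fake audio' > VISPDB_emuDB/0002_ses/rec2_bndl/rec2.wav
printf '{}' > VISPDB_emuDB/0002_ses/rec2_bndl/rec2_annot.json

# 3. Add and commit only the files in the new session directory.
git add VISPDB_emuDB/0002_ses
git commit -q -m "Add session 0002"
```

After this, a git push would send the new commit to the remote. The old session directories stay out of the working tree the whole time, so the cost should depend on the size of the new session rather than on the size of the database.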

FredrikKarlssonSpeech commented 1 year ago

Looked into this further, and it seems that one only needs to check out these files

VISPDB_emuDB/
VISPDB_emuDB/VISPDB_emuDBcache.sqlite
VISPDB_emuDB/VISPDB_DBconfig.json

and can still load it as an emuDB database and call reindeer::import_recordings on it. If you then git-add the new session (directory and content), I guess it should work?
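To make the file list above concrete, here is a sketch that checks out exactly those two files (plus room for one new session) using explicit sparse-checkout patterns. Again this runs against a throwaway local repository, and the session names, bundle names, and demo user are hypothetical. The import itself is replaced by a placeholder file, since that step would happen in R. The new session directory is listed in the patterns because recent git versions refuse to git-add paths that fall outside the sparse-checkout definition.

```shell
#!/bin/sh
set -eu

# Throwaway stand-in for the real remote (hypothetical content).
work=$(mktemp -d)
git init -q "$work/origin"
git -C "$work/origin" config user.name "Demo User"
git -C "$work/origin" config user.email "demo@example.org"
git -C "$work/origin" config uploadpack.allowFilter true
git -C "$work/origin" config uploadpack.allowAnySHA1InWant true
mkdir -p "$work/origin/VISPDB_emuDB/0001_ses/rec1_bndl"
printf '{}' > "$work/origin/VISPDB_emuDB/VISPDB_DBconfig.json"
printf 'cache' > "$work/origin/VISPDB_emuDB/VISPDB_emuDBcache.sqlite"
printf 'fake audio' > "$work/origin/VISPDB_emuDB/0001_ses/rec1_bndl/rec1.wav"
git -C "$work/origin" add .
git -C "$work/origin" commit -q -m "Initial database"

# Clone without blobs and without populating the working tree.
git clone -q --filter=blob:none --no-checkout "file://$work/origin" "$work/clone"
cd "$work/clone"
git config user.name "Demo User"
git config user.email "demo@example.org"

# Pattern-based (non-cone) sparse checkout: only the listed paths are materialised.
git config core.sparseCheckout true
git config core.sparseCheckoutCone false
mkdir -p .git/info
cat > .git/info/sparse-checkout <<'EOF'
/VISPDB_emuDB/VISPDB_emuDBcache.sqlite
/VISPDB_emuDB/VISPDB_DBconfig.json
/VISPDB_emuDB/0002_ses/
EOF
# Fill the index from HEAD and check out only the matching files.
git read-tree -mu HEAD

# Placeholder for the import step: create the new session by hand.
mkdir -p VISPDB_emuDB/0002_ses/rec2_bndl
printf 'fake audio' > VISPDB_emuDB/0002_ses/rec2_bndl/rec2.wav

# git-add the new session (directory and content) and commit.
git add VISPDB_emuDB/0002_ses
git commit -q -m "Add session 0002"
```

Whether the import also needs to update VISPDB_emuDBcache.sqlite, and whether that file should then be committed as well, is something this sketch does not cover.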