psychoinformatics-de / knowledge-base

Sources for the psyinf knowledge base
https://knowledge-base.psychoinformatics.de

RIA questions: https write access? create empty store without a dataset? #91

Closed mslw closed 1 year ago

mslw commented 1 year ago

Origin: DataLad matrix channel, thread started 2023-06-19

The discussion is ongoing, so I won't copy-paste it now, but we are discussing the following questions:

  1. Is it possible to have an RIA with https read/write access?
  2. Is it possible to create an empty RIA directly without first making a regular datalad repo?
  3. Is it also possible to create an empty dataset within the RIA?

TODO (not necessarily to be performed in this order)

loj commented 1 year ago

The discussion resulted in the following:

  1. Not yet, but "to come" with planned RIA development.
  2. A helper function (create_store) in datalad/customremotes/ria_utils.py was pointed out, and the following session was given as an example of how to do it by hand:
adina@muninn in ~
❱ cd /tmp
adina@muninn in /tmp
❱ mkdir -p mockstore/error_logs
adina@muninn in /tmp
❱ echo "1|l" > mockstore/ria-layout-version
(fdm-werkstatt) adina@muninn in /tmp
❱ datalad create some
cd some
create(ok): /tmp/some (dataset)
(fdm-werkstatt) adina@muninn in /tmp
❱ cd some
(fdm-werkstatt) adina@muninn in /tmp/some on git:master
❱ echo 12435 > file && datalad save
add(ok): file (file)                                                            
save(ok): . (dataset)                                                           
action summary:                                                                 
  add (ok: 1)
  save (ok: 1)
(fdm-werkstatt) adina@muninn in /tmp/some on git:master
❱ datalad create-sibling-ria 'ria+file:///tmp/mockstore' -s ria
[INFO   ] create siblings 'ria' and 'ria-storage' ... 
[INFO   ] Fetching updates for Dataset(/tmp/some) 
update(ok): . (dataset)
update(ok): . (dataset)
[INFO   ] Configure additional publication dependency on "ria-storage" 
configure-sibling(ok): . (sibling)
create-sibling-ria(ok): /tmp/some (dataset)
action summary:  
  configure-sibling (ok: 1)
  create-sibling-ria (ok: 1)
  update (ok: 1)
(fdm-werkstatt) adina@muninn in /tmp/some on git:master
❱ datalad push --to ria
copy(ok): file (file) [to ria-storage...]
publish(ok): . (dataset) [refs/heads/master->ria:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/git-annex->ria:refs/heads/git-annex [new branch]]
action summary:
  copy (ok: 1)
  publish (ok: 2)

(fdm-werkstatt) adina@muninn in /tmp
❱ datalad clone 'ria+file:///tmp/mockstore#955d30eb-94ee-4c16-ae84-0c034b7b58e5' newclone && cd newclone
(fdm-werkstatt) adina@muninn in /tmp/newclone on git:master
❱ datalad get file
get(ok): file (file) [from ria-storage...]
  3. It was asked what the use case might be for this.

Crucially, the user who creates the RIA store might not be one of its users and would not need a local copy of the dataset.

It was suggested to run git annex dead here in the local copy before pushing, so that the local copy can be removed once it is no longer needed, leaving the RIA store as the only location of the dataset.
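As a rough sketch of that suggestion, the two steps could be scripted as below. The function name and the dry_run switch are made up for illustration, and the sibling name "ria" is taken from the session above. Removing the local clone afterwards is deliberately left out.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def retire_local_copy(dataset: str | Path, sibling: str = "ria",
                      dry_run: bool = True) -> list[list[str]]:
    """Mark the local clone as dead for git-annex, then push to the store.

    Hypothetical helper, not part of DataLad. With dry_run=True the
    commands are only returned, nothing is executed.
    """
    commands = [
        # tell git-annex that this repository will go away
        ["git", "annex", "dead", "here"],
        # push data and branches to the RIA sibling (name is an assumption)
        ["datalad", "push", "--to", sibling],
    ]
    if not dry_run:
        for cmd in commands:
            # run inside the dataset; abort on the first failing command
            subprocess.run(cmd, cwd=str(dataset), check=True)
    return commands
```

Calling retire_local_copy("/tmp/some") only returns the command list. Passing dry_run=False would execute it, which requires git-annex and datalad to be installed and a sibling of that name to exist.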

I will create a KBI for question 2.
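For reference, the by-hand store setup from the session (the mkdir and echo steps) could be written in Python roughly as follows. This is only a sketch that mirrors those two shell commands. It does not call DataLad's create_store helper, and the function name is made up for illustration.

```python
from pathlib import Path


def create_empty_ria_store(base, layout="1|l"):
    """Create the bare skeleton of a RIA store, without any dataset in it.

    Hypothetical helper mirroring the manual steps from the session;
    the default layout string "1|l" is the one used there.
    """
    base = Path(base)
    # equivalent of: mkdir -p <base>/error_logs
    (base / "error_logs").mkdir(parents=True, exist_ok=True)
    # equivalent of: echo "1|l" > <base>/ria-layout-version
    # (echo appends a trailing newline, so we do too)
    (base / "ria-layout-version").write_text(layout + "\n")
    return base
```

After create_empty_ria_store("/tmp/mockstore"), the store should be usable as the target of datalad create-sibling-ria 'ria+file:///tmp/mockstore' -s ria, as in the session.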