Closed pdurbin closed 7 years ago
Non-trivial change; two possible initial approaches:
The first approach intersects with the security model (the transfer user's shell is not a login shell, but "transfer initialization" is equivalent to receiving a successful login for transfer). The second approach might require more infrastructure, but probably won't have these impacts.
Bummer that it's a non-trivial change. I'm not particularly interested in the rsync script talking to Dataverse. It's the rsync script's job to talk to the DCM, and I think we should keep it that way. No rush on any of this. It would have been a nice-to-have for #3942, but we can live without it, I'd say.
A few notes on initial investigations:

- rsync: either a script containing a trigger (which is unlikely to work in the context of the non-login shell for the transfer user), or a periodic process checking for transfers that have been started but not yet completed.

Obsolete from discussions of https://github.com/IQSS/dataverse/issues/3348 today.
I still think the cron job would be better than nuthin'. 😄
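For concreteness, the "better than nuthin'" cron approach could be sketched like this. This is a minimal illustration only: the staging directory layout and the `.complete` marker convention are assumptions for the example, not anything the DCM actually does.

```python
import os

# Hypothetical staging directory where each in-progress rsync transfer gets
# its own subdirectory, and a ".complete" marker file is written when the
# transfer finishes. Both conventions are invented for this sketch.
STAGING_DIR = "/tmp/dcm-staging"

def find_unfinished_transfers(staging_dir):
    """Return names of transfer directories that have started but not completed."""
    unfinished = []
    for name in sorted(os.listdir(staging_dir)):
        path = os.path.join(staging_dir, name)
        if os.path.isdir(path) and not os.path.exists(os.path.join(path, ".complete")):
            unfinished.append(name)
    return unfinished

if __name__ == "__main__" and os.path.isdir(STAGING_DIR):
    # A cron entry such as "*/5 * * * * python check_transfers.py" could run
    # this every few minutes and notify Dataverse for each hit.
    for transfer in find_unfinished_transfers(STAGING_DIR):
        print("transfer in progress:", transfer)
```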
Last week when discussing https://trello.com/c/Nbte37k1/9-rsync-file-upload-%26-download-(4.8) with @mheppler @TaniaSchlatter @dlmurphy we agreed that we'd all like to see the Data Capture Module (DCM) POST some JSON to Dataverse when upload has begun. That is to say, the DCM will recognize that the user has started executing the rsync script and inform Dataverse of this fact. When Dataverse receives the "upload has begun" message (or "uploadHasBegun"?), Dataverse will take some actions, possibly sending a notification to the user, preventing the dataset from being deleted, etc. It would be awfully nice if the DCM could send the number of bytes Dataverse should expect, but this is not a hard requirement.
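For concreteness, the DCM-to-Dataverse notification might look something like the sketch below. The field names (`status`, `datasetId`, `expectedBytes`), the helper names, and the idea that the endpoint URL is passed in are all assumptions for illustration; no wire format has been agreed on, and the receiving side is tracked separately on the Dataverse end.

```python
import json
import urllib.request

def build_upload_has_begun_message(dataset_id, expected_bytes=None):
    """Assemble a hypothetical "uploadHasBegun" JSON message.

    The field names here are illustrative guesses, not a settled format.
    """
    message = {"status": "uploadHasBegun", "datasetId": dataset_id}
    if expected_bytes is not None:
        # Optional: the number of bytes Dataverse should expect.
        # Per the discussion, this is a nice-to-have, not a hard requirement.
        message["expectedBytes"] = expected_bytes
    return message

def post_to_dataverse(url, message):
    """POST the message as JSON to whatever endpoint Dataverse ends up
    exposing for this (the URL is a parameter because none exists yet)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(request)
```

The point of separating message assembly from the POST is that the "upload has begun" payload can be settled (and tested) independently of where Dataverse decides to receive it.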
I believe the issue on the Dataverse side is https://github.com/IQSS/dataverse/issues/3348