Upload files to Zenodo tagged with DOI

MichaelCurrie commented 8 years ago

Currently the files are stored locally at the Brown lab, on a Linux server (@ver228 please confirm).

I won't have access privileges but nevertheless I should be able to come up with an upload script.

An open question I'll answer is whether each file gets its own Digital Object Identifier (DOI), or whether the upload as a whole gets a DOI and the individual files are indexed by some subindex.

http://zenodo.org/dev

ver228 commented 8 years ago

I do not understand why the operating system of the storage server is important. We can access the data from the local network either using a local computer or using VPN. In the lab we have Mac and Windows machines. I am sure I can get Linux if it is required, but OSX should be able to handle most Linux tools.

MichaelCurrie commented 8 years ago

The operating system matters because I am interested in writing a script to upload the files, so I need to know whether I should be writing it in Bash (Linux, or potentially Mac) or PowerShell (Windows). I agree OSX should be able to handle Linux-type scripts if they are written in a portable manner. Thanks!

ver228 commented 8 years ago

Ok, now it makes sense. We can access the data using either a Windows or a Mac machine. I would probably prefer a bash script. Maybe you could take a look at this port of Zenodo's API (http://zenodio.lsst.io/en/latest/). It seems to be functional.

MichaelCurrie commented 8 years ago

OK I have been playing with Zenodo's API:

sudo apt-get install python3-pip
sudo pip3 install zenodio

I also found this great guide on how to use Zenodo to assign a DOI to a GitHub repository:

https://guides.github.com/activities/citable-code/

As a test I manually uploaded an old preprint of mine so it would get assigned a DOI, which seems to have worked.

I will keep you updated, working on this now.

MichaelCurrie commented 8 years ago

The native Zenodo API is REST-based: https://zenodo.org/dev

A programmer with the Large Synoptic Survey Telescope, Jonathan Sick, created a 300-line Python wrapper around the part of the API concerned with harvesting metadata from existing Zenodo communities (http://zenodio.lsst.io/en/latest/). However, I have confirmed with Jonathan that no upload function exists or is currently in development.
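
Since zenodio has no upload function, here is a minimal sketch of what the first step of an upload could look like against the REST API directly, using only the Python standard library. It only builds the request that creates a new, empty deposition on the sandbox site; the function names, the metadata values and the `tracker-commons` community identifier are assumptions made for illustration, and the later steps (attaching files, publishing) are further calls described at https://zenodo.org/dev.

```python
import json
import urllib.parse
import urllib.request

# Sandbox base URL for testing; real uploads would go to https://zenodo.org/api.
SANDBOX_API = "https://sandbox.zenodo.org/api"


def build_create_deposition_request(token, title, description, creators,
                                    community, api=SANDBOX_API):
    """Build (but do not send) the POST request that creates a deposition.

    `community` is the community identifier; the value to use for the
    worm community is an assumption and should be checked on the site.
    """
    payload = {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "description": description,
            "creators": [{"name": name} for name in creators],
            "communities": [{"identifier": community}],
        }
    }
    query = urllib.parse.urlencode({"access_token": token})
    return urllib.request.Request(
        f"{api}/deposit/depositions?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON reply.

    Needs network access and a valid personal access token.
    """
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical values; nothing is sent until send() is called.
    request = build_create_deposition_request(
        token="YOUR-SANDBOX-TOKEN",
        title="Example worm recording",
        description="Test deposition created from a script.",
        creators=["Doe, Jane"],
        community="tracker-commons",
    )
    print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending means the script can be checked offline before it is pointed at the sandbox, and only later at the production site.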

MichaelCurrie commented 8 years ago

I have created a Zenodo "community" for uploading worm files:

https://zenodo.org/collection/user-tracker-commons

I also created one in the parallel sandbox site for testing purposes:

https://sandbox.zenodo.org/collection/user-tracker-commons

MichaelCurrie commented 8 years ago

I made a pull request to the zenodio package to address the missing upload function.

MichaelCurrie commented 8 years ago

I'll include this here in case it's relevant: Q&A with André Brown from 19 February 2016 with Chris Linzy:

What is the directory structure for the videos as they currently exist on the drive?

The directory structure is quite chaotic since the data were originally recorded onto 8 machines by several different people. Basically, there are directories for each machine, but some are called things like ‘copied from pc207-5’. Within these folders they are arranged by user and then by date and time. A possible target structure is the one that we used for the feature files on the ftp server: ftp://ftp.mrc-lmb.cam.ac.uk/pub/tjucikas/wormdatabase/results-12-06-08/Laura%20Grundy/

What will be the structure in terms of DOI/Deposition and the files? That is, which of the following structures will be used for Zenodo:

All videos as files under a single deposition/DOI
Each video as a file under its own deposition/DOI
All videos for a given worm type as files under a deposition/DOI for that worm type
Match the drive directory structure at some level
Other

The answer will depend on what’s possible in terms of accessing the data once it’s there. It’s also possible that the people at Zenodo will have a preference we should take into account.

Does each deposition get uploaded as a zip file? Avelino mentioned something like that. In that case, we will probably want each video to be a separate deposition if we want to link back to videos stored at Zenodo from the database. On the other hand, if Zenodo will simply be a store that we access occasionally and the database essentially runs elsewhere, perhaps a strain per deposition would be appropriate. Finally, if it’s possible to match a directory structure and access subdirectories in an easy way, then matching the ftp server directory structure could make sense.

Will the files keep their original names, or will some other naming scheme be used? If the latter, how will the new names be determined from the existing file names and/or their location in the directory structure?

All of the filenames should be unique, so it might be easiest to just keep the names as they are. The names define things like mutant type and whether the worms are on the left or right side, but there are mistakes. For the previous set, we corrected these manually and put the information in a database, but left the names unchanged. We have more data now, but my plan was to do the same again. I suppose if people are going to be downloading these files it would be good not to have the misleading information in the filename. My vote would be to rename files based on the information in the database.
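
A rough sketch of what renaming from the database could look like is below. The record fields (`strain`, `side`, `timestamp`) and the naming pattern are hypothetical, since the actual database schema is not described here; the point is only that the new name comes from the corrected record rather than from the old filename, and that the plan refuses to produce two files with the same name.

```python
from pathlib import PurePath


def corrected_name(record, original_name):
    """Compose a filename from corrected database fields.

    Only the extension is taken from the original name; everything
    else comes from the (hypothetical) database record.
    """
    extension = PurePath(original_name).suffix
    parts = [record["strain"], record["side"], record["timestamp"]]
    stem = "_".join(str(part).replace(" ", "-") for part in parts)
    return stem + extension


def plan_renames(records):
    """Map each original filename to its corrected name.

    `records` maps original filename -> database record.
    Raises ValueError if two files would end up with the same name.
    """
    plan = {}
    seen = set()
    for original_name, record in records.items():
        new_name = corrected_name(record, original_name)
        if new_name in seen:
            raise ValueError(f"duplicate target name: {new_name}")
        seen.add(new_name)
        plan[original_name] = new_name
    return plan


if __name__ == "__main__":
    # Hypothetical example: the filename says "L" but the database says "R".
    example = {
        "wrong_label_L.avi": {
            "strain": "N2", "side": "R", "timestamp": "2010-01-01",
        },
    }
    print(plan_renames(example))
```

Building the whole plan before touching any file makes it possible to review the mapping, and to keep it as a record of which original name each renamed file came from.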

MichaelCurrie commented 8 years ago

Once #191 is complete, we'll be able to upload to Zenodo. Each entry to Zenodo will be made to our community, with:

DOI #x: Video (AVI), Skeleton (WCON), Feature (HDF5?)

We've agreed that we should not upload all files as one single DOI, since that would present a barrier to researchers downloading only those worms they want to study.

We can test on the sandbox at first.

ver228 commented 8 years ago

Hey @MichaelCurrie, we are not planning to upload the .AVI video. It would be two .HDF5 files (video + skeletons) that are required for my viewer. We can upload a subsampled AVI version as a preview.

DOI #x: Video (HDF5), Skeleton (HDF5), Feature (HDF5), Skeleton (WCON), Video_Preview (AVI)

I could include the Video HDF5 and the Skeleton HDF5 in the same file, but we agreed earlier with @aexbrown that having them in separate files is more compatible with the multiworm tracker.
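
With one DOI per recording and five files per DOI, an upload script would need to group the files by recording and notice incomplete sets before creating any depositions. A small sketch of that check follows; the filename suffixes are hypothetical placeholders, as the real naming convention is not settled in this thread.

```python
from collections import defaultdict

# Hypothetical suffixes for the five files planned per deposition.
EXPECTED_SUFFIXES = {
    "_video.hdf5",
    "_skeletons.hdf5",
    "_features.hdf5",
    "_skeletons.wcon",
    "_preview.avi",
}


def group_by_recording(filenames):
    """Group filenames by recording, keyed on the part before the suffix.

    Files matching none of the expected suffixes are ignored.
    """
    groups = defaultdict(dict)
    for name in filenames:
        for suffix in EXPECTED_SUFFIXES:
            if name.endswith(suffix):
                groups[name[: -len(suffix)]][suffix] = name
                break
    return dict(groups)


def missing_files(groups):
    """Return {recording: sorted missing suffixes} for incomplete sets."""
    report = {}
    for recording, found in groups.items():
        missing = EXPECTED_SUFFIXES - set(found)
        if missing:
            report[recording] = sorted(missing)
    return report


if __name__ == "__main__":
    files = [
        "w1_video.hdf5", "w1_skeletons.hdf5", "w1_features.hdf5",
        "w1_skeletons.wcon", "w1_preview.avi",
        "w2_video.hdf5",
    ]
    groups = group_by_recording(files)
    print(missing_files(groups))
```

Each complete group would then correspond to one deposition, so a researcher can download only the recordings they want to study.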

MichaelCurrie commented 7 years ago

Closing this as it's now a duplicate of https://github.com/openworm/movement_cloud/issues/92

openworm / open-worm-analysis-toolbox

Upload files to Zenodo tagged with DOI #192