wettenhj commented 9 years ago

Background

The initially targeted users of MyData (https://github.com/wettenhj/mydata) have requested that their users should not have to interact with MyData at all if they don't want to, i.e. MyData will primarily be used by facility managers for adding new instrument PCs to MyTardis and for diagnosing failed uploads. General microscope users should be able to simply save a folder (e.g. "Dataset 1") in their user folder (e.g. "jsmith") and leave it to MyData to put the dataset in a sensible default experiment in MyTardis (which the user can later modify if they wish). The proposed method for defining a default experiment is to group datasets by: (i) instrument, (ii) the user who collected the data (the researcher), and (iii) the date on which the data was collected.

So, for example, if MyData found a "Dataset 1" folder with a creation date of "2014-10-11" within a "jsmith" folder on instrument "Test Microscope 1", it would query MyTardis to see whether a suitable default experiment record already exists for this dataset, i.e. an experiment record tagged with "Test Microscope 1", "jsmith" and "2014-10-11". If it didn't already exist, MyData would create this default experiment record. It would initially create the record using a facility role account, e.g. MyTardis username="myfacility", and then user "jsmith" would be given full ownership access to the experiment record by creating an appropriate ObjectACL record.
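A minimal in-memory sketch of the find-or-create step described above. It assumes simplified stand-ins for MyTardis's Experiment and ObjectACL records; the class names, their fields and the `find_or_create_default_experiment` helper are all hypothetical, not MyTardis's actual models or API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Experiment:
    # Hypothetical stand-in for a MyTardis experiment record.
    id: int
    title: str
    created_by: str
    tags: tuple  # (instrument, user folder name, collection date as ISO string)


@dataclass
class ObjectACL:
    # Hypothetical stand-in for an ACL granting a user access to an experiment.
    username: str
    experiment_id: int
    is_owner: bool = False


@dataclass
class Store:
    # Hypothetical in-memory "database" holding both kinds of record.
    experiments: list = field(default_factory=list)
    acls: list = field(default_factory=list)


def find_or_create_default_experiment(store, instrument, username, collected_on,
                                      facility_account="myfacility"):
    """Return (experiment, created) for the (instrument, user, date) key."""
    key = (instrument, username, collected_on.isoformat())
    for exp in store.experiments:
        if exp.tags == key:
            return exp, False
    # Not found: create the record with the facility role account ...
    exp = Experiment(
        id=len(store.experiments) + 1,
        title=f"{instrument} {collected_on.isoformat()}",
        created_by=facility_account,
        tags=key,
    )
    store.experiments.append(exp)
    # ... then give the researcher full ownership through an ACL record.
    store.acls.append(ObjectACL(username=username, experiment_id=exp.id, is_owner=True))
    return exp, True
```

Calling the helper twice with the same instrument, user and date returns the same record the second time, so rescanning a folder does not create duplicates.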

The question is how to implement these experiment "tags" (instrument, data-collector and date-of-collection) nicely in MyTardis.

Option 1. (already implemented in MyData's current MyTardis test instance)

Create a schema and parameters (as shown here: https://github.com/wettenhj/mydata/raw/master/UserGuideImages/Experiment%20Schema%20and%20Parameter%20Names.PNG)
Add some functionality to the MyTardis API to allow easy filtering of experiments, based on the values of these parameters: https://github.com/wettenhj/mytardis/blob/mydata/tardis/tardis_portal/api.py#L693
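To illustrate the kind of filtering Option 1 relies on, here is a small sketch that matches experiments whose parameter set contains all of the requested values. The schema namespace and the parameter names (`instrument`, `user_folder_name`, `collection_date`) are hypothetical placeholders, not the names in the linked screenshot, and parameters are modelled as a plain dict per experiment rather than MyTardis's parameter-set tables.

```python
# Hypothetical schema namespace; the real one is defined in MyData's test instance.
SCHEMA = "http://example.org/mydata/default-experiment"

experiments = [
    {"id": 1, "params": {SCHEMA: {"instrument": "Test Microscope 1",
                                  "user_folder_name": "jsmith",
                                  "collection_date": "2014-10-11"}}},
    {"id": 2, "params": {SCHEMA: {"instrument": "Test Microscope 1",
                                  "user_folder_name": "mbloggs",
                                  "collection_date": "2014-10-11"}}},
    {"id": 3, "params": {}},  # no parameter set for this schema
]


def filter_by_parameters(experiments, schema, **wanted):
    """Return experiments whose parameter set for `schema` matches every wanted value."""
    matches = []
    for exp in experiments:
        params = exp["params"].get(schema)
        if params is None:
            continue  # experiment has no parameters in this schema
        if all(params.get(name) == value for name, value in wanted.items()):
            matches.append(exp)
    return matches
```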

Option 2.

Make use of MyTardis's new Instrument model (accessible as an optional field in the dataset model), but try to avoid introducing any new schemas, parameters or changes to the experiment model.
This doesn't look feasible, because for "default experiments" we really want the instrument to be a property of the experiment, not the dataset. We would also still need a way to record the date of data collection (NOT the same as the creation date of a database record). MyTardis's ObjectACLs could already be used to tag an experiment with the researcher who collected the data, but it may not be easy to filter experiments by ObjectACL in the TastyPie API when determining whether a default experiment already exists for a given instrument, data owner and date of collection.
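The instrument problem in Option 2 can be seen in a small sketch: because the instrument is stored on each dataset, an experiment-level instrument has to be derived from the experiment's datasets, and it is only well defined when they all agree. The function name and the plain-dict dataset records are hypothetical.

```python
from typing import Iterable, Optional


def experiment_instrument(datasets: Iterable[dict]) -> Optional[str]:
    """Derive an experiment-level instrument from the experiment's datasets.

    Returns the instrument name if every dataset records the same one,
    and None if there are no datasets, none records an instrument, or
    they disagree.
    """
    instruments = {ds.get("instrument") for ds in datasets}
    if len(instruments) == 1:
        return instruments.pop()  # may itself be None if no dataset set an instrument
    return None
```

Answering "does a default experiment already exist for this instrument?" this way means inspecting the datasets of every candidate experiment, rather than matching a single experiment field.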

Option 3.

Add new fields to MyTardis's Experiment model to allow "default experiments" of this form to be defined and queried easily.
- Having an instrument field in both the Experiment and Dataset models might go against database normalization principles, but it could certainly be useful here, and there would be no problem with setting it to NULL for Experiments containing Datasets from multiple instruments.
- Adding a data-collection-date field to the Experiment model would be easy, but it would be good to bounce the idea off other MyTardis users to see whether it would be confused with the creation date of the database record, and whether some users would argue that the date of collection belongs in the Dataset model instead of the Experiment model (which certainly wouldn't help with the objective here of defining "default experiments").
- Adding a field to the Experiment model for the user who collected the data would be easy, but it could be confused with the ObjectACL records which indicate who currently has access to the data. For now, I would prefer a new field in the Experiment model for this (documenting the new fields together as a way of grouping datasets collected by the same user on the same instrument on the same date). But we could use ObjectACLs if we can work out an appropriate way to filter by ObjectACL when querying experiment records in the TastyPie API.
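Under Option 3, deciding whether an experiment is the default one reduces to an exact match on the new fields. A minimal sketch, assuming hypothetical field names `instrument`, `data_collection_date` and `data_collector` (none of these exist in MyTardis; a plain dataclass stands in for the Django model):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Experiment:
    # Hypothetical fields proposed in Option 3 (not part of MyTardis).
    title: str
    instrument: Optional[str]             # None (NULL) when datasets span several instruments
    data_collection_date: Optional[date]
    data_collector: Optional[str]         # username of the researcher who collected the data


def is_default_experiment_for(exp: Experiment, instrument: str,
                              username: str, collected_on: date) -> bool:
    """Exact match on the three proposed fields."""
    return (exp.instrument == instrument
            and exp.data_collector == username
            and exp.data_collection_date == collected_on)
```

An experiment whose instrument is NULL never matches a specific instrument, so multi-instrument experiments are never mistaken for default ones.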

grischa commented 9 years ago

Thanks for giving us the background information in such detail. After considering all the requirements you listed, I came up with another option shown below. I hope I understood it well enough to address all the requirements.

Option 4.

Use the existing start_time and end_time fields in the Experiment model: https://github.com/mytardis/mytardis/blob/develop/tardis/tardis_portal/models/experiment.py#L55
Use the owner field to set the user who collected the data, add an ACL for that user as well, of course.
Query for a "default experiment" by title__startswith, owner, start_time and end_time.
If you need to do complex queries with the API, why not create a custom API hook/url for these queries. It is a lot more lightweight than a database migration and more maintainable.
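A sketch of how MyData might assemble the Option 4 lookup as a TastyPie-style query string. It assumes that the experiment resource accepts `title__startswith`, `start_time` and `end_time` filters, plus an `owner` filter (which is not a field on Experiment and would need custom handling), and that a default experiment spans one calendar day. The endpoint path `/api/v1/experiment/` and the example host are assumptions for illustration.

```python
from datetime import date, datetime, time, timedelta
from urllib.parse import urlencode


def day_window(collected_on: date) -> tuple:
    """First and last second of the collection day, used as start_time/end_time."""
    start = datetime.combine(collected_on, time.min)
    end = start + timedelta(days=1) - timedelta(seconds=1)
    return start, end


def default_experiment_query(base_url: str, instrument: str, owner: str,
                             collected_on: date) -> str:
    """Build the (assumed) API URL for looking up a default experiment."""
    start, end = day_window(collected_on)
    params = {
        "format": "json",
        "title__startswith": instrument,  # default titles start with the instrument name
        "owner": owner,                   # assumed custom filter, not a model field
        "start_time": start.isoformat(),
        "end_time": end.isoformat(),
    }
    return f"{base_url}/api/v1/experiment/?{urlencode(params)}"
```

If the stock resource cannot express these filters, the same parameters could instead be accepted by a custom API hook, as suggested above.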

wettenhj commented 9 years ago

Hi Grischa,

Your Option 4 sounds good!

Except I can't see an "owner" field in the Experiment model. So I guess I should add functionality to the ExperimentResource in api.py, so that you can filter by "owner", which then queries the ObjectACL model.
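A minimal sketch of the idea, assuming the resource's filter handling can be intercepted before the ORM query is built (in TastyPie this would typically be an override of `build_filters`, whose exact signature varies between versions, so it is not shown here). The `owner` filter is popped and translated into an `id__in` filter using ObjectACL-like records; the dict layout of those records is hypothetical.

```python
def translate_owner_filter(filters: dict, acls: list) -> dict:
    """Replace an 'owner' filter with an 'id__in' filter on experiment IDs.

    `acls` is a list of dicts with 'username', 'experiment_id' and 'is_owner'
    keys, standing in for ObjectACL rows that grant ownership of experiments.
    """
    filters = dict(filters)  # don't mutate the caller's filters
    owner = filters.pop("owner", None)
    if owner is None:
        return filters
    owned_ids = sorted({acl["experiment_id"] for acl in acls
                        if acl["username"] == owner and acl["is_owner"]})
    filters["id__in"] = owned_ids
    return filters
```

An empty `id__in` list means the user owns no experiments, so the resulting query returns nothing and MyData would go on to create a new default experiment.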

Does that make sense?

Cheers, James

(Sent by email in reply to Grischa Meyer's comment of 21 Nov 2014, 3:28 pm.)

wettenhj commented 9 years ago

The use of the custom schema / parameters has been removed, and MyData now uses "Option 4" instead. See the following commits:

It appears to work, but further testing is needed...

wettenhj commented 9 years ago

One flaw with using title__startswith (Option 4) is that our clients were hoping that users could rename these default experiments in MyTardis from their default titles of "Instrument name Date" to anything they want. Once users have renamed these experiments, MyData can no longer identify a default experiment using title__startswith, so it will create a duplicate experiment the next time it scans the datasets from that instrument and creation date. So should we consider adding an instrument foreign key to the Experiment model (which can be NULL for experiments spanning multiple instruments)?
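The flaw can be reproduced with a toy example: once the title is edited, a prefix match no longer finds the experiment, whereas a match on an experiment-level instrument field (the foreign key proposed here, modelled as a plain string), together with owner and date, is unaffected. Plain dicts stand in for experiment records, and the `instrument` key is hypothetical.

```python
from datetime import date, datetime

experiments = [{
    "title": "Test Microscope 1 2014-10-11",  # default title: "<instrument name> <date>"
    "instrument": "Test Microscope 1",        # hypothetical experiment-level instrument field
    "owner": "jsmith",
    "start_time": datetime(2014, 10, 11),
}]


def find_by_title_prefix(experiments, instrument, owner, day):
    """Option 4 lookup: relies on the title still starting with the instrument name."""
    return [e for e in experiments
            if e["title"].startswith(instrument)
            and e["owner"] == owner and e["start_time"].date() == day]


def find_by_instrument(experiments, instrument, owner, day):
    """Lookup on the (hypothetical) instrument field; unaffected by renaming."""
    return [e for e in experiments
            if e["instrument"] == instrument
            and e["owner"] == owner and e["start_time"].date() == day]


key = ("Test Microscope 1", "jsmith", date(2014, 10, 11))
assert len(find_by_title_prefix(experiments, *key)) == 1  # before renaming

experiments[0]["title"] = "Lattice imaging, sample A"     # user renames the experiment
assert find_by_title_prefix(experiments, *key) == []      # prefix lookup now misses it
assert len(find_by_instrument(experiments, *key)) == 1    # instrument lookup still finds it
```

With the title lookup failing, MyData would create a second default experiment for the same instrument, user and date, as described above.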

steveandroulakis commented 9 years ago

Note to self: Talk about Steve's "option 5" when we meet. ;)

wettenhj / mytardis

Default experiments (to be used by MyData) - please discuss MyTardis model changes required #5
