Closed EmilJunker closed 2 years ago
I've done some testing and it turned out that renaming investigationSize
and datasetSize
to size
as proposed in https://github.com/icatproject/icat.server/pull/233#issuecomment-896870524 caused problems with Oracle because SIZE
is apparently a registered keyword in PL/SQL. That is why I have now renamed the size
attribute to fileSize
for both Investigations and Datasets. This works well and is also in line with the fileCount
attribute name.
I've done some testing and it turned out that renaming
investigationSize
anddatasetSize
tosize
as proposed in #233 (comment) caused problems with Oracle becauseSIZE
is apparently a registered keyword in PL/SQL. That is why I have now renamed thesize
attribute tofileSize
for both Investigations and Datasets. This works well and is also in line with thefileCount
attribute name.
Note: as an alternative to renaming the attribute, one could also override the column name in the database using the Column
annotation. E.g. something like:
public class Investigation extends EntityBaseBean implements Serializable {
@Comment("The cumulated sizes of the datasets within this investigation")
@Column(name = "SIZE_")
private Long size;
@Comment("The total number of datafiles within this investigation")
private Long fileCount;
}
I'd be happy with either solution. To be discussed in the collaboration meeting.
I get an error from the upgrade_mysql_5_0.sql
script trying to upgrade from icat.server
4.11.1:
$ mysql icat < upgrade_mysql_5_0.sql
ERROR 1025 (HY000) at line 117: Error on rename of './icat/#sql-76_54' to './icat/AFFILIATION' (errno: 150 "Foreign key constraint is incorrectly formed")
I get an error from the
upgrade_mysql_5_0.sql
script trying to upgrade fromicat.server
4.11.1:
There is an apparently related issue when deploying icat.server
during a fresh install. I get the following suspicious entry in the server.log
:
[2021-11-18T11:34:04.093+0100] [Payara 4.1] [WARNING] [] [javax.org.glassfish.persistence.org.glassfish.persistence.common] [tid: _ThreadID=32 _ThreadName=admin-thread-pool::admin-listener(1)] [timeMillis: 1637231644093] [levelValue: 900] [[
PER01000: Got SQLException executing statement "ALTER TABLE AFFILIATION ADD CONSTRAINT FK_AFFILIATION_DATAPUBLICATIONUSER_ID FOREIGN KEY (DATAPUBLICATIONUSER_ID) REFERENCES DATAPUBLICATIONUSER (ID)": java.sql.SQLException: Can't create table `icat`.`AFFILIATION` (errno: 150 "Foreign key constraint is incorrectly formed")]]
Apart from that log entry, the installation seem to succeed. I believe that icat.server
simply ignores SQL errors during install. But if one inspects the SQL dump after installation, it turns out that the foreign key constraint is indeed missing from the AFFILIATION
table:
CREATE TABLE `AFFILIATION` (
`ID` bigint(20) NOT NULL AUTO_INCREMENT,
`CREATE_ID` varchar(255) NOT NULL,
`CREATE_TIME` datetime NOT NULL,
`MOD_ID` varchar(255) NOT NULL,
`MOD_TIME` datetime NOT NULL,
`NAME` varchar(1023) NOT NULL,
`PID` varchar(255) DEFAULT NULL,
`DATAPUBLICATIONUSER_ID` bigint(20) NOT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `UNQ_AFFILIATION_0` (`DATAPUBLICATIONUSER_ID`,`NAME`) USING HASH
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
I get an error from the
upgrade_mysql_5_0.sql
script trying to upgrade fromicat.server
4.11.1:
Ok, I found it.
I got more details about that foreign key error running SHOW ENGINE INNODB STATUS
in the database:
2021-11-18 11:21:13 0x7f38402a8700 Error in foreign key constraint of table icat/AFFILIATION:
there is no index in the table which would contain
the columns as the first columns, or the data types in the
table do not match the ones in the referenced table
or one of the ON ... SET NULL columns is declared NOT NULL. Constraint:
,
CONSTRAINT `FK_AFFILIATION_DATAPUBLICATIONUSER_ID` FOREIGN KEY (`DATAPUBLICATIONUSER_ID`) REFERENCES `DATAPUBLICATIONUSER` (`ID`)
Actually, there is an index in the table which contains DATAPUBLICATIONUSER_ID
as the first column, but it is declared USING HASH
, see above. Apparently such an index is unusable for a foreign key, which is kind of logic. It seems that the reason for the USING HASH
declaration was that the name
attribute having 1023 characters is too wide to be used in an index directly. In any case, if one reduces the name attribute to 511 characters (which I guess, will still be enough), everything works.
This PR adds or modifies the following entity classes:
Affiliation
The home institute or other affiliation of a user in the context of a data publication
Uniqueness constraint: user
, name
Relationships:
Card | Class | Field |
---|---|---|
1,1 | DataPublicationUser | user |
Other fields:
Field | Type | Description |
---|---|---|
name | String [255] NOT NULL | An internal name for that affiliation entry, possibly the organization name |
pid | String [255] | Identifier such as ROR or ISNI |
fullReference | String [1023] | The full reference of the affiliation, optionally including street address and department, as it should appear in the publication |
DataCollectionInvestigation
Represents a many-to-many relationship between a DataCollection and its Investigations.
Uniqueness constraint: dataCollection
, investigation
Relationships:
Card | Class | Field |
---|---|---|
1,1 | DataCollection | dataCollection |
1,1 | Investigation | investigation |
DataPublication
A curated data publication
Uniqueness constraint: facility
, pid
Relationships:
Card | Class | Field |
---|---|---|
1,1 | Facility | facility |
1,1 | DataCollection | content |
0,1 | DataPublicationType | type |
0,* | DataPublicationDate | dates |
0,* | DataPublicationFunding | fundingReferences |
0,* | DataPublicationUser | users |
0,* | RelatedIdentifier | relatedIdentifiers |
Other fields:
Field | Type | Description |
---|---|---|
pid | String [255] NOT NULL | Persistent Identifier of the publication, such as a DOI |
title | String [255] NOT NULL | Title of the publication |
publicationDate | Date | Date when the data was made publicly available |
description | String [4000] | Abstract |
subject | String [1023] | List of keywords |
DataPublicationDate
A date relevant for the publication
Uniqueness constraint: publication
, dateType
Relationships:
Card | Class | Field |
---|---|---|
1,1 | DataPublication | publication |
Other fields:
Field | Type | Description |
---|---|---|
dateType | String [255] NOT NULL | Type of the date, see DataCite property dateType for suggested values |
date | String [255] NOT NULL | Use ISO 8601 format, may also be a range |
DataPublicationFunding
Represents a many-to-many relationship between a data publication and a funding reference
Uniqueness constraint: dataPublication
, funding
Relationships:
Card | Class | Field |
---|---|---|
1,1 | FundingReference | funding |
1,1 | DataPublication | dataPublication |
DataPublicationType
A type of data publication, for example, whole investigation, user-selected datasets/files. This is likely to be for facility internal purposes following their own classification scheme and allowing, for example, the front end to display them in different ways.
Uniqueness constraint: facility
, name
Relationships:
Card | Class | Field |
---|---|---|
1,1 | Facility | facility |
0,* | DataPublication | dataPublications |
Other fields:
Field | Type | Description |
---|---|---|
name | String [255] NOT NULL | A short name identifying this data publication type within the facility |
description | String [255] | A description of this data publicaion type |
DataPublicationUser
Author, e.g. creator of a or contributor to a data publication
Uniqueness constraint: publication
, user
, contributorType
Relationships:
Card | Class | Field |
---|---|---|
1,1 | DataPublication | publication |
1,1 | User | user |
0,* | Affiliation | affiliations |
Other fields:
Field | Type | Description |
---|---|---|
contributorType | String [255] NOT NULL | Role of that user in the publication, see DataCite property contributorType for suggested values or use "Creator" |
orderKey | String [255] | Defines an order among the contributors |
givenName | String [255] | The given name of the user |
familyName | String [255] | The family name of the user |
fullName | String [255] | May include title |
DatasetInstrument
Represents a many-to-many relationship between an instrument and a dataset with data collected at that instrument
Uniqueness constraint: dataset
, instrument
Relationships:
Card | Class | Field |
---|---|---|
1,1 | Dataset | dataset |
1,1 | Instrument | instrument |
DatasetTechnique
Represents a many-to-many relationship between a dataset and the experimental technique being used to create that Dataset
Uniqueness constraint: dataset
, technique
Relationships:
Card | Class | Field |
---|---|---|
1,1 | Dataset | dataset |
1,1 | Technique | technique |
FundingReference
Information about financial support
Uniqueness constraint: funderName
, awardNumber
Relationships:
Card | Class | Field |
---|---|---|
0,* | DataPublicationFunding | publications |
0,* | InvestigationFunding | investigations |
Other fields:
Field | Type | Description |
---|---|---|
funderName | String [255] NOT NULL | Name of the funding entity |
funderIdentifier | String [255] | Unique identifier of the funding entity, such as a Crossref Funder ID |
awardNumber | String [255] NOT NULL | Code assigned by the funder to identify the grant, suggest to use ":unas" if there is no such identifier |
awardTitle | String [255] | Title or name of the grant |
InvestigationFacilityCycle
Many to many relationship between investigation and facilityCycle. Allows investigations to belong to multiple cycles at once.
Uniqueness constraint: facilityCycle
, investigation
Relationships:
Card | Class | Field |
---|---|---|
1,1 | Investigation | investigation |
1,1 | facilityCycle | facilityCycle |
InvestigationFunding
Represents a many-to-many relationship between an investigation and a funding reference
Uniqueness constraint: investigation
, funding
Relationships:
Card | Class | Field |
---|---|---|
1,1 | Investigation | investigation |
1,1 | FundingReference | funding |
RelatedIdentifier
Identifier of a related resource to a data publication
Uniqueness constraint: publication
, identifier
Relationships:
Card | Class | Field |
---|---|---|
1,1 | DataPublication | publication |
Other fields:
Field | Type | Description |
---|---|---|
identifier | String [255] NOT NULL | The identifier of the related resource |
relationType | String [255] NOT NULL | Description of the relationship with the related resource, see DataCite property relationType for suggested values |
fullReference | String [1023] | The full reference for the related resource as it should be displayed on the landing page |
Technique
Represents an experimental technique
Uniqueness constraint: name
Relationships:
Card | Class | Field |
---|---|---|
0,* | DatasetTechnique | datasetTechniques |
Other fields:
Field | Type | Description |
---|---|---|
name | String [255] NOT NULL | A short name for the technique |
pid | String [255] | A persistent identifier attributed to the technique, ideally referring to a vocabulary term |
description | String [255] | An informal description for the technique |
Not mentioned here: entity classes that are the target of a many-to-one relation from one of the new classes get the corresponding reverse one-to-many relation added.
Dataset
New fields:
Field | Type | Description |
---|---|---|
fileCount | Long | The total number of datafiles within this dataset |
fileSize | Long | The cumulated sizes of the data files within this dataset |
Investigation
New fields:
Field | Type | Description |
---|---|---|
fileCount | Long | The total number of datafiles within this investigation |
fileSize | Long | The cumulated sizes of the datasets within this investigation |
Edit: update Affiliation
class as a result of the merge of #260
Edit: update after the merge of #271, #283, and #263
I mentioned in the meeting last Thursday that there might be an issue with new entity classes and the export and import calls in icat.server
. I checked this and was proven wrong: icat.server
turns out to be better than I thought, it just works, even for the new classes.
There is however one issue with the fileCount
and fileSize
attributes in Dataset
and Investigation
. If the database triggers are in places, the values will be wrong after an export / import cycle.
Thanks @RKrahl. That's good news about the export and import just working for the new classes.
It's a shame about the fileCount
and fileSize
fields. Can this be fixed or would it be too much work to be worth it?
My feeling is that if it can't be fixed easily then we have to live with that and probably just add a warning to any documentation that (hopefully!) exists on the export/import functionality. The workaround after doing an import would presumably be to either re-calculate all the values using the INITIALIZE_DS/INV_SIZE_COUNT
procedures in the ICAT 5 migration scripts (assuming you have a fairly small ICAT), or to edit the procedure to only update the Investigations that you have modified if you have a large ICAT and it's too expensive to re-calculate all values in the whole ICAT.
It's a shame about the
fileCount
andfileSize
fields. Can this be fixed or would it be too much work to be worth it?
I'd say, it can't easily be fixed. It is due to the design decisions we made for the fileCount
/ fileSize
feature: we decided to implement the feature with triggers in the database backend and to make installing the triggers optional. That means, icat.server
has no way to know whether the triggers are in place. The triggers do not recalculate the values but do only incremental updates when a Datafile
is created, deleted or updated. Recalculating would have been too expensive performance-wise. And we decided not to enforce correct values on create or update of Database
and Investigation
objects. So it's always possible to create a Dataset
or an Investigation
with "wrong" values.
Now, what happens is: the export writes Investigations
and Datasets
with their (presumably correct) attribute values to the export file. The import reads that file and creates Investigations
and Datasets
with the (correct) values they have set in the file. Then the import reads the Datafiles
from the file and creates them. This causes the database triggers to update fileCount
and fileSize
in the corresponding Dataset
and Investigation
objects. bummer: as result, the values are doubled.
My feeling is that if it can't be fixed easily then we have to live with that and probably just add a warning to any documentation that (hopefully!) exists on the export/import functionality.
Yes.
The workaround after doing an import would presumably be to either re-calculate all the values
That would be my advise. As an alternative, one could temporarily disable the triggers during the import.
[…] if you have a large ICAT and it's too expensive to re-calculate all values in the whole ICAT.
I don't believe import or export are ever used on a large production ICAT.
I now merged #260 into this branch and updated the comment describing the new classes above accordingly.
I now merged the other pending PRs requesting schema changes into this one. From my point of view, it's ready. I did not test this with Oracle, though. @kevinphippsstfc, could you please check again.
Thanks @RKrahl. I will test this branch on Oracle.
I've done a mvn install
with the the test setup using an Oracle database and the tests all pass apart from:
TestRS.testSearch:691->search:1034 expected:<4> but was:<5>
which is the test that PR #275 aims to fix.
Did you manage to take a look at that yet?
I've done a mvn install
with the the test setup using an Oracle database and the tests all pass apart from:
TestRS.testSearch:691->search:1034 expected:<4> but was:<5>
which is the test that PR #275 aims to fix.
Did you manage to take a look at that yet?
I've done a
mvn install
with the the test setup using an Oracle database and the tests all pass apart from:TestRS.testSearch:691->search:1034 expected:<4> but was:<5>
which is the test that PR #275 aims to fix. Did you manage to take a look at that yet?
Not yet, that will be my next action tomorrow.
This PR collects (i.e. obsoletes) several existing PRs and provides a clean, conflict free way to merge them into master.
Obsoletes PR #232, closes #200 closes #249, includes parts of #248. Obsoletes PR #233, closes #211, closes #238. Obsoletes PR #234, closes #230. Obsoletes PR #235, closes #228.