icatproject / icat.server

The ICAT server offering both SOAP and "RESTlike" interfaces to a metadata catalog.
Other
1 stars 5 forks source link

Populating of filesize and filecount in ICAT 5 #281

Closed kevinphippsstfc closed 2 years ago

kevinphippsstfc commented 2 years ago

I have been testing the new procedures for upgrading to ICAT 5 INITIALIZE_DS_SIZE_COUNT and INITIALIZE_INV_SIZE_COUNT (see: https://github.com/icatproject/icat.server/pull/256/files#diff-902097007085fc80278867aaad429807b4ea831eef1257601dff06e44d0e1080R132) on a copy of the Diamond (Oracle) ICAT database and found that when a Dataset or Investigation has no Datafiles, the FILECOUNT column is populated (with a zero) but the FILESIZE is not.

This is due to the fact that Dataset FILESIZE is populated using SUM which seems to return null if there are no matching records, whereas FILECOUNT is populated by COUNT which returns zero if there are no matching records. (I'm not sure if this behaviour is the same for mysql).

I tested a small modification to the create procedures for Oracle which seems to work:

In INITIALIZE_DS_SIZE_COUNT the update statement for FILESIZE should be: UPDATE DATASET SET FILESIZE = NVL(FILE_SIZE, 0) WHERE ID = CUR_DATASET.ID; In INITIALIZE_INV_SIZE_COUNT the update statement for FILESIZE should be: UPDATE INVESTIGATION SET FILESIZE = NVL(FILE_SIZE, 0) WHERE ID = CUR_INVESTIGATION.ID; both using NVL to set the value to zero where it would otherwise be null.

We would need to check and possibly do the same for the mysql script and test it if people agree that this is a better way for it to work.

So this ensures that all FILESIZE and FILECOUNT values are populated in the database, if you choose to use the new trigger based functionality.

But then I realised that when new Investigations or Datasets are added, the size and count values will also both remain set to null until any Datafiles are added to those Datasets.

So I was thinking we could specify a default value of zero via the JPA annotations but according to this item on StackOverflow that is possibly not straightforward: https://stackoverflow.com/questions/197045/setting-default-values-for-columns-in-jpa The accepted answer is to use the columnDefinition parameter of @Column but that seems to be database specific so we can't use that. The highest scoring answer is to set a default value on the field when it is initialised eg: private Long fileSize = 0L;

but I'm not aware that we do this anywhere else in icat.server or if there are any downsides of doing this.

Does anyone have any comments or advice?

kevinphippsstfc commented 2 years ago

After further testing I found that INITIALIZE_DS_SIZE_COUNT and INITIALIZE_INV_SIZE_COUNT behaved the same on mysql (mariadb) and oracle, and that there were cases where the filecount was left set to null on empty Investigations/Datasets when the procedures were run (in addition to the filesize null values already found).

I also made the executive decision that it would be preferable for the filesize and filecount values set to zero when a new entity is created rather than end up with a mix of null and zero sized entities in an ICAT.

agbeltran commented 2 years ago

included in 5.0.0