sbgrid / data-capture-module


Error in request Dataverse API to report successful upload #38

Open scruz002 opened 4 years ago

scruz002 commented 4 years ago

Hi,

I think we almost have DCM working; the steps up to this point are performed correctly.

But we are having a problem: after moving the files to /hold, the post_upload.bash script makes a request to the Dataverse API to report that the files were received successfully. However, Dataverse is returning an error.

# source /opt/dcm/scn/post_upload.bash
post_upload starting at  Thu Sep 10 19:33:23 -03 2020
/deposit/WZGANH/WZGANH/files.sha  :  /deposit/WZGANH/WZGANH  :  WZGANH  :  WZGANH
checksums verified
data moved
ERROR: dataverse at https://xxxxxxxxxxx had a problem handling the DCM success API call
{"status":"ERROR","code":500,"message":"Internal server error. More details available at the server logs.","incidentId":"284d10d6-1101-47da-be9e-9bde74cf3828"}
will retry in 60 seconds
ERROR: retry failed, will need to handle manually
post_upload completed at  Thu Sep 10 19:34:59 -03 2020

In the Dataverse log

[2020-09-10T19:33:59.309-0300] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.api.errorhandlers.ServeletExceptionHandler] [tid: _ThreadID=52 _ThreadName=jk-connector(3)] [timeMillis: 1599777239309] [levelValue: 1000] [[
  API internal error 284d10d6-1101-47da-be9e-9bde74cf3828: Null Pointer
java.lang.NullPointerException
        at edu.harvard.iq.dataverse.api.Datasets.receiveChecksumValidationResults(Datasets.java:1351)

In the source code of Dataverse version 4.20 (edu.harvard.iq.dataverse.api.Datasets)


1346                 String storageDriver = dataset.getDataverseContext().getEffectiveStorageDriverId();
1347                 String uploadFolder = jsonFromDcm.getString("uploadFolder");
1348                 int totalSize = jsonFromDcm.getInt("totalSize");
1349                 String storageDriverType = System.getProperty("dataverse.file." + storageDriver + ".type");
1350 
1351                 if (storageDriverType.equals("file")) {
1352                     logger.log(Level.INFO, "File storage driver used for (dataset id={0})", dataset.getId());

Apparently the property "dataverse.file." + storageDriver + ".type" is not defined.

We would like to know how to configure the appropriate value for this property so that the request to the Dataverse API is handled correctly.

We noticed that the properties related to the upload folder start with "dataverse.files" and not "dataverse.file".
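The stack trace is consistent with that reading: a minimal standalone sketch (the property name follows the v4.20 naming convention, with "file" as an example store id, so it is an assumption here) shows how looking up an undefined system property and then dereferencing the result produces exactly this NullPointerException:

```java
public class MissingPropertyNpe {
    public static void main(String[] args) {
        // System.getProperty returns null for a key that was never defined
        // (e.g. if the create-jvm-options step from the release notes was skipped).
        String storageDriverType = System.getProperty("dataverse.files.file.type");
        System.out.println("property value: " + storageDriverType);
        try {
            // Same dereference pattern as Datasets.java:1351 in v4.20.
            if (storageDriverType.equals("file")) {
                System.out.println("file storage driver");
            }
        } catch (NullPointerException e) {
            // This is the NPE that surfaces as the 500 Internal Server Error.
            System.out.println("NullPointerException thrown");
        }
    }
}
```

A null-safe comparison such as `"file".equals(storageDriverType)` would avoid the crash, but the underlying fix would still be to define the missing JVM option.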

Thanks for any help.

pameyer commented 4 years ago

Hi @scruz002 - thanks for reporting this; it looks to me like this API may have been broken by the multiple file store support in https://github.com/IQSS/dataverse/releases/tag/v4.20. Is it possible for you to check with Dataverse <= 4.19 to see whether this error is still present?

qqmyers commented 4 years ago

This looks like a missed step in setting up v4.20. In the v4.20 release upgrade instructions there's a section on updates related to multi-store support, which are required even if you are only using one store. I've copied those instructions below; the ones you need depend on whether you're using a file or S3 store. In either case there are new JVM options to be set in Glassfish using the appropriate commands.

The following set of commands to change the Glassfish JVM options will adapt an existing file or s3 store for this upgrade.

For a file store:

./asadmin create-jvm-options "\-Ddataverse.files.file.type=file"
./asadmin create-jvm-options "\-Ddataverse.files.file.label=file"
./asadmin create-jvm-options "\-Ddataverse.files.file.directory=<your directory>"

For a s3 store:

./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3"
./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3"
./asadmin delete-jvm-options "-Ddataverse.files.s3-bucket-name=<your_bucket_name>"
./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=<your_bucket_name>"

scruz002 commented 4 years ago

> Hi @scruz002 - thanks for reporting this; it looks to me like this API may have been broken by the multiple file store support in https://github.com/IQSS/dataverse/releases/tag/v4.20. Is it possible for you to check with Dataverse <= 4.19 to see whether this error is still present?

Sorry, but we have no way to downgrade Dataverse in our staging environment. Do you know of anyone using DCM with version 4.20?

scruz002 commented 4 years ago

> This looks like a missed step in setting up v4.20. In the v4.20 release upgrade instructions there's a section on updates related to multi-store support, which are required even if you are only using one store. I've copied those instructions below; the ones you need depend on whether you're using a file or S3 store. In either case there are new JVM options to be set in Glassfish using the appropriate commands.
>
> The following set of commands to change the Glassfish JVM options will adapt an existing file or s3 store for this upgrade.
>
> For a file store:
>
> ./asadmin create-jvm-options "\-Ddataverse.files.file.type=file"
> ./asadmin create-jvm-options "\-Ddataverse.files.file.label=file"
> ./asadmin create-jvm-options "\-Ddataverse.files.file.directory=<your directory>"
>
> For a s3 store:
>
> ./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3"
> ./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3"
> ./asadmin delete-jvm-options "-Ddataverse.files.s3-bucket-name=<your_bucket_name>"
> ./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=<your_bucket_name>"

We already tried this configuration and got the same error.

qqmyers commented 4 years ago

Line 1346 is trying to get the storage driver id to use. That can be set by default or per Dataverse (if you have more than one store defined).

Is your -Ddataverse.files.storage-driver-id= option set to the id of the store you want to use? (In the examples above that would be 'file' or 's3', but it would have to match whatever store id you use.)
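One quick way to check, assuming a standard Glassfish 4 install (the path below is an assumption; adjust it to your setup), is to list the JVM options and confirm that both the store definition and the driver id are present and use the same id:

```shell
# List every dataverse.files.* JVM option currently set.
# The id in -Ddataverse.files.storage-driver-id=<id> must match the <id>
# used in the -Ddataverse.files.<id>.type / .label / .directory options.
cd /usr/local/glassfish4/bin   # hypothetical install location
./asadmin list-jvm-options | grep 'dataverse\.files'
```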