obiba / mica2

Mica is a web portal for epidemiological study consortia.
http://www.obiba.org/pages/products/mica/
GNU General Public License v3.0

"The configured limit of 1,000 object references was reached" after importing large variable XLSX #4458

Open vera opened 2 months ago

vera commented 2 months ago

This issue is unique

Version information

5.2.3

Expected behavior

When uploading a large variable XLSX (the "Variables" sheet has 18,000+ rows, the "Categories" sheet has 66,000+ rows) for a collected dataset and then publishing the dataset, we expect all variables and, where applicable, their categorical values to be shown in MICA.

Actual behavior

When publishing the dataset, a "504 Gateway timeout" error message appears in the admin UI and the following warning appears in the MICA logs:

2024-08-28 11:37:19.044 WARN 59 --- [87606386-132162] n.s.e.pool.sizeof.ObjectGraphWalker : The configured limit of 1,000 object references was reached while attempting to calculate the size of the object graph. Severe performance degradation could occur if the sizing operation continues. This can be avoided by setting the CacheManger or Cache <sizeOfPolicy> elements maxDepthExceededBehavior to "abort" or adding stop points with @IgnoreSizeOf annotations. If performance degradation is NOT an issue at the configured limit, raise the limit value using the CacheManager or Cache <sizeOfPolicy> elements maxDepth attribute. For more information, see the Ehcache configuration documentation.

Reproduction steps

  1. Log in to MICA
  2. Open a collected dataset
  3. Upload a file in MICA/OPAL format with a lot of variables (in our case, 18,000+ Variables and 66,000+ Categories rows)
  4. Edit the "Study Table" section: under Data source > Path, enter the path to the file
  5. "Publish"
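
To check the sheet sizes quoted above before uploading, the rows per sheet can be counted straight from the XLSX archive. The following is a minimal sketch using only the Python standard library; the function name `count_sheet_rows` is hypothetical and not part of Mica or Opal. It counts every stored `<row>` element, so the header row is included and rows the spreadsheet application left out as empty are not.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard OOXML namespaces used inside an .xlsx archive.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def count_sheet_rows(xlsx_path):
    """Return {sheet name: number of stored <row> elements} for an XLSX file.

    Hypothetical helper for illustration; not part of Mica or Opal.
    """
    counts = {}
    with zipfile.ZipFile(xlsx_path) as archive:
        # Map relationship ids (rId1, rId2, ...) to worksheet part paths.
        rels = ET.fromstring(archive.read("xl/_rels/workbook.xml.rels"))
        targets = {
            rel.get("Id"): rel.get("Target")
            for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship")
        }
        workbook = ET.fromstring(archive.read("xl/workbook.xml"))
        for sheet in workbook.iter(f"{{{MAIN_NS}}}sheet"):
            target = targets[sheet.get(f"{{{REL_NS}}}id")]
            # Targets are usually relative to xl/, but may be absolute.
            part = target.lstrip("/") if target.startswith("/") else "xl/" + target
            rows = 0
            with archive.open(part) as stream:
                # Parse incrementally and clear each row to keep memory use modest.
                for _event, elem in ET.iterparse(stream):
                    if elem.tag == f"{{{MAIN_NS}}}row":
                        rows += 1
                        elem.clear()
            counts[sheet.get("name")] = rows
    return counts
```

Calling `count_sheet_rows("variables.xlsx")` (file name assumed) returns a dictionary keyed by sheet name, which can then be compared with the number of variables shown after publishing.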

Operating System (OS)

No response

Browser

No response

Contact info

NFDI4Health

cc @johannes-darms

ymarcon commented 2 months ago

Can you share this xlsx file? (at least privately)

vera commented 2 months ago

Yes, can I send it to your email address (yannick.marcon@obiba.org)?

vera commented 2 months ago

Thank you, sent!

ymarcon commented 2 months ago

I was able to create the tables in my local Opal. Since Mica and Opal share the same Excel reader library, the problem lies neither in your file nor in the file reader.

Have you tried increasing the memory available to Mica? It is set through the JAVA_OPTS environment variable; see https://micadoc.obiba.org/en/latest/admin/installation.html

vera commented 2 months ago

I have increased the memory setting, but that doesn't seem to fully fix the problem. I no longer see the error message in the log, but I still get the 504 Gateway Timeout.

Specifically, the request to PUT /ws/draft/collected-dataset/3601/_publish?cascading=NONE fails with 504 after 30 seconds. Is there anything I could do to increase the request speed or the timeout duration?
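
One way to narrow down where the 30-second limit is enforced is to send the same request with a long client-side timeout and measure how long it takes before a response or error comes back. The sketch below uses only the Python standard library. The endpoint path is the one quoted above; the base URL, the helper names, and the 600-second timeout are assumptions, and the authentication that Mica requires is deliberately left out. Note that running it against a real server triggers an actual publish.

```python
import time
import urllib.error
import urllib.request


def build_publish_request(base_url, dataset_id, cascading="NONE"):
    """Build the PUT request for the publish endpoint quoted in this thread."""
    url = (
        f"{base_url.rstrip('/')}/ws/draft/collected-dataset/"
        f"{dataset_id}/_publish?cascading={cascading}"
    )
    return urllib.request.Request(url, method="PUT")


def timed(action):
    """Run action() and return (result, elapsed seconds)."""
    start = time.monotonic()
    result = action()
    return result, time.monotonic() - start


def publish_and_time(base_url, dataset_id, timeout=600):
    """Send the publish request and return (HTTP status, elapsed seconds).

    Assumption: authentication headers or cookies must be added to the
    request before this works against a real Mica instance.
    """
    request = build_publish_request(base_url, dataset_id)

    def send():
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status
        except urllib.error.HTTPError as error:
            # Error responses such as 504 arrive as exceptions; report the code.
            return error.code

    return timed(send)
```

For example, `publish_and_time("http://localhost:8082", 3601)` (base URL assumed) returns the status code together with the elapsed time. A 504 that arrives after almost exactly 30 seconds through the public URL, but not when the backend is addressed directly, would suggest that the limit sits in a layer in front of the application.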

vera commented 2 months ago

We have now fixed the timeout (it was related to Jetty). We are now seeing the following behaviour: for a short time, while the _publish request was still running, we saw approx. 1,000 variables (out of 18,000) in the frontend (at /search#lists?type=variables&query=dataset(in(Mica_dataset.id,3601)),variable(limit(0,20))). After the publish had finished, the count dropped to 89 variables. Do you have any idea what might cause this?

By the way, we are using the direct upload to MICA that you developed, without OPAL. Thanks!