rfcx / arbimon-legacy

https://arbimon.org
Apache License 2.0

Cannot export PM results >1k jobs #1538

Closed koonchaya closed 5 months ago

koonchaya commented 6 months ago

Original post: https://rfcx.slack.com/archives/C03FD1WD02J/p1715001354394219 @carlybatist

Project - https://arbimon.org/project/biosoundscape/analysis/patternmatching

We're having trouble exporting the PM results for all jobs with the 'Export all results (.csv)' function in Arbimon for the BioSoundSCape project. This is a critical feature given the volume of PM jobs to export (>1k). The problem is in the 'Arbimon export' email from no-reply@arbimon.org: it contains no link to download the zip file, and the 'download' button is not hyperlinked.

@carlybatist was able to download a much smaller bulk-export job from a separate project without problems, so my hunch is that the issue is related to the number of PMs in the project (>1k).

koonchaya commented 6 months ago

I tried exporting all PM results from this project and was able to download the zip file. There are >1k job results in this project, but my export folder contains only 333 job results.

occupancy-export (1).zip

Image

rassokhina-e commented 5 months ago

The issue occurs for jobs where the PM ROIs exceed 90k results. The main problem is the query response time, so I need to improve the query that fetches the data for a huge PM job.
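One common way to keep a single long-running query from timing out on a very large job is keyset pagination: fetch the rows in fixed-size batches ordered by primary key, and resume each batch after the last key seen. The thread does not say which change was actually made to the query, so the sketch below is only an illustration. The table `rois`, its columns, and the function `iter_rois` are hypothetical names, and SQLite stands in for the real database.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_rois(
    conn: sqlite3.Connection, job_id: int, batch_size: int = 1000
) -> Iterator[List[Tuple[int, float]]]:
    """Yield one job's ROI rows in batches using keyset pagination.

    Hypothetical schema: rois(roi_id INTEGER PRIMARY KEY, job_id, score).
    """
    last_id = 0  # assumes roi_id values are positive integers
    while True:
        rows = conn.execute(
            "SELECT roi_id, score FROM rois "
            "WHERE job_id = ? AND roi_id > ? "
            "ORDER BY roi_id LIMIT ?",
            (job_id, last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Resume after the last key seen instead of using OFFSET,
        # so later batches cost about the same as the first one.
        last_id = rows[-1][0]
```

Each batch can be written to the CSV as it arrives, so neither the database nor the export process has to hold all 90k+ rows at once.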

rassokhina-e commented 5 months ago

The report is in progress.

koonchaya commented 5 months ago

Export files in https://drive.google.com/drive/folders/1wUXEQeTlEheaNBtJ2ISNXIagu3n2j17M?usp=drive_link

koonchaya commented 5 months ago

Suggested changes to PM bulk export:

rassokhina-e commented 5 months ago

Image Image Image

koonchaya commented 5 months ago

@rassokhina-e

Image

pattern-matching-export (2).zip

Image

rassokhina-e commented 5 months ago

Please recheck on staging

koonchaya commented 5 months ago

Image

Image

Image

rassokhina-e commented 5 months ago

We reduced the size of the PM files, but we still have a problem exporting big PM jobs. The zip archive with all CSV files for the BioSoundSCape project is 2.5 GB, and there is a problem uploading a file this large to AWS S3. We need additional time to fix this and to implement multipart upload to S3, which splits the zip into chunks.
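For context, an S3 multipart upload sends the object as numbered parts (CreateMultipartUpload, then one UploadPart per chunk, then CompleteMultipartUpload). Every part except the last must be at least 5 MiB, and an upload can have at most 10,000 parts. The sketch below only plans the byte ranges for such an upload. It is not the code used in Arbimon, and `plan_parts` and the 64 MiB default are assumptions made for illustration.

```python
from typing import List, Tuple

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PARTS = 10_000       # S3 limit on parts per multipart upload


def plan_parts(total_size: int, part_size: int = 64 * MIB) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) for each part of an upload.

    Hypothetical helper: it plans byte ranges only and performs no
    network calls. Part numbers start at 1, as S3 requires.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB S3 minimum")
    parts = []
    offset = 0
    number = 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return parts
```

With these assumptions, a 2,560 MiB archive split into 64 MiB parts yields 40 parts, each of which would be read from the zip at its offset and sent as one UploadPart request.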

Image

Image

Image

koonchaya commented 5 months ago

@rassokhina-e Can we export them into separate files like we do for the project backup export, with 200 rows of data per file? I am not sure if this would help.
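A rough sketch of what that suggestion could look like: write the rows to numbered CSV files with a fixed number of rows each, repeating the header in every file. This is not how the project backup export is implemented. The function name `write_chunked_csv`, the `pm-export` file prefix, and the column names in the usage note are hypothetical.

```python
import csv
from itertools import islice
from pathlib import Path
from typing import Iterable, List, Sequence


def write_chunked_csv(
    rows: Iterable[Sequence],
    header: Sequence[str],
    out_dir: str,
    rows_per_file: int = 200,
    prefix: str = "pm-export",  # hypothetical file name prefix
) -> List[Path]:
    """Write rows to numbered CSV files with at most rows_per_file rows each."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    row_iter = iter(rows)
    paths: List[Path] = []
    part = 1
    while True:
        # Pull the next chunk lazily so the full result set is never in memory.
        chunk = list(islice(row_iter, rows_per_file))
        if not chunk:
            break
        path = target / f"{prefix}-{part:04d}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)  # repeat the header in every file
            writer.writerows(chunk)
        paths.append(path)
        part += 1
    return paths
```

For example, `write_chunked_csv(rows, ["roi_id", "score"], "export")` with 450 rows would produce three files. Note that splitting the CSVs changes the number of files inside the archive, not the total amount of data, so on its own it may not shrink the zip that has to be uploaded.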

carlybatist commented 5 months ago

@koonchaya so just to clarify this was one of the fixes in the new arbimon update right? just saw that it was still open so wanted to check :) thanks!

koonchaya commented 5 months ago

Zhenya released another fix yesterday, and I was waiting for the export result first. It seems to work now.