ices-eg / DIG

ICES Data and Information Group
Creative Commons Zero v1.0 Universal

data portal #666

Closed neil-ices-dk closed 2 weeks ago

neil-ices-dk commented 4 weeks ago

Hi all,

I'm taking advantage of this board/group. Although this falls outside vocab management per se, it seems an efficient place to have a discussion.

So we have two issues to discuss that are important to our service: 1) active monitoring of the data portal (who, what); and 2) critical bugs and changes, e.g. in light of the cyber attack, that need a continuous (but low-level) effort.

Thanks, Neil


Hi Neil,

I noticed that the ICES data portal had many requests waiting ("Under process") at https://data.ices.dk/download-management, some of which had been pending for quite a long time, i.e. 14 days.

The question is: who is monitoring the data portal?

One of the issues in the data portal is that the same IP can download the same data multiple times at the same time. In principle, the same user could make a billion requests for all data within all datasets exposed in the data portal at the same time, with no added benefit for the user (or robot 😉) but with a tremendous burden on our servers for no reason. Another issue is that if the request for one dataset fails for some reason, then all the other datasets are requested again as well the next time the failed dataset is requested, typically the next day. This likewise puts an extra burden on our servers, for no added benefit.
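To make the two requested fixes concrete, here is a minimal sketch under stated assumptions. The thread does not describe how the data portal actually queues downloads, so `DatasetRequest`, `DownloadQueue` and their methods are hypothetical. The idea: (1) skip a dataset request if an identical one (same IP, same dataset, same query) is already pending, and (2) on retry, re-queue only the datasets that failed rather than the whole original request.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetRequest:
    """One dataset within a user's download request (hypothetical model)."""
    requester_ip: str
    dataset: str
    query: str  # assumed to be a normalised filter string, e.g. "year=2021-2024"


@dataclass
class DownloadQueue:
    """Hypothetical queue; not the data portal's real implementation."""
    pending: set = field(default_factory=set)
    completed: set = field(default_factory=set)
    failed: set = field(default_factory=set)

    def submit(self, requests):
        """Queue each dataset request unless an identical one is already pending.

        Returns the list of requests that were actually queued."""
        queued = []
        for req in requests:
            if req in self.pending:
                continue  # same IP, same dataset, same query: skip the duplicate
            self.pending.add(req)
            queued.append(req)
        return queued

    def mark(self, req, ok):
        """Move a finished request out of `pending` into `completed` or `failed`."""
        self.pending.discard(req)
        (self.completed if ok else self.failed).add(req)

    def retry_failed(self):
        """Re-queue only the failed datasets, not every dataset in the original request."""
        to_retry = list(self.failed)
        self.failed.clear()
        return self.submit(to_retry)
```

For example, submitting the acoustic dataset twice alongside the oceanography dataset queues only two jobs, and if only the oceanography job fails, `retry_failed()` re-queues just that one. Note that this only catches duplicates from the same IP; identical requests from different users would still each be processed.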

I have raised both issues described above as backlog items, but have been told that development on the data portal is on hold.

https://dev.azure.com/ices-devops/DataCentre/_boards/board/t/Data%20Portal/Backlog%20items

https://dev.azure.com/ices-devops/DataCentre/_boards/board/t/Data%20Portal/Backlog%20items?workitem=3952

https://dev.azure.com/ices-devops/DataCentre/_boards/board/t/Data%20Portal/Backlog%20items?workitem=4457

Please see above as constructive feedback!

Best, Hjalte

mehdiabbasi commented 4 weeks ago

Hi Neil @neil-ices-dk ,

Monitoring DataPortal is my responsibility, and I am also responsible for fixing bugs, but new features or changes to features should wait for the development phase, which will start in September. The download failure happened because of a timeout error from the Acoustic API. This was already reported to @HjalteParner and @ihtsham two days ago and we discussed the details; they need to fix their API so that it is responsive, as the other datasets have not reported any issues so far. The backlog items that Hjalte linked above were also agreed to be prioritised in the last sprint, and we decided to discuss them in the first sprint after summer, which will be in September. As I mentioned at the last DC, all our APIs are at risk of receiving a billion requests, and that should be addressed. Not limiting downloads was the policy agreed by the DataPortal team in the first phase of DataPortal development, but we agreed to discuss it in the next sprint and make a new policy. Although @HjalteParner called me a "not very cooperative person", which I did not consider constructive feedback from a colleague, I try to do my best to support DataPortal and will be happy to discuss both issues that you mentioned in the next sprint.

Best, Mehdi

neil-ices-dk commented 3 weeks ago

Thanks @mehdiabbasi. I would suggest you come along to the next RMG meeting for 10 minutes so we can discuss whether there is anything the data officers can support you with in the monitoring.

I agree that the unlimited downloads need to be reined in and re-evaluated by the data portal team.

HjalteParner commented 3 weeks ago

Thanks @mehdiabbasi. I can see that the data portal has now processed all pending queries successfully, and I assume you have fixed the issue/bug pointed out within the data portal?

Thanks @neil-ices-dk for drawing the data managers' attention to monitoring possible issues in the data portal query queue, so that we can handle data requests in a timely manner going forward.

And finally, thanks for indicating the willingness to bring up the long-standing performance improvement task in the backlog as soon as possible. It will certainly help the requesters, reduce the load on the data systems behind the hosted datasets, and also help economically if the data portal is at some point hosted in the cloud.

mehdiabbasi commented 3 weeks ago

The issue with the Acoustic API is still there and causes some downloads to fail.

HjalteParner commented 3 weeks ago

Thanks for testing @mehdiabbasi.

It does indeed seem that we have an issue on the acoustic side of things as well, even though two data requests within 7 seconds for 16 datasets covering all data between 2021 and 2024 is quite unrealistic for a user with good intentions, i.e. not a hacker ;-)

We (@ihtsham) have now optimised the file compression routine in test, which might very well fix the issue.

So please test again. We are very happy with you making an effort to test our systems.

However this time, please do your tests using our test setup, and not our production setup!

We currently have real users using the production system for validation, uploads and downloads, and we do not want to break their work. FYI, our expert group IESSNS is currently having its annual post-cruise meeting and is using the system to make the annual abundance index estimates for this survey.

The address for our acoustic test setup is acoustictest.ices.dk instead of acoustic.ices.dk, and for oceanography, not surprisingly, oceantest.ices.dk instead of ocean.ices.dk.

Thanks for your understanding, and not least for your help in improving the interaction between our systems.

odontaster commented 3 weeks ago

Downloads from the Data Portal can be followed here: https://data.ices.dk/download-management. If requests are "Under process" for more than 3 days, Mehdi needs to rerun them.

Data managers: make a rule so that "Download failed" emails end up in another folder (@odontaster), perhaps the main accessions mailbox rather than the Data Portal folder. Check that folder from time to time for user requests.
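The 3-day rule above could be checked with a small helper like the sketch below. It assumes the requests have been exported from the download-management page as `(request_id, status, submitted_at)` tuples; that export format and the `stale_requests` function are hypothetical, since the thread does not describe any API for the page.

```python
from datetime import datetime, timedelta

# Threshold taken from the rule above: "Under process" for more than 3 days.
STALE_AFTER = timedelta(days=3)


def stale_requests(requests, now):
    """Return IDs of requests still "Under process" for longer than STALE_AFTER.

    `requests` is an iterable of (request_id, status, submitted_at) tuples
    (hypothetical export format; submitted_at is a datetime)."""
    return [
        req_id
        for req_id, status, submitted_at in requests
        if status == "Under process" and now - submitted_at > STALE_AFTER
    ]
```

A request submitted exactly 3 days ago is not yet flagged, matching "more than 3 days"; requests with any other status, such as "Download failed", are left to the mailbox rule instead.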

mehdiabbasi commented 3 weeks ago

Hi @ihtsham,

Thank you for fixing the Acoustic API issue when downloading data. I tested it on acoustictest.ices.dk and it is working. If you push it to production, I will test it again on production as well. Even if we agree to stop one user/IP from downloading the same data repeatedly, different users can still download the same data, and I can already see many such cases in the download list.

HjalteParner commented 3 weeks ago

@mehdiabbasi please respect using our testing environment for testing as previously requested! I cannot stress this enough - for the reason already given. Thank you.

ihtsham commented 3 weeks ago

@HjalteParner We tested everything in the testing environment, and it was working fine, so we moved the fix to production. However, I noticed the download failed in production this morning. @mehdiabbasi and I are already working on it, and we’re focused on getting it resolved. We’ll keep you updated as we make progress. Thanks for your understanding.

HjalteParner commented 3 weeks ago

@ihtsham As requested, I have now restored the production database on test, so the data within the databases are exactly the same. The number of downloads on the two environments is of course not the same, but otherwise the two environments should be identical, so that you can make comparable tests. Let me know if I can help in any other way.

mehdiabbasi commented 2 weeks ago

Thank you @ihtsham for fixing the issue in the Acoustic API. We have tested it on both the test and production servers; it is working and the issue is resolved.