rfcx / arbimon

Ecoacoustic analysis platform empowering conservationists to analyze acoustic data and to derive insights about the ecosystem at scale
https://arbimon.org
Apache License 2.0

Provide backup feature for projects #1807

Closed carlybatist closed 6 months ago

carlybatist commented 8 months ago

Context

From @carlybatist and Ant/Marconi -

We should have a way to provide partners with a backup of all their data from a project (all raw recordings, results of jobs they've run, all metadata, all Insights maps/figures).

This feature will fulfill compliance with EU law (GDPR) - all users should (read: need to) have a way to download their data.

Expected results

Additional notes

Design

https://www.figma.com/file/GjR3UAHkQyGvx1iZzdZdDw/%F0%9F%92%BB-Arbimon-Platform?type=design&node-id=9914-6375&mode=design&t=ZfYWta8cBYYoo7ts-4

Copy:

You can request a comprehensive backup—including all raw recordings, job results, metadata, and Insights visuals—every 7 days. Once requested, your download link will be ready within 24 hours. Remember, download links expire after 7 days, so be sure to save your backup promptly.

Proposed technical implementation

  1. When the project admin/owner requests a backup, a row is added to a new "backup" table containing: project id, requested by (user), requested at (timestamp), expires at (timestamp), status (requested, processing, available), url, and size (in MB).
  2. After the user requests a backup, they will not be able to request another backup for 30 days (to limit overuse of this feature).
  3. A background job runs every night (middle of the night US time), picks up any requested backups, and processes them one at a time.
  4. Backup process:
    • change status to processing
    • export sites.csv, species.csv
    • export recordings.csv (generating a signed url for each uri)
    • export playlists.csv, playlist_recordings.csv
    • export templates.csv (generating a signed url for each uri)
    • export recording_validations.csv
    • export pattern_matchings.csv, pattern_matching_rois.csv (generating a signed url for each uri), (maybe pattern_matching_validations.csv?)
    • export soundscapes.csv (generating a signed url for each uri)
    • export classifications.csv, model_training.csv (maybe trainingset.csv?)
    • export AED.csv, clustering.csv
    • zip all csv files, upload to S3 export folder
    • change status to available
    • send email to requesting user that the backup is available with the link in the email
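
The bookkeeping in steps 1, 2, and 4 above can be sketched as follows. This is a minimal TypeScript illustration under stated assumptions, not the code in the repository: the `BackupRequest` field names, `canRequestBackup`, and `nextStatus` are hypothetical. The cooldown is passed in as a parameter because the UI copy says 7 days while step 2 says 30.

```typescript
// Hypothetical shape of one row in the proposed "backup" table.
type BackupStatus = "requested" | "processing" | "available";

interface BackupRequest {
  projectId: number;
  requestedBy: number; // user id
  requestedAt: Date;
  expiresAt: Date | null; // unknown until the download link exists
  status: BackupStatus;
  url: string | null;
  sizeMb: number | null;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// True when no earlier request for this project falls inside the cooldown window.
function canRequestBackup(
  previous: BackupRequest[],
  projectId: number,
  now: Date,
  cooldownDays: number
): boolean {
  return previous
    .filter((r) => r.projectId === projectId)
    .every((r) => now.getTime() - r.requestedAt.getTime() >= cooldownDays * DAY_MS);
}

// Status flow from step 4: requested -> processing -> available.
function nextStatus(status: BackupStatus): BackupStatus | null {
  if (status === "requested") return "processing";
  if (status === "processing") return "available";
  return null; // "available" is terminal in this sketch
}
```

Keeping the check a pure function of the existing rows and the current time makes it easy to call from both the API (to reject a request) and the UI (to disable the button).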

Related APIs

Other related tasks

tiffanylee0125 commented 7 months ago

Prototype

Image

Image

Image

tiffanylee0125 commented 7 months ago

Empty state Image

Display past requests Image

@antonyharfield is this what you have in mind?

tiffanylee0125 commented 7 months ago

Should people have the ability to cancel a backup and get another backup opportunity in the same 30 days as when the original backup was requested?

carlybatist commented 7 months ago

The easiest answer would be no?

antonyharfield commented 7 months ago

The part that's been started is the table export for sites and recordings, and adding signed urls for recordings. It is here: apps/cli/src/export/project-csv/index.ts.

naluinui commented 7 months ago

Dev updates

We have finished the user-facing part, and it is now ready to test on https://dev.arbimon.org/

Project settings page (change design as per Figma)

https://github.com/rfcx/arbimon/assets/9149523/7b248f93-7f66-405f-a0e2-b38f697c7046

Project backup

https://github.com/rfcx/arbimon/assets/9149523/6d58e578-40c1-49bc-bef9-637925f07237

What's left

We're now working on the cron job part that actually runs the export, sends the email, and updates the status shown in the UI.

FYI @koonchaya you can go ahead and get familiar with the changes on dev :)

koonchaya commented 7 months ago

Image

Image

Image

naluinui commented 7 months ago

Save button is at the top of the page (in the design it is at the bottom)

Which Figma file did you compare with, Noon? Please see our discussion here for the full perspective behind moving the button to the top.

naluinui commented 7 months ago

Test project section is missing the warning message

This only shows up when you have a published Insights page (same as the existing flow).

| Version | WITH published insight | WITHOUT published insight |
|---|---|---|
| Current | Image | Image |
| New design | Image | Image |

naluinui commented 7 months ago

Reposition date picker delete button

This is due to the library we currently use for the date picker. It will be fixed as part of the date picker task https://github.com/rfcx/arbimon/issues/1573 next sprint.

koonchaya commented 7 months ago

Requested popup:

| Design | Current |
|---|---|
| Image | Image |
| Image | Image |

naluinui commented 7 months ago

Thanks @koonchaya, I just fixed the font size and icon padding as part of this task.

I'm not sure about the background color and close icon, as those are still valid in other places. I think we might want to keep them consistent throughout the platform and change them all together (there seem to be 8 places using the current style).

Please create another improvement task for this and we can tackle all 8 popups at once (we may need to do the same in arbimon-legacy too).

koonchaya commented 7 months ago

Do we allow only project owner to request backup?

koonchaya commented 7 months ago

@naluinui Can you adjust the warning box to match the design?

| Design | Current |
|---|---|
| Image | Image |

koonchaya commented 7 months ago

How do I check whether the URLs in the export files are signed URLs?

antonyharfield commented 7 months ago

> How do I check whether the URLs in the export files are signed URLs?

If you open a new private/incognito browser window (so that you are not logged in) and paste in the url then it should download a file.
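
Before trying the private-window test, a quick structural check can tell whether a URL even looks presigned. The sketch below assumes the export uses S3 query-string signing with AWS Signature Version 4, which adds a fixed set of `X-Amz-*` query parameters; `looksPresigned` is a hypothetical helper name, not something from the repository.

```typescript
// Query parameters that a Signature Version 4 presigned URL carries.
const SIGV4_PARAMS = [
  "X-Amz-Algorithm",
  "X-Amz-Credential",
  "X-Amz-Date",
  "X-Amz-Expires",
  "X-Amz-SignedHeaders",
  "X-Amz-Signature",
];

// Returns true when the URL parses and carries every SigV4 query parameter.
// This checks the shape only: it cannot tell whether the signature is valid,
// expired, or issued by an account that is allowed to read the object.
function looksPresigned(rawUrl: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  return SIGV4_PARAMS.every((name) => parsed.searchParams.has(name));
}
```

A URL that passes this check can still fail to download (for example because of bucket permissions), so opening it while logged out remains the real test.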

naluinui commented 7 months ago

> Do we allow only project owner to request backup?

Yes, only the project owner can see this UI.

koonchaya commented 7 months ago

This is the error I get when I try to open a URL from:

  1. recordings.csv

Image

  2. templates.csv

Image

  3. pattern_matching_rois.csv

Image

  4. soundscapes.csv? - I don't have a soundscape job in my project to check.

Example files

recordings.csv

templates.csv

pattern_matching_rois.csv

Additional

pattern_matching_validations.csv

PM job that has the validation https://staging.arbimon.org/project/create-18-apr-2024/analysis/patternmatching/611

naluinui commented 7 months ago

💡Ideas from Sprint review call

koonchaya commented 7 months ago

Add more export files

antonyharfield commented 7 months ago

Solution to permission problem

https://stackoverflow.com/questions/72552236/generate-s3-presigned-url-in-cross-account-bucket

koonchaya commented 6 months ago

I tested the Admin backup and waited overnight. The request is still in progress. Can you check why it is taking a long time? This is my test project https://staging.arbimon.org/p/the-rooftop/settings

Image

koonchaya commented 6 months ago

@naluinui @antonyharfield The url in

Image

Image

FYI, I just downloaded this file today (8 May)

koonchaya commented 6 months ago

@antonyharfield @LucyDimitrova @naluinui I checked a URL from recordings.csv for a recording uploaded in 2024 with a file timestamp in 2022, and I still got the error. The file works fine in the project.

Image

My test project: https://staging.arbimon.org/p/create-18-apr-2024/settings
Recording id: 7827271
Site id: 11350

LucyDimitrova commented 6 months ago

@koonchaya, the latest changes (adding classification results to the backup and splitting bigger files into smaller ones, with up to 200k rows) are now in staging.
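
Splitting a large export into parts of at most 200k rows could look like the sketch below. The function name `splitIntoParts` and the `.csv` naming rule are assumptions; the three-digit suffix is inferred from file names such as `playlist_recordings_001` listed later in this thread, and the actual implementation may name unsplit files differently.

```typescript
interface ExportPart<T> {
  fileName: string;
  rows: T[];
}

// Split rows into consecutive chunks of at most maxRows, naming each part
// baseName_001.csv, baseName_002.csv, and so on. An empty input yields no parts.
function splitIntoParts<T>(baseName: string, rows: T[], maxRows: number): ExportPart<T>[] {
  if (!Number.isInteger(maxRows) || maxRows <= 0) {
    throw new Error("maxRows must be a positive integer");
  }
  const parts: ExportPart<T>[] = [];
  for (let start = 0; start < rows.length; start += maxRows) {
    const suffix = String(parts.length + 1).padStart(3, "0");
    parts.push({
      fileName: `${baseName}_${suffix}.csv`,
      rows: rows.slice(start, start + maxRows),
    });
  }
  return parts;
}
```

With 450,000 rows and a limit of 200,000 this yields three parts: two full ones and a final part of 50,000 rows.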

koonchaya commented 6 months ago

URLs from recordings and templates are working, but URLs from pattern_matching_rois don't work.

koonchaya commented 6 months ago

Export files

Image

pattern_matching_rois Image

pattern_matchings Image

playlist_recordings_001 Image

playlists Image

recording_validations Image

recordings Image

rfm_classifications_001 Image

rfm_models Image

sites Image

soundscapes Image

species Image

templates Image

koonchaya commented 6 months ago

There is no data in some of the export files. I am not sure if that's just from my project or something isn't working. Project: BCI-Panama_2018

Suggestions Export data

LucyDimitrova commented 6 months ago

I've checked the missing data in the export and found the following:

koonchaya commented 6 months ago

@LucyDimitrova Can you add another column for the songtype name in "rfm_classifications_001"? I am going to double-check the data in all export files for another round.

Summary

Backup export: I double-checked the export files in another project. These are the things I think we need to do:

naluinui commented 6 months ago

FYI team, this feature is now available on production (for internal users logged in with an RFCx email) in the recent release on https://arbimon.org/.

Hope this helps us test it with real data too, before we release it to the rest of the users.

carlybatist commented 6 months ago

> FYI team, this feature is now available on production (for internal users logged in with an RFCx email) in the recent release on https://arbimon.org/.
>
> Hope this helps us test it with real data too, before we release it to the rest of the users.

@koonchaya @naluinui I've requested backup of an Arbimon workshop project on production to test how it goes and see the files, once it's done I'll add the files here for everyone to see and suggest any changes to columns/etc.

naluinui commented 6 months ago

Great! Thanks @carlybatist

LucyDimitrova commented 6 months ago

koonchaya commented 6 months ago

Remarks on the file types downloaded from the URLs are in the following sheet

koonchaya commented 6 months ago

Feedback from @carlybatist https://rfcx.slack.com/archives/C06PD7U65NF/p1716213253364409?thread_ts=1715951951.248849&cid=C06PD7U65NF

Here's the project backup request for that project. Some suggested changes (mostly changing internal IDs to what they actually show/mean in Arbimon):

naluinui commented 6 months ago

Related task: Publish python script to download all files from backup csv

LucyDimitrova commented 6 months ago

The strikethrough items are done ✅:

@antonyharfield, I'm not sure how to proceed with/answer these:

koonchaya commented 6 months ago

@LucyDimitrova @carlybatist I think this might help answering the questions

pattern_matching_rois.csv - how to get the file that the roi is from?

I think users can look up the recording by its recording id and get the URL for it from recordings.csv. I think adding the filename to recordings.csv would be easy and would help users find the original file.

playlist_recordings.csv - recording/file name

Should we include filename in recordings.csv?

recording_validations.csv - what is the difference between ‘present’ and ‘present_review’ columns?

I believe these are the "present" validations made from the PM results and from the visualizer. I am not sure which column comes from which page.

templates.csv - how do we get recording/file name?

Users can look up the original recording in recordings.csv using the recording id.
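
The lookup described above, from an ROI or template row back to its source recording, amounts to a join on recording id. A minimal sketch, assuming hypothetical column names `id` and `url` in recordings.csv and `recording_id` in the other files (the real export headers may differ), with rows already parsed into objects:

```typescript
// Hypothetical row shapes; only the columns needed for the join are listed.
interface RecordingRow {
  id: string;
  url: string;
}

interface RowWithRecordingId {
  recording_id: string;
}

// Build an index from recording id to its signed URL for constant-time lookups.
function indexRecordingUrls(recordings: RecordingRow[]): Map<string, string> {
  const byId = new Map<string, string>();
  for (const recording of recordings) {
    byId.set(recording.id, recording.url);
  }
  return byId;
}

// Resolve the source recording URL for a ROI, template, or playlist row.
// Returns undefined when the recording id is not present in recordings.csv.
function recordingUrlFor(row: RowWithRecordingId, byId: Map<string, string>): string | undefined {
  return byId.get(row.recording_id);
}
```

Building the index once keeps the lookup cheap even when recordings.csv is split across several files: the rows from every part can be fed into the same map.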

carlybatist commented 6 months ago

Ok yeah, if it's a hassle to include the recording/file name, we can just stick to the recording ID in the templates, playlist, etc. files. But for recording_validations.csv, we should make those two column headers clearer so it's obvious what each represents.

koonchaya commented 6 months ago

@LucyDimitrova The export is missing pattern_matchings.csv.

Image

LucyDimitrova commented 6 months ago

@koonchaya Fixed. Thanks for catching that!

grindarius commented 6 months ago

Released in v1.2.2 🚀

carlybatist commented 6 months ago

Shouldn't we keep it open until we finalize the Python script and documentation? @koonchaya

koonchaya commented 6 months ago

For the project backup, all requirements in this ticket are done. We still have the ticket for python scripts and documentation open. https://github.com/rfcx/arbimon/issues/1943

naluinui commented 6 months ago

Please note we still need to turn on the flag to enable this for external users (it is currently available only to RFCx users).

https://github.com/rfcx/arbimon/blob/854e840c1e63efb94961ce229cadb5564626a192/apps/website/.env.production#L17-L18