iodepo / oih-ui

source code for the Ocean InfoHub (OIH) website
https://oceaninfohub.org/
MIT License

Allow bulk export of search results #65

Open pbuttigieg opened 1 year ago

pbuttigieg commented 1 year ago

Consider also offering a bulk download of the entire OIH corpus in the above formats. This may be redundant with the release graph assets.

@fils notes that we already have SOLR functions to support this.
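Solr's standard way to walk a very large result set is deep paging with `cursorMark`. Whether OIH's existing SOLR functions use it is not stated here. The following is a minimal sketch under a few assumptions: the index is queried through a Solr `/select` handler, its uniqueKey is `id`, and the HTTP call is wrapped in a hypothetical `fetch` callable so the paging logic stays testable. `iter_all_docs` is also a hypothetical name.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical: sends query params to a Solr /select handler and returns
# the decoded JSON response (in production this would wrap an HTTP client).
SolrFetch = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_all_docs(fetch: SolrFetch, query: str = "*:*", rows: int = 500,
                  unique_key: str = "id") -> Iterator[Dict[str, Any]]:
    """Yield every matching document using Solr deep paging (cursorMark)."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": query,
            "rows": rows,
            # cursorMark requires a sort that includes the uniqueKey field
            "sort": f"{unique_key} asc",
            "cursorMark": cursor,
            "wt": "json",
        }
        resp = fetch(params)
        for doc in resp["response"]["docs"]:
            yield doc
        next_cursor = resp["nextCursorMark"]
        # Solr signals the end of the results when the cursor stops changing
        if next_cursor == cursor:
            break
        cursor = next_cursor
```

Because this is a generator, a backend export can write each page out as it arrives. It never has to hold all results in memory at once.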

emarzini commented 10 months ago

@Lucy-Scott @arnounesco @pbuttigieg @jmckenna @fils

Hello, we have some concerns regarding this issue.

First, we should flag that the export could involve long waiting times for all the requested export types (JSON-LD, CSV, PDF). As a test, we retrieved all ~42k Documents via the API, and it took about 2-3 minutes. This means the export has to run asynchronously so that it does not block the user's navigation.
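The asynchronous approach usually works like this. The export request returns a job id right away. The UI then polls the job status and downloads the file once it is ready. Below is a minimal Python sketch of that pattern. It assumes a single backend process and uses an in-memory registry. `ExportJobs` and its methods are hypothetical names, and a real deployment would more likely use a persistent task queue.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class ExportJobs:
    """Hypothetical in-memory job registry: start an export, poll it later."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, export_fn: Callable[[], bytes]) -> str:
        """Run export_fn in the background and return a job id immediately."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(export_fn)
        return job_id

    def status(self, job_id: str) -> str:
        """Report "running", "done" or "failed" without blocking."""
        fut = self._jobs[job_id]
        if not fut.done():
            return "running"
        return "failed" if fut.exception() is not None else "done"

    def result(self, job_id: str) -> bytes:
        """Return the finished export; re-raises the error if the job failed."""
        return self._jobs[job_id].result()

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

With this pattern, a 2-3 minute export only ties up a background worker. The user's browsing is not blocked.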

Another concern is whether exporting the entire OIH corpus, or all the data for a specific tab (Document, Expert, etc.), is reasonable in terms of data ownership. In other words: is it OK for a user to export whatever they want? Are we not worried that someone could clone OIH this way?

We also have some specific questions for each export type. As a general note, the results in each tab (Document, Expert, etc.) have a different structure. That structure is not hard-coded in the frontend; instead, the frontend derives it from the data returned by the API.

1) JSON-LD. Can we include in each JSON item (one per result) all the data returned by the API, i.e. the same data shown when the user clicks "View JSONLD source"? Or do you want to export only a subset of the information?

2) CSV. The main problem with CSV is its tabular structure: we have to define a column header for each piece of information. Limiting the export to what the UI currently shows is fairly manageable. That means the txt_INFORMATION fields from the API response, plus name and description (except for Person). Including other fields (or all fields) from the API response JSON gets messy, because values are sometimes multiple and nested. And since we want to stay as dynamic as possible, we cannot assume any fixed structure on the frontend. Note that CSV is not suitable for the bulk corpus export because of its column structure.

3) PDF. We are planning to include in the exported PDF the fields that the user can see in the UI (these differ for each tab). Is that OK?
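Point 2 describes a real trade-off. One common way to turn nested, multi-valued JSON into a table has two parts: flatten nested objects into dotted column names, and join repeated values into a single cell. The minimal Python sketch below does this. `flatten` and `to_csv` are hypothetical names, and the "; " separator is an arbitrary choice. The columns are derived from the data rather than hard-coded, in line with the frontend not assuming a fixed structure.

```python
import csv
import io
import json
from typing import Any, Dict, List


def flatten(obj: Any, prefix: str = "") -> Dict[str, str]:
    """Flatten nested dicts into dotted keys; join list values with '; '."""
    flat: Dict[str, str] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            name = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(value, name))
    elif isinstance(obj, list):
        # Multiple values share one cell; nested objects inside lists stay as JSON
        parts = [json.dumps(v, sort_keys=True) if isinstance(v, (dict, list)) else str(v)
                 for v in obj]
        flat[prefix] = "; ".join(parts)
    else:
        flat[prefix] = "" if obj is None else str(obj)
    return flat


def to_csv(results: List[Dict[str, Any]]) -> str:
    """Serialize result dicts to CSV, using the union of their flattened keys as columns."""
    rows = [flatten(r) for r in results]
    columns: List[str] = []  # first-seen order of column names
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The header is the union of every key seen across the rows. This shows why a bulk export of the whole corpus fits CSV poorly: the more heterogeneous the records, the wider and sparser the table becomes.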

PS: @pbuttigieg @fils, what did you mean in the earlier comments by "This may be redundant with the release graph assets" and "we already have SOLR functions to support this"?

What are your thoughts on this?

emarzini commented 7 months ago

Hello everyone,

We are currently facing an issue implementing the export functionality. As discussed, we are attempting to provide the option to export ALL results in the dedicated format.

However, attempting this on the frontend side has proven to be too resource-intensive.

Potential solutions are as follows:

[Image: export]

emarzini commented 6 months ago

@jmckenna @fils, according to @pbuttigieg we need to handle this feature on the backend, using the spark endpoint or something similar, because the functionality is too heavy for the frontend. Could you provide the final result to the frontend?
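A backend export can avoid building the whole result in memory by streaming it. It serializes records one at a time as they come from the data source (Solr, a spark endpoint, or similar). The minimal Python sketch below emits a valid JSON array in chunks. `stream_json_array` is a hypothetical name, and plain JSON serialization of JSON-LD records is assumed. If the backend happens to be FastAPI, its `StreamingResponse` accepts an iterator like this. The thread does not say which framework is used.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def stream_json_array(records: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield a JSON array piece by piece so the full corpus never sits in memory."""
    yield "["
    first = True
    for record in records:
        if not first:
            yield ","  # separator between consecutive records
        yield json.dumps(record, ensure_ascii=False)
        first = False
    yield "]"
```

The frontend then only receives the finished file (or a link to it), which matches the request to hand the final result to the frontend.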