halo-db / storymap

Story mapping
Creative Commons Zero v1.0 Universal

Online data analysis #50

Open d70-t opened 3 years ago

d70-t commented 3 years ago

As a user of large datasets with limited internet connectivity, I want to run my data analyses online ("in the cloud") without having to download files to my computer. One example of such a service is the CDS Toolbox.

joerg-halo commented 3 years ago

I added the "big" label because, as I understand it, this requires the HALO database to be a primary database. Is this correct? The question of the primary database has already been raised in #50 and #18.

rico-hengst commented 3 years ago

Yes, this use case requires that primary data and metadata are managed by the database.

d70-t commented 3 years ago

I disagree on this point. The system that runs the analysis could well be in a completely different place than the data itself. The only requirements are that the computing system can reference data from the database and can access the data over a sufficiently fast network connection (which might be the internet). It might be beneficial if the data is provided in a form that allows for efficient subsetting.
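
A minimal sketch of what "efficient subsetting" could look like, assuming the data is a row-major array of little-endian float32 values. The layout, the helper names `byte_range` and `read_rows`, and the 1000 x 4 example dataset are all hypothetical and are not taken from HALO-DB; `io.BytesIO` stands in for remote storage.

```python
import io
import struct

ITEM_SIZE = 4  # bytes per float32 value (assumed layout)

def byte_range(n_cols, row_start, row_stop):
    """Return byte offsets [start, stop) covering rows row_start..row_stop-1."""
    start = row_start * n_cols * ITEM_SIZE
    stop = row_stop * n_cols * ITEM_SIZE
    return start, stop

def read_rows(store, n_cols, row_start, row_stop):
    """Read only the requested rows from a seekable byte store."""
    start, stop = byte_range(n_cols, row_start, row_stop)
    store.seek(start)
    raw = store.read(stop - start)
    values = struct.unpack(f"<{len(raw) // ITEM_SIZE}f", raw)
    # Regroup the flat tuple of values into rows of n_cols entries.
    return [list(values[i:i + n_cols]) for i in range(0, len(values), n_cols)]

# Hypothetical 1000 x 4 dataset; io.BytesIO stands in for remote storage.
N_ROWS, N_COLS = 1000, 4
data = [float(i) for i in range(N_ROWS * N_COLS)]
store = io.BytesIO(struct.pack(f"<{len(data)}f", *data))

subset = read_rows(store, N_COLS, 10, 12)
start, stop = byte_range(N_COLS, 10, 12)
print(subset)
print(f"transferred {stop - start} of {N_ROWS * N_COLS * ITEM_SIZE} bytes")
```

Over HTTP, the same offsets could go into a `Range` request header (here `bytes=160-191`, since HTTP byte ranges are inclusive), so the compute system would fetch only the bytes it needs, provided the server supports range requests.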

joerg-halo commented 3 years ago

Okay, I get it now, thanks. So I will remove the "big" label, because "reference data from the database" should be covered by #32 or #20 and "able to access data" by #48. Right?

d70-t commented 3 years ago

Yes, this is probably not so big after all, and the datacenter providing the online compute service will probably be run by a different hosting organization than the one where HALO-DB is located. The big part for HALO-DB is to keep this use case in mind while covering the other cases you have mentioned.