nlesc-sigs / data-sig

Linked data, data & modeling SIG

Data formats and databases for SPOT #23

Closed: fdiblen closed this issue 4 years ago.

fdiblen commented 6 years ago

I am planning to make some changes to SPOT so that it supports most of the data formats and databases we use. For that, I need an overview of the data types used in our projects.

The ones I can think of are:

Data formats: CSV, JSON, GeoJSON, npy, HDF5, netCDF, XML, XLS

Databases: PostgreSQL, SQLite, MySQL?, MongoDB?, Redis?, CouchDB?, HBase?, Hive?

?: not sure if we really use it in any projects
*: not sure if we can support it easily
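As a rough illustration of how a tool like SPOT might pick a reader for each of the formats listed above, here is a minimal Python sketch that dispatches on the file extension. The extension table and the `detect_format` and `load` helpers are hypothetical and not part of SPOT. Only CSV and JSON/GeoJSON are actually parsed here (with the standard library); the other formats would need third-party readers such as NumPy, h5py or netCDF4.

```python
import csv
import json
from pathlib import Path

# Hypothetical mapping from file extension to the data formats listed in this issue.
FORMATS = {
    ".csv": "CSV",
    ".json": "JSON",
    ".geojson": "GeoJSON",
    ".npy": "npy",
    ".h5": "HDF5",
    ".hdf5": "HDF5",
    ".nc": "netCDF",
    ".xml": "XML",
    ".xls": "XLS",
}


def detect_format(path):
    """Guess the data format from the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file extension: {suffix!r}") from None


def load(path):
    """Load CSV or JSON/GeoJSON with the standard library; other formats need extra packages."""
    fmt = detect_format(path)
    if fmt == "CSV":
        with open(path, newline="") as f:
            # Each row becomes a dict keyed by the header line.
            return list(csv.DictReader(f))
    if fmt in ("JSON", "GeoJSON"):
        with open(path) as f:
            return json.load(f)
    raise NotImplementedError(f"{fmt} requires a third-party reader")
```

Dispatching on the extension keeps the sketch simple, but it is only a heuristic: a real implementation would probably also inspect file contents (e.g. HDF5 and netCDF-4 files share the same on-disk container).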

arnikz commented 6 years ago

The current list could be extended with:

  1. RDF-based serializations (e.g. Turtle, sorry, not Python) or JSON-LD
  2. Virtuoso RDBMS/RDF Store

However, I think one needs to know the schema in order to subset the data and/or import it meaningfully into SPOT.

egpbos commented 6 years ago

Elasticsearch / Solr / Lucene

fdiblen commented 6 years ago

@arnikz @egpbos it would be nice if you could name the projects using these databases.

egpbos commented 6 years ago

I've used ES (Elasticsearch) in the pidimehs project for analyzing newspaper texts.

arnikz commented 6 years ago

I've used Virtuoso in the ODEX4all (completed), candYgene, EOSCpilot-LOFAR and Googling The Cancer Genome projects. In addition, we use SQLite and Solr in candYgene.

romulogoncalves commented 5 years ago

@fdiblen would it be OK if you joined the next DataSIG meeting and shared your plan with us? For example, which data formats and database management systems you will support?

fdiblen commented 5 years ago

@romulogoncalves Sure. It would be nice to discuss this and ask for your opinion. I may join you for the meeting after the sprint.

romulogoncalves commented 5 years ago

@fdiblen are you aware that you will have to talk about this today at the Data-SIG session? I am not sure whether you are on the data-sig mailing list.

fdiblen commented 5 years ago

@romulogoncalves thanks for the reminder. Yes, I am on the mailing list.

sverhoeven commented 5 years ago

From the Hadoop ecosystem there are the ORC and Parquet formats, which can store dataframes. It would be nice to support these as well.

sverhoeven commented 5 years ago

Foreign data wrappers (https://wiki.postgresql.org/wiki/Foreign_data_wrappers) can make most of the suggested formats accessible from PostgreSQL.

c-martinez commented 5 years ago

@fdiblen can you fill in a table with the data formats and the projects that use them?
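A starting point for such a table can be assembled from what has been reported in this thread so far. The Python sketch below is hypothetical: the `USAGE` data only records the comments above (Elasticsearch in pidimehs; Virtuoso in ODEX4all, candYgene, EOSCpilot-LOFAR and Googling The Cancer Genome; SQLite and Solr in candYgene), and the helpers simply invert it into a per-project view and render it as a Markdown table that could be pasted into a comment.

```python
from collections import defaultdict

# Databases and the projects that use them, as reported in the comments on this issue.
USAGE = {
    "Elasticsearch": ["pidimehs"],
    "Virtuoso": ["ODEX4all", "candYgene", "EOSCpilot-LOFAR", "Googling The Cancer Genome"],
    "SQLite": ["candYgene"],
    "Solr": ["candYgene"],
}


def by_project(usage):
    """Invert a {technology: [projects]} mapping into {project: sorted technologies}."""
    table = defaultdict(set)
    for tech, projects in usage.items():
        for project in projects:
            table[project].add(tech)
    # Sort projects and technologies so the output is stable.
    return {project: sorted(techs) for project, techs in sorted(table.items())}


def as_markdown(usage):
    """Render the per-project view as a two-column Markdown table."""
    rows = ["| Project | Formats / databases |", "| --- | --- |"]
    for project, techs in by_project(usage).items():
        rows.append(f"| {project} | {', '.join(techs)} |")
    return "\n".join(rows)


print(as_markdown(USAGE))
```

New entries (for example the data formats from the opening comment) can be added to `USAGE` as people report them.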

romulogoncalves commented 5 years ago

@fdiblen can we visualize graphs and linked data? Maybe we could have another session about it with some use cases.

c-martinez commented 4 years ago

@fdiblen -- This issue is quite old by now. I will close it, but please open it again if it is still necessary.

fdiblen commented 4 years ago

We had a few discussions in SIG meetings, but I don't remember why we decided to keep this open.