man-group / arctic

High performance datastore for time series and tick data
https://arctic.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

How can I store the data of my use case? #882

Open echatzikyriakidis opened 3 years ago

echatzikyriakidis commented 3 years ago

Hi all!

I am facing a problem and I need help from someone having experience in arctic.

I first tried antarctic to store Pandas dataframes, but it stores each dataframe in a single document, so writes fail against MongoDB's 16 MB document size limit.
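One rough way to see whether a frame will run into that limit is to check its serialized size before writing (a minimal sketch; JSON length only approximates the BSON size MongoDB actually enforces):

```python
import pandas as pd

MONGO_DOC_LIMIT = 16 * 1024 * 1024  # MongoDB's per-document BSON limit, in bytes

df = pd.DataFrame({"price": [1.25, 2.5, 3.75], "volume": [100, 200, 300]})

# JSON length is only a rough proxy for BSON size, but it flags obvious
# offenders before a write fails server-side.
approx_size = len(df.to_json(orient="records").encode("utf-8"))
print(approx_size < MONGO_DOC_LIMIT)  # True for a small frame like this
```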

I think that arctic will solve my problem but I don't know what Store to use for my use case.

So, here is my use case:

Every 3 days I run a program that creates 3 Pandas dataframes for multiple projects.

So, each project has 3 dataframes.

In a second program a user selects one project and I want to have access to its 3 dataframes.

What Store to use? Is this a correct usage? :

```python
from arctic import Arctic

db = Arctic('localhost')

db.initialize_library('projects')

projects_library = db['projects']

projects_library.write('project-1-dataframe-1', df1, metadata={'run_date': date1})
projects_library.write('project-1-dataframe-2', df2, metadata={'run_date': date1})
projects_library.write('project-1-dataframe-3', df3, metadata={'run_date': date1})

projects_library.write('project-2-dataframe-1', df4, metadata={'run_date': date2})
projects_library.write('project-2-dataframe-2', df5, metadata={'run_date': date2})
projects_library.write('project-2-dataframe-3', df6, metadata={'run_date': date2})

project1_df1 = projects_library.read('project-1-dataframe-1').data
```

Also, I don't think I will need access to data from previous runs; I only need the latest data for each project.

How can I do it optimally?

Thank you!

bmoscon commented 3 years ago

arctic is for time series data, this doesn't seem to be time series data

echatzikyriakidis commented 3 years ago

@bmoscon I see.

How can I store large Pandas dataframes in a MongoDB?

bmoscon commented 3 years ago

i don't know, export the columns to dictionaries and store them that way?
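Exporting the columns to dictionaries could look something like this (a minimal sketch; the `frame` key and the per-column document shape are assumptions, and plain dicts stand in for a pymongo collection):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# One document per column keeps each document small for moderately sized
# frames (with pymongo you would insert_many these into a collection).
column_docs = [
    {"frame": "project-1-dataframe-1", "column": name, "values": series.tolist()}
    for name, series in df.items()
]

# Rebuilding the frame from the stored documents:
rebuilt = pd.DataFrame({doc["column"]: doc["values"] for doc in column_docs})
assert rebuilt.equals(df)
```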

echatzikyriakidis commented 3 years ago

@bmoscon I can convert a dataframe to JSON documents and store them in a collection, but I don't know how fast reading thousands of documents back would be. I might need to benchmark it.
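The row-wise round-trip described here can be sketched as follows (plain dicts stand in for a pymongo collection, which is an assumption of this sketch):

```python
import pandas as pd

df = pd.DataFrame({"ts": ["2021-01-01", "2021-01-02"], "value": [1.5, 2.5]})

# to_dict("records") yields one JSON-like dict per row -- the shape
# collection.insert_many(records) would store.
records = df.to_dict("records")

# Reading back: the pymongo equivalent would be pd.DataFrame(list(cursor)).
restored = pd.DataFrame(records)
assert restored.equals(df)
```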