echatzikyriakidis opened 3 years ago
arctic is for time series data, and this doesn't seem to be time series data
@bmoscon I see.
How can I store large Pandas dataframes in MongoDB?
I don't know; maybe export the columns to dictionaries and store them that way?
@bmoscon I can convert a dataframe to JSON documents and store them in a collection, but I don't know if that performs well when reading thousands of documents. I might need to check it.
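Something along these lines is what I have in mind (just a rough sketch with pymongo; the database and collection names are made up for illustration):

import pandas as pd
from pymongo import MongoClient

# Hypothetical database/collection names, for illustration only.
client = MongoClient('localhost', 27017)
collection = client['mydb']['dataframes']

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# Write: one MongoDB document per dataframe row.
collection.insert_many(df.to_dict('records'))

# Read: fetch the documents back, excluding Mongo's _id field.
df_back = pd.DataFrame(list(collection.find({}, {'_id': 0})))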
Hi all!
I am facing a problem and I need help from someone having experience in arctic.
I first tried antarctic to store Pandas dataframes, but it stores each dataframe in a single document, so errors are generated because of MongoDB's 16 MB document size limit.
I think arctic will solve my problem, but I don't know which Store to use for my use case.
So, here is my use case:
Every 3 days I run a program that creates 3 Pandas dataframes for each of multiple projects.
So, each project has 3 dataframes.
In a second program, a user selects one project, and I want that program to have access to its 3 dataframes.
Which Store should I use? Is the following correct usage?
from arctic import Arctic

db = Arctic('localhost')
db.initialize_library('projects')
projects_library = db['projects']

# One symbol per dataframe, 3 dataframes per project.
projects_library.write('project-1-dataframe-1', df1, metadata={'run_date': date1})
projects_library.write('project-1-dataframe-2', df2, metadata={'run_date': date1})
projects_library.write('project-1-dataframe-3', df3, metadata={'run_date': date1})

projects_library.write('project-2-dataframe-1', df4, metadata={'run_date': date2})
projects_library.write('project-2-dataframe-2', df5, metadata={'run_date': date2})
projects_library.write('project-2-dataframe-3', df6, metadata={'run_date': date2})

project1_df1 = projects_library.read('project-1-dataframe-1').data
Also, I don't think I will need access to data from previous runs; I only need the latest data for each project.
How can I do it optimally?
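For the latest-only part, I imagine something like this (a sketch on my side, reusing projects_library from the snippet above and assuming write() accepts prune_previous_version as in arctic's VersionStore):

# Re-write the same symbols on every run; with prune_previous_version=True
# (which I believe is the default in VersionStore.write) older versions
# are discarded.
projects_library.write('project-1-dataframe-1', df1,
                       metadata={'run_date': date1},
                       prune_previous_version=True)

# read() returns the latest version, so no extra bookkeeping is needed.
latest = projects_library.read('project-1-dataframe-1')
latest.data      # the dataframe
latest.metadata  # {'run_date': date1}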
Thank you!