miguelgrinberg / microblog

The microblogging application developed in my Flask Mega-Tutorial series. This version maps to the 2024 Edition of the tutorial.
http://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world
MIT License

Exporting a huge number of database records #366

Closed: Tes3awy closed this 9 months ago

Tes3awy commented 9 months ago

Hi @miguelgrinberg,

I am trying to export a huge dataset of around 800,000 database records, and it will soon grow to 1 million.

https://github.com/miguelgrinberg/microblog/blob/b0d5e8470eb28898d27ce08264908298b5e838fd/app/tasks.py#L28-L56

I have the same function, export_posts, but as you have probably guessed, it is taking way too much time to export the records.

Do you have any suggestions for speeding up the export process?

Also, how can I make the export directly downloadable from the browser instead of sending it via email? I am exporting the data to a CSV file rather than a JSON file.

miguelgrinberg commented 9 months ago

For such a large export anything you do in Python is going to be painfully slow. The most performant option is to use an export tool offered by your database.

You can write the exported file to a designated directory, maybe using the user ID as the filename. Then add an authenticated download route that uses send_file() to return that file as a download.
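
A minimal sketch of such a route, assuming the export task saves the file as `<user id>.csv` under a directory configured as `EXPORT_DIR`; the route, config key, and filenames are illustrative, not part of microblog:

```python
import os

from flask import abort, current_app, send_file
from flask_login import current_user, login_required

from app.main import bp  # microblog's "main" blueprint


@bp.route('/export/download')
@login_required
def download_export():
    # Assumes the background export task saved <EXPORT_DIR>/<user id>.csv.
    path = os.path.join(current_app.config['EXPORT_DIR'],
                        f'{current_user.id}.csv')
    if not os.path.isfile(path):
        abort(404)
    # download_name requires Flask 2.0+; older releases called this
    # parameter attachment_filename.
    return send_file(path, mimetype='text/csv', as_attachment=True,
                     download_name='posts.csv')
```

Because the filename is derived from `current_user.id` rather than user input, each user can only ever download their own export.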

Tes3awy commented 9 months ago

Can you help me with an example of your suggestion?

miguelgrinberg commented 9 months ago

I don't have anything to share. For serving the CSV file, see the send_file() docs from Flask. For generating the CSV efficiently, you have to look at what options your database provides. Postgres has the pg_dump command-line tool, and also a COPY statement in its SQL dialect. MySQL has mysqldump as well.
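
As a rough sketch of the COPY approach, assuming Postgres with psycopg2 underneath SQLAlchemy (the `post` table and columns match microblog's models, but the function name and destination path are made up for this example):

```python
from app import db  # microblog's Flask-SQLAlchemy instance


def export_posts_csv(user_id: int, dest_path: str) -> None:
    """Stream one user's posts to a CSV file with Postgres COPY."""
    # COPY does not accept bound parameters, so the id is interpolated
    # only after forcing it to an int.
    query = (
        'COPY (SELECT body, timestamp FROM post WHERE user_id = {uid} '
        'ORDER BY timestamp) TO STDOUT WITH (FORMAT csv, HEADER true)'
    ).format(uid=int(user_id))
    conn = db.engine.raw_connection()  # the underlying psycopg2 connection
    try:
        with conn.cursor() as cur, open(dest_path, 'w', newline='') as f:
            # copy_expert streams the rows server-side, avoiding
            # per-row Python overhead entirely.
            cur.copy_expert(query, f)
    finally:
        conn.close()
```

pg_dump and mysqldump are geared toward whole-table or whole-database dumps; for a per-user CSV, a `COPY (SELECT ...)` like the above is usually the closer fit.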