ralamosm opened 4 months ago
This is not critical, and can be done slowly: in the future let's try to get rid of `pandas` and `mysql2sqlite`. The first dependency is used only when running scripts to extract data from ODS spreadsheets (one run every 2 months at most), while the latter is used only to get our prod db in SQLite format for local testing. The two libs take up around 60 MB and over 100 MB respectively, and we are very limited in terms of resources on our current hosting, so let's get rid of them.
To solve the scripts issue, let's just use CSV files. Admittedly it won't be as neat as using pandas to walk over that data, but in the end the result is the same.
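Walking CSV rows instead of a pandas DataFrame could look like this (a minimal sketch with hypothetical column names, assuming the spreadsheet is first exported to CSV):

```python
import csv
import io

# Hypothetical CSV export of the workshop spreadsheet;
# the column names are illustrative, not the real ones.
data = io.StringIO(
    "name,teacher,capacity\n"
    "Chess,Alice,20\n"
    "Robotics,Bob,15\n"
)

# csv.DictReader iterates the rows much like iterating a DataFrame,
# but with no heavy dependency (all values come back as strings).
rows = list(csv.DictReader(data))
for row in rows:
    print(row["name"], row["teacher"], int(row["capacity"]))
```

In a real script the `io.StringIO` would just be `open("workshops.csv", newline="")`.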
As for mysql2sqlite, we just have to start using mysql locally to mirror the prod environment. To make it easier we can try to dockerize the project so it comes with everything included.
What do you think @tomgranuja ?
I have also thought about getting rid of pandas and odfpy, because the management script has to loop through every record in the file anyway. So it is better to just use CSV (and upload CSV data instead of ODS) and free the odfpy dependency. The other dependency, mysql2sqlite, is a great way to handle all enrollment (and workshop) data in a single movable file; we could use multiple CSV tables instead, though. I understand the need to optimize disk usage, so I would agree to take mysql2sqlite out of the dependencies if we can drop the data to CSV, download it and populate the db locally.
I haven't used Docker; if you think we can pack everything with it, I believe it could be useful.
@ralamosm I modified add_workshop_from_ods.py to use only CSV, thus dropping the dependency on pandas and odfpy. It was easy; I think I can make the same modifications in the other two scripts in order to rid the project of those module dependencies. I need a couple of hours (maybe tomorrow) to modify and commit.
Is it necessary to modify the other 2 scripts? We only use them for the initial load...
We probably won't use add_teacher or add_student again during 2024, but in February 2025 they could be useful. I fixed add_teacher and tested it; it looks good. Only add_student is left, and I'll push the changes in two commits: one changing the content of the 3 scripts and another renaming them (currently the three script names contain "...from_ods").
@ralamosm The three scripts are now independent of pandas/odfpy -> b6c8e17
I didn't touch pyproject.toml; I'd rather you handle it, or guide me so it ends up right.
@ralamosm is the other dependency, mysql2sqlite, only for making a local mirror of the database?
I'm testing the following snippet to use just CSV:
```python
#!/usr/bin/env python
"""Print out CSV of a cayuman model."""
import csv
import sys

from django.apps import apps
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Dump all rows of a cayuman model as CSV to stdout."

    def add_arguments(self, parser):
        parser.add_argument("model_name")

    def handle(self, *args, **options):
        model = apps.get_model("cayuman", options["model_name"])
        field_names = [f.attname for f in model._meta.fields]
        writer = csv.writer(sys.stdout, quoting=csv.QUOTE_MINIMAL)
        writer.writerow(field_names)
        for o in model.objects.all():
            writer.writerow([getattr(o, f) for f in field_names])
```
So with a local script we could rebuild the database from the downloaded CSV files? The tricky part is what to do with the relations between tables. But at least, even without a database, we can pandas-check the students' selections and schedules.
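The ordering problem with related tables can be handled by loading the CSVs in foreign-key dependency order: every table is loaded after the tables it references, so its FK values already exist. A minimal sketch, using a hypothetical dependency map (not the real cayuman models):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical FK dependency map: table -> set of tables it references.
deps = {
    "teacher": set(),
    "student": set(),
    "workshop": {"teacher"},
    "enrollment": {"student", "workshop"},
}

# Topological order: referenced tables come before the tables that
# reference them, so each CSV can be inserted without FK errors.
load_order = list(TopologicalSorter(deps).static_order())
print(load_order)
```

With that order, a loader script would just iterate `load_order`, open each table's CSV, and bulk-insert its rows.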
Yes, only for backing up the db.
Because of that same relations issue, I'm not sure about the CSVs. In theory it can be done, but the ordering isn't clear to me. It sounds like it would be easier to dockerize the project: use docker-compose to set up a Python image and a MySQL one, so we can directly use the dumps produced by mysqldump on the server.
Or don't even dockerize... it's enough to install MySQL locally with `apt-get install mysql-server`, configure a user and db, load a dump of the db into the local MySQL, and change the local conf file to point to that local MySQL server. It's quick.
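Pointing the local conf at that server could look like this in the Django settings (a sketch with hypothetical names and credentials, assuming the standard MySQL backend):

```python
# Local settings override (hypothetical db name, user and password):
# point Django at the locally installed MySQL/MariaDB server that
# was populated from the production mysqldump.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "cayuman",       # db created locally
        "USER": "cayuman",       # local user configured above
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",
        "PORT": "3306",
    }
}
```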
https://github.com/tomgranuja/CayumanDjango/tree/load-with-csv was moved to prod. Disk usage dropped by roughly 130 MB. Thanks @tomgranuja !
@ralamosm I can confirm that it's possible to take a mysqldump in production and load the dump into a local MariaDB; running the local server then shows the same data as production.
So we can remove mysql2sqlite from the dependencies!
There would be no need to support converting the data back to sqlite in production.
Ahh, excellent! I'll take it out then. Thanks for testing!