ralamosm opened 4 months ago
This is not critical, and can be done slowly: in the future let's try to get rid of `pandas` and `mysql2sqlite`. The first dependency is used only when running scripts to extract data from ODS spreadsheets (one run every 2 months at most), while the latter is used only to get our prod db in SQLite format for local testing. The two libs take up around 60 MB and over 100 MB respectively, and we are very limited in terms of resources on our current hosting, so let's get rid of them.
To solve the scripts issue, let's just use CSV files. Admittedly it won't be as neat as using pandas to walk over that data, but in the end the result is the same.
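Walking CSV rows instead of a pandas DataFrame could look like this (a minimal sketch with hypothetical column names, assuming the spreadsheet is first exported to CSV):

```python
import csv
import io

# Hypothetical CSV export of the workshop spreadsheet;
# the column names are illustrative, not the real ones.
data = io.StringIO(
    "name,teacher,capacity\n"
    "Chess,Alice,20\n"
    "Robotics,Bob,15\n"
)

# csv.DictReader iterates the rows much like iterating a DataFrame,
# but with no heavy dependency (all values come back as strings).
rows = list(csv.DictReader(data))
for row in rows:
    print(row["name"], row["teacher"], int(row["capacity"]))
```

In a real script the `io.StringIO` would just be `open("workshops.csv", newline="")`.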
As for mysql2sqlite, we just have to start using mysql locally to mirror the prod environment. To make it easier we can try to dockerize the project so it comes with everything included.
What do you think @tomgranuja ?
I have also thought about getting rid of pandas and odfpy, because the management script has to loop through every record in the file anyway. So it is better to just use CSV (and upload CSV data instead of ODS) and free the odfpy dependency. The other dependency, mysql2sqlite, is a great way to handle all enrollment (and workshop) data in a single movable file; we could use multiple CSV tables instead, though. I understand the need to optimize disk usage, so I would agree to take mysql2sqlite out of the dependencies if we can drop the data to CSV, download it and populate the db locally.
I haven't used Docker; if you think we can pack everything with it, I believe it could be useful.
@ralamosm I modified add_workshop_from_ods.py to use only CSV, thus dropping the dependency on pandas and odfpy. It was easy; I think I can make the same modifications in the other two scripts in order to rid the project of those module dependencies. I need a couple of hours (maybe tomorrow) to modify and commit.
Is it necessary to modify the other 2 scripts? We only use them for the initial load...
We probably won't use add_teacher or add_student again during 2024, but in February 2025 they could be useful. I fixed add_teacher and tested it; it looks good. Only add_student is left, and I'll push the changes in two commits: one changing the content of the 3 scripts and another renaming them (currently the three script names contain "...from_ods").
@ralamosm The three scripts are now independent of pandas/odfpy -> b6c8e17
I didn't touch pyproject.toml; I'd rather you handle it, or guide me so it ends up right.
@ralamosm is the other dependency, mysql2sqlite, only for making a local mirror of the database?
I'm testing the following snippet to use just CSV:
```python
#!/usr/bin/env python
"""Print out CSV of a cayuman model."""
import csv
import sys

from django.apps import apps
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Dump all rows of a cayuman model as CSV to stdout."

    def add_arguments(self, parser):
        parser.add_argument("model_name")

    def handle(self, *args, **options):
        model = apps.get_model("cayuman", options["model_name"])
        field_names = [f.attname for f in model._meta.fields]
        writer = csv.writer(sys.stdout, quoting=csv.QUOTE_MINIMAL)
        writer.writerow(field_names)
        for o in model.objects.all():
            writer.writerow([getattr(o, f) for f in field_names])
```
So with a local script we could rebuild the database from the downloaded CSV files? The tricky part is what to do with the relations between tables. But at least, even without a database, we can pandas-check the students' selections and schedules.
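The ordering problem with related tables can be handled by loading the CSVs in foreign-key dependency order: every table is loaded after the tables it references, so its FK values already exist. A minimal sketch, using a hypothetical dependency map (not the real cayuman models):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical FK dependency map: table -> set of tables it references.
deps = {
    "teacher": set(),
    "student": set(),
    "workshop": {"teacher"},
    "enrollment": {"student", "workshop"},
}

# Topological order: referenced tables come before the tables that
# reference them, so each CSV can be inserted without FK errors.
load_order = list(TopologicalSorter(deps).static_order())
print(load_order)
```

With that order, a loader script would just iterate `load_order`, open each table's CSV, and bulk-insert its rows.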
Yes, only for backing up the db.
Because of that same relations issue, I'm not sure about the CSVs. In theory it can be done, but the ordering isn't clear to me. It sounds like it would be easier to dockerize the project: use docker-compose to set up a Python image and a MySQL one, so we can directly use the dumps produced by mysqldump on the server.
Or don't even dockerize... it's enough to install MySQL locally with `apt-get install mysql-server`, configure a user and db, load a dump of the db into the local MySQL, and change the local conf file to point to that local MySQL server. It's quick.
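Pointing the local conf at that server could look like this in the Django settings (a sketch with hypothetical names and credentials, assuming the standard MySQL backend):

```python
# Local settings override (hypothetical db name, user and password):
# point Django at the locally installed MySQL/MariaDB server that
# was populated from the production mysqldump.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "cayuman",       # db created locally
        "USER": "cayuman",       # local user configured above
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",
        "PORT": "3306",
    }
}
```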
https://github.com/tomgranuja/CayumanDjango/tree/load-with-csv was moved to prod. Disk usage dropped by roughly 130 MB. Thanks @tomgranuja !
@ralamosm I can confirm that it's possible to take a mysqldump in production and load the dump into a local MariaDB; running the local server then shows the same data as production.
So we can remove mysql2sqlite from the dependencies!
There would be no need to support converting the data back to sqlite in production.
Ahh, excellent! I'll take it out then. Thanks for testing!