marcosschroh / dataclasses-avroschema

Generate avro schemas from python dataclasses, Pydantic models and Faust Records. Code generation from avro schemas. Serialize/Deserialize python instances with avro schemas.
https://marcosschroh.github.io/dataclasses-avroschema/
MIT License

Request: reduce dependency bloat #292

Closed: joaoe closed this issue 1 year ago

joaoe commented 1 year ago

Is your feature request related to a problem? Please describe.

Hi! In my project I'm working to reduce dependency bloat, import times, install times, package size, etc., and dataclasses-avroschema happens to be one of the culprits. Running pipdeptree shows that it pulls in many dependencies that are of no use to our project and seem to be of little use to dataclasses-avroschema itself.

I just went through the dependency list in the [tool.poetry.dependencies] section of pyproject.toml and have the following comments:

  1. fastavro, seems this is pretty much required
  2. inflect, used in a single place as p.singular_noun(name)
  3. pytz, used once to get the UTC timezone and then in tests
  4. dacite, this seems required
  5. faker, used in many places in fields.py
  6. stringcase, seems important as well
  7. pydantic, already optional
  8. dc-avro, already optional
  9. faust-streaming, marked as a required dependency even though the code has a fallback if it is not importable
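For readers unfamiliar with the kind of fallback mentioned in item 9, a guarded import is the usual shape. The sketch below is only an illustration: `optional_import` is a hypothetical helper, not a function from dataclasses-avroschema, and the module names are placeholders.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Hypothetical helper: return the module if it is installed, else None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # The dependency is optional, so its absence is not an error.
        return None


# A library can then enable a feature only when the module is present.
faust = optional_import("faust")
FAUST_AVAILABLE = faust is not None
```

With a guard like this in place, the package metadata can list the dependency as optional without breaking imports for users who never install it.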

Describe the solution you'd like

  1. fastavro, keep
  2. inflect, possibly mark as an optional dependency? And add a fallback in code for when it is not importable.
  3. pytz, I understand that pretty much every project on pypi.org imports pytz, but this project, as a library, should include it only if strictly necessary.
     3.1. To access the UTC timezone object, just use datetime.UTC from Python's stdlib (available from Python 3.11; datetime.timezone.utc works on older versions).
     3.2. Import it as a test dependency for the other use.
  4. dacite, keep
  5. faker is highly suspicious and seems like test code. Moreover, is it really necessary to have random example data for those fake functions? Could this instead be either an optional or a test dependency, with faker accessed lazily and a fallback if it is not installed? Faker is especially annoying because it spams the logs with messages about missing locales all the time.
  6. stringcase, keep
  7. faust-streaming, this should be marked as an optional dependency. This is actually the biggest source of bloat.
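Proposals 3.1 and 5 above could look roughly like the following. This is a minimal sketch under stated assumptions: `utc_now` and `fake_string` are hypothetical names, not functions from dataclasses-avroschema, and the fixed placeholder value is an arbitrary choice.

```python
import datetime


def utc_now() -> datetime.datetime:
    # datetime.timezone.utc is part of the stdlib, so pytz is not needed
    # just to obtain a UTC tzinfo object.
    return datetime.datetime.now(tz=datetime.timezone.utc)


def fake_string(default: str = "example") -> str:
    """Hypothetical helper: use Faker if it is installed, else a fixed value."""
    try:
        # Imported lazily, so Faker costs nothing unless this is called.
        from faker import Faker
    except ImportError:
        return default
    return Faker().pystr()
```

The trade-off is that fake data becomes deterministic when Faker is absent, which may or may not be acceptable for the library's fake-instance feature.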

Describe alternatives you've considered

None.

marcosschroh commented 1 year ago

Hi @joaoe

I think it is a good idea to make this package lighter.

The other dependencies are really important, so we need to keep them

joaoe commented 1 year ago

Hi, thank you for your attention :) I thought of making a small draft PR, but these kinds of changes tend to be very opinionated, so I don't have the bandwidth to provide a patch that would be shredded to bits :p

But for now, could you please just mark faust as optional and think about the rest later? That would help a lot with import bloat in the immediate term. Cheers.

marcosschroh commented 1 year ago

@joaoe

I have created a PR to fix the dependencies. faust and pydantic were not optional at all, but now they will be. Also, I have removed pytz as a dependency.

The dependencies from now on are: dacite, Faker, fastavro, Inflector, python-dateutil, six and stringcase
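To check which requirements an installed release actually declares, the standard library's `importlib.metadata` can read them from the package metadata. A small sketch, where `declared_requirements` is a hypothetical helper name and the output depends entirely on the version installed:

```python
from importlib import metadata
from typing import List


def declared_requirements(dist_name: str) -> List[str]:
    """Hypothetical helper: list the requirement strings a distribution declares."""
    try:
        requirements = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []
    # requires() returns None when the distribution declares no requirements.
    return requirements or []


for requirement in declared_requirements("dataclasses-avroschema"):
    print(requirement)
```

Optional dependencies normally appear in this list with an `extra` marker in the requirement string, which makes it easy to tell them apart from the required ones.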