mitchelllisle / sparkdantic

✨ A Pydantic to PySpark schema library
https://mitchelllisle.github.io/sparkdantic/
MIT License
45 stars, 9 forks

License problems? #231

Closed SemyonSinchenko closed 3 days ago

SemyonSinchenko commented 5 months ago

Thank you again for that nice library! The overall idea looks very nice!

I was checking the code and found that sparkdantic depends on dbldatagen, but that library is distributed under the commercial Databricks License. Do the terms of the dbldatagen license apply to sparkdantic too? In other words, can I use sparkdantic outside of the Databricks platform?

mitchelllisle commented 5 months ago

Hi @SemyonSinchenko, you raise a good point. I am no lawyer, but I agree with your conclusion that this library would be bound by the same terms as dbldatagen.

Initially I made this for a personal project, but since this library is starting to gain a small number of stars and I would hate for it to be useless to others depending on whether they use Databricks, I'll look at replacing the dbldatagen library with something more permissive. The Faker library uses the MIT license, so it's probably a better fit.

The API for generating data is likely to change, since the initial one was designed around the dbldatagen API, but I will do my best to keep the feature set at parity.

Thanks for being thorough in raising these licence concerns.

mitchstockdale commented 1 week ago

Hey @mitchelllisle, would you consider removing the data gen capability from this package altogether? IMO there are a few good reasons for this:

Regarding the last point, polyfactory and/or Faker (as you suggest) are the most common choices. Whether or not dbldatagen is replaced by one of these (or something else), the data gen functionality could also be split out into a separate package in the future.

I think removing the data gen component will make this package viable for its first major version release 🥳

mitchelllisle commented 6 days ago

@mitchstockdale I'm not opposed to this at all. Originally I wanted to bundle both of these features since that's what I was after for my project, but in hindsight I think it makes sense either to make this feature optional via install extras (i.e. pip install sparkdantic[datagen]) or to create another repo as you suggested.
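If the extras route were taken, the optional dependency would typically be imported lazily, so the core package keeps working when the extra is not installed. A minimal sketch of that pattern, assuming a hypothetical helper named require_extra (not part of sparkdantic):

```python
import importlib
import importlib.util
from types import ModuleType


def require_extra(module_name: str, extra: str, package: str = "sparkdantic") -> ModuleType:
    """Import an optional top-level module, or explain which extra provides it.

    `require_extra` is a hypothetical helper for illustration only.
    """
    # find_spec returns None when a top-level module is not installed
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is needed for this feature. "
            f"Install it with: pip install '{package}[{extra}]'"
        )
    return importlib.import_module(module_name)


# Usage inside an optional code path (hypothetical):
# dbldatagen = require_extra("dbldatagen", extra="datagen")
```

The import only happens when the optional feature is actually called, so users who install the base package never need the extra dependency or its licence terms.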

Either way, I think v1.0.0 could ship without it to begin with. I'll leave this open for a week or so, and if there are no objections I can release with this removed. It will also resolve @SemyonSinchenko's licence concerns, which I haven't had time to resolve this year. Removal is much quicker.

Thanks for the suggestion

Tagging some other contributors for any feedback they might have: @dan1elt0m @chidifrank

dan1elt0m commented 6 days ago

Hey @mitchelllisle, Thanks for tagging. Frank and I have already had a discussion on this topic, and we believe that the datagen feature doesn't make much sense for our needs. Instead, we recommend using Polyfactory, which offers more flexibility and features. FYI, Sparkdantic and Polyfactory work really well together. Have applied it to many complex models without any problems.

chidifrank commented 6 days ago

Can confirm.

Especially when you want to generate data for more complex types like arrays, the Databricks datagen library is not very intuitive.

ref: https://databrickslabs.github.io/dbldatagen/public_docs/generating_json_data.html#generating-complex-column-data

With polyfactory + sparkdantic

@mitchelllisle

mitchelllisle commented 5 days ago

Great - sounds like we're all in agreement. Happy to remove it and do a v1.0.0 release this weekend

mitchelllisle commented 5 days ago

New release has been cut https://github.com/mitchelllisle/sparkdantic/releases/tag/v1.0.0

Thanks for the time and effort you've all put in on this little library 😄

mitchelllisle commented 3 days ago

@SemyonSinchenko I'll close this issue off now since I believe the licence is more aligned now. Let me know if you have any other concerns