Closed TyberiusPrime closed 5 years ago
To be fair, we only export 120 symbols; the other 9 start with __, which cleans up the above table a bit.
type | count |
---|---|
<class 'function'> | 55 |
<class 'dfply.base.pipe'> | 40 |
<class 'module'> | 16 |
<class 'type'> | 5 |
<class 'str'> | 2 |
<class 'dfply.base.Intention'> | 1 |
<class 'pandas.core.frame.DataFrame'> | 1 |
Still, we could at least drop the modules, the strings, and possibly the diamonds dataset (plotnine has it as well) from the main export?
Not sure if this helps or just adds more noise. I've only been working with this library for a few days (it's excellent, by the way), and what I usually do is the following:
import dfply as dp
from dfply import X
For the rest of functions I simply use:
table >> dp.select(X.variable)
table >> dp.mutate(column = X.something)
For me it's actually quite natural since it's just the opposite symbol to what I used to use with pandas (dp instead of pd).
I got so unhappy about the state of python dplyr clones, I wrote my own: https://github.com/TyberiusPrime/dppd
I've also written a comparison / rosetta stone for the python dplyr clones: https://dppd.readthedocs.io/en/latest/comparisons.html
dppd looks cool, though I've only given it a cursory look. I'll have to go over the code in more detail, but it looks like a pretty cool take on the NSE-in-Python problem.
Perhaps one way to resolve the `from dfply import *` issue is just to explicitly list everything in the respective `__init__.py` files rather than having the `import *` statements. That way, when the user imports everything, it should only pull in the dfply-specific classes and functions.
Is that the crux of the issue here or am I misunderstanding the complaint?
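For what it's worth, here is a minimal sketch of how an explicit export list limits what a star import pulls in. It uses a throwaway in-memory module instead of dfply itself; `toy_pkg`, `select`, and `mutate` are hypothetical stand-ins, and `__all__` is just one way of doing the "explicit listing", not necessarily what the branch ends up using.

```python
import sys
import types

# Build a throwaway module that stands in for a package's __init__.py.
# "toy_pkg", "select" and "mutate" are hypothetical names for illustration.
toy = types.ModuleType("toy_pkg")
exec(
    "import collections, os\n"          # stray imports that would otherwise leak
    "def select(): return 'select'\n"
    "def mutate(): return 'mutate'\n"
    "__all__ = ['select', 'mutate']\n",  # explicit export list
    toy.__dict__,
)
sys.modules["toy_pkg"] = toy

# Run a star import in a fresh namespace and inspect what arrived.
namespace = {}
exec("from toy_pkg import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)  # ['mutate', 'select']
```

Without the `__all__` line, the same star import would also bring `collections` and `os` into the caller's namespace, which is the kind of pollution discussed here.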
OK @TyberiusPrime, so I made a new branch called import-fixes that you can check out, which removes the * import stuff from the __init__.py file. I'm not sure what you're using to profile the imports into the namespace, but (if you still care) let me know if this branch resolves your namespace issue and I'll merge it into master.
Those fixes will work nicely.
My 'profiling' was along these lines...
import collections
import dfply

# Count every name visible on the dfply module, grouped by the type of its value.
c = collections.Counter()
for d in dir(dfply):
    t = type(getattr(dfply, d))
    c[t] += 1
print(c)
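The same count can be wrapped in a small helper that optionally skips names starting with `__`, which is the difference between the 129 and 120 figures in this thread. This is only a sketch: `count_exports_by_type` and the `demo` module are hypothetical names, and the demo module is built in memory so the snippet runs without dfply installed.

```python
import collections
import types


def count_exports_by_type(module, include_dunder=False):
    """Count the names visible on `module`, grouped by the type of each value."""
    counts = collections.Counter()
    for name in dir(module):
        if not include_dunder and name.startswith("__"):
            continue  # skip __name__, __doc__, and friends
        counts[type(getattr(module, name))] += 1
    return counts


# Hypothetical stand-in module with one function, one string, one submodule.
demo = types.ModuleType("demo")
demo.helper = len
demo.label = "text"
demo.sub = types

counts = count_exports_by_type(demo)
for kind, n in counts.most_common():
    print(kind, n)
```

Passing the real package instead (`count_exports_by_type(dfply)`) should give the per-type breakdown for whichever version is installed.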
There's a universal (and justified) dislike in the Python community for * imports. Now, I admit that dfply (great work, btw) is a pain without it.
But, it currently has a bunch of things in the user-importable namespace that we could possibly clean up.
A quick accounting of the 129 exports from dfply by type:
where presumably only the dfply.base.pipe and a subset of the functions are 'verbs'.
My suggestion would be to introduce two additional namespaces
and update the examples to use `from dfply.verbs import *` instead of `from dfply import *`.
This way we would a) not break anyone's code and b) have a clean, 'non-polluting' module that users can import.
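To illustrate the idea, here is a rough sketch of a separate "verbs" namespace sitting next to an untouched top-level module. Everything in it is an assumption for illustration: `build_verbs_namespace` and the `toy` module are hypothetical, and filtering on `callable` is only a crude stand-in for picking the actual verbs by hand.

```python
import types


def build_verbs_namespace(source, name="verbs"):
    """Return a new module holding only the public callables of `source`."""
    verbs = types.ModuleType(name)
    exported = []
    for attr in dir(source):
        if attr.startswith("_"):
            continue  # skip private and dunder names
        value = getattr(source, attr)
        if callable(value):  # drops modules, strings and other non-callables
            setattr(verbs, attr, value)
            exported.append(attr)
    verbs.__all__ = exported  # what a star import of the new module would expose
    return verbs


# Hypothetical top-level module mixing verbs with stray objects.
toy = types.ModuleType("toy")
toy.select = lambda *cols: ("select", cols)
toy.mutate = lambda **kw: ("mutate", kw)
toy.helpers = types   # a stray module
toy.version = "0.0"   # a stray string

verbs = build_verbs_namespace(toy)
print(verbs.__all__)
```

The original `toy` module keeps all of its attributes, so existing imports keep working, while the new namespace exposes only `mutate` and `select`.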
What do you think?