sckott / pytaxize

python port of taxize (taxonomy toolbelt) for R
https://sckott.github.io/pytaxize/
MIT License
34 stars 13 forks source link

Maybe drop pandas #35

Closed sckott closed 4 years ago

sckott commented 8 years ago

very heavy dependency, and probably more idiomatic python to just return dicts

sckott commented 8 years ago

thoughts @ethanwhite ?

ethanwhite commented 8 years ago

I think just returning dicts is good. Easy enough to convert to dataframes if the format matches what Pandas takes.

sckott commented 8 years ago

thanks! will do then

ryneches commented 7 years ago

Awe. But I like Pandas!

sckott commented 7 years ago

doesn't it seem like a heavy dependency though?

ryneches commented 7 years ago

It is, darn it. Is the use-case of pytaxize without Pandas particularly appealing? As long as it still plays nice with Pandas by default, I suppose it doesn't need to be required per se.

ryneches commented 7 years ago

It also depends on how closely you want it to reflect the R package. If you want it be pretty close, DataFrames are the logical way to make Python behave in an R-ish kind of way.

sckott commented 7 years ago

Indeed, the R client is focused on data.frame outputs, so in that way using pandas makes it more similar to that. How common will it be for the same person to be switching between taxize and pytaxize? Or is the most common case any one person uses one or the other?

Do you think returning dicts is the most idiomatic python? That is, is that what someone would expect to get back by default?

Dropping pandas makes this library simpler and easier to maintain, so that's a + for dropping

ghost commented 7 years ago

Thanks for creating this package, I just came across it today.

I definitely see the case for making the library simpler and easier to maintain but my personal preference would be to maintain the option of getting a pandas dataframe returned if at all possible. I think it is becoming idiomatic, especially in the python data science community, to use pandas dataframes as inputs and returns for functions.

If I can be of assistance in maintaining or testing, please let me know.

sckott commented 7 years ago

@ColinTalbert thanks for your thoughts. Two votes to keep pandas, so i'm willing to keep it. Maybe it makes sense to allow user to say whether they want a pandas dataframe back or a dict?

If I can be of assistance in maintaining or testing, please let me know.

Thanks for the issue #43 - that kind of help is great! using and reporting bugs. Also, any feature requests is great. There are still many things to port over from taxize - so this client is a ways off from being complete at least wrt its R cousin

ghost commented 7 years ago

One option I've seen and liked is including an optional parameter to specify the return format: https://github.com/ulmo-dev/ulmo/blob/0c6337993542f7a46ed9d528ea9046628cd9f8bc/ulmo/nasa/daymet/core.py#L66

If you used this I think you could also set up the pandas import as optional, which might might make installation easier for folks who aren't using pandas (ArcMap users, perhaps)

Colin Talbert GIS Analyst and Developer US Geological Survey

talbertc@usgs.gov

USGS Fort Collins Science Center 2150 Centre Ave. Bldg. C Fort Collins, CO 80526 (970) 226-9425

USGS North Central Climate Science Center (970) 492-4283

Work schedule: Monday - 7:00 - 3:00 (FORT) Tuesday - 7:00 - 3:00 (NC CSC) Wednesday - 7:00 - 3:00 (FORT) Thursday - 7:00 - 5:00 (FORT) Friday - 7:00 - 5:00 (FORT)

On Wed, Oct 26, 2016 at 10:41 AM, Scott Chamberlain < notifications@github.com> wrote:

@ColinTalbert https://github.com/ColinTalbert thanks for your thoughts. Two votes to keep pandas, so i'm willing to keep it. Maybe it makes sense to allow user to say whether they want a pandas dataframe back or a dict?

If I can be of assistance in maintaining or testing, please let me know.

Thanks for the issue #43 https://github.com/sckott/pytaxize/issues/43 - that kind of help is great! using and reporting bugs. Also, any feature requests is great. There are still many things to port over from taxize - so this client is a ways off from being complete at least wrt its R cousin

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sckott/pytaxize/issues/35#issuecomment-256407329, or mute the thread https://github.com/notifications/unsubscribe-auth/ACJURIu18lCaSa_nKQnK8EYkLal8mAbvks5q34LIgaJpZM4Hbdzo .

sckott commented 7 years ago

As a parameter sounds good to me! ✔️

sckott commented 7 years ago

If you used this I think you could also set up the pandas import as optional

how do you do optional imports in a python library?

ghost commented 7 years ago

something along the lines of:

try: import pandas as pd

except ImportError: pd = None

Colin Talbert GIS Analyst and Developer US Geological Survey

talbertc@usgs.gov

USGS Fort Collins Science Center 2150 Centre Ave. Bldg. C Fort Collins, CO 80526 (970) 226-9425

USGS North Central Climate Science Center (970) 492-4283

Work schedule: Monday - 7:00 - 3:00 (FORT) Tuesday - 7:00 - 3:00 (NC CSC) Wednesday - 7:00 - 3:00 (FORT) Thursday - 7:00 - 5:00 (FORT) Friday - 7:00 - 5:00 (FORT)

On Thu, Oct 27, 2016 at 9:56 AM, Scott Chamberlain <notifications@github.com

wrote:

If you used this I think you could also set up the pandas import as optional

how do you do optional imports in a python library?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sckott/pytaxize/issues/35#issuecomment-256687584, or mute the thread https://github.com/notifications/unsubscribe-auth/ACJURDzHLtUqSCCo2P6JxWvSf4L4GAfJks5q4Mm_gaJpZM4Hbdzo .

ghost commented 7 years ago

and then in in code:

if not pd: raise Exception("Pandas need for this functionality")

Colin Talbert GIS Analyst and Developer US Geological Survey

talbertc@usgs.gov

USGS Fort Collins Science Center 2150 Centre Ave. Bldg. C Fort Collins, CO 80526 (970) 226-9425

USGS North Central Climate Science Center (970) 492-4283

Work schedule: Monday - 7:00 - 3:00 (FORT) Tuesday - 7:00 - 3:00 (NC CSC) Wednesday - 7:00 - 3:00 (FORT) Thursday - 7:00 - 5:00 (FORT) Friday - 7:00 - 5:00 (FORT)

On Thu, Oct 27, 2016 at 9:58 AM, Talbert, Colin talbertc@usgs.gov wrote:

something along the lines of:

try: import pandas as pd

except ImportError: pd = None

Colin Talbert GIS Analyst and Developer US Geological Survey

talbertc@usgs.gov

USGS Fort Collins Science Center 2150 Centre Ave. Bldg. C Fort Collins, CO 80526 (970) 226-9425

USGS North Central Climate Science Center (970) 492-4283

Work schedule: Monday - 7:00 - 3:00 (FORT) Tuesday - 7:00 - 3:00 (NC CSC) Wednesday - 7:00 - 3:00 (FORT) Thursday - 7:00 - 5:00 (FORT) Friday - 7:00 - 5:00 (FORT)

On Thu, Oct 27, 2016 at 9:56 AM, Scott Chamberlain < notifications@github.com> wrote:

If you used this I think you could also set up the pandas import as optional

how do you do optional imports in a python library?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sckott/pytaxize/issues/35#issuecomment-256687584, or mute the thread https://github.com/notifications/unsubscribe-auth/ACJURDzHLtUqSCCo2P6JxWvSf4L4GAfJks5q4Mm_gaJpZM4Hbdzo .

sckott commented 7 years ago

ah, okay

ghost commented 7 years ago

I've taken a stab at implementing this, take a look at: https://github.com/ColinTalbert/pytaxize/blob/as_dataframe/pytaxize/itis.py#L9-L13 and: https://github.com/ColinTalbert/pytaxize/blob/as_dataframe/pytaxize/itis.py#L71-L123 for how I was thinking this would look.

There are obviously many functions in the itis module that would need this change, let alone the modules for other services. If this seems like a workable design pattern, I'll see what I can do to add this functionality to the rest of the itis functions, before submitting a pr.

sckott commented 7 years ago

thanks @ColinTalbert - looks good, though can we shorten the param as_dataframe to dataframe, just for brevity sake

i realized that there's at least some, didn't look thoroughly through package, internal munging of data with pandas - which I think should be changed to just munging native python data structures - so we'll need to do that too

sckott commented 7 years ago

don't we also need extra_requires or similar in setup.py ?

ghost commented 7 years ago

Yes, good point. I'll throw that in as well.

Colin Talbert GIS Analyst and Developer US Geological Survey

talbertc@usgs.gov

USGS Fort Collins Science Center 2150 Centre Ave. Bldg. C Fort Collins, CO 80526 (970) 226-9425

USGS North Central Climate Science Center (970) 492-4283

Work schedule: Monday - 7:00 - 3:00 (FORT) Tuesday - 7:00 - 3:00 (NC CSC) Wednesday - 7:00 - 3:00 (FORT) Thursday - 7:00 - 5:00 (FORT) Friday - 7:00 - 5:00 (FORT)

On Fri, Oct 28, 2016 at 12:38 PM, Scott Chamberlain < notifications@github.com> wrote:

don't we also need extra_requires or similar in setup.py ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sckott/pytaxize/issues/35#issuecomment-256996266, or mute the thread https://github.com/notifications/unsubscribe-auth/ACJURLFyDCqG4sJlFgjMfMRuTTOR5eyCks5q4kEPgaJpZM4Hbdzo .

dheerajpai commented 6 years ago

Why is this thread still open? :( Why can't we sort this out? has there been a pull request sent?

sckott commented 6 years ago

@dheerajpai if @ColinTalbert send a PR we can discuss that. if we don't hear from him soon I can go ahead and do it