Change output format to pandas dataframe

ppeloton commented 3 years ago

Hi, thanks for creating this package. In my opinion, it would be more convenient to have the outputs in pandas dataframe format (e.g. similarly as in the yfinance package). Have you thought about that, is there an argument to stick with the current implementation with dictionary outputs? If not, I would be happy to contribute to this project and change outputs to pandas dataframes

FMC17 commented 3 years ago

I love this idea!

TheFou commented 3 years ago

As much as I love using Pandas DataFrames, I don't think this is a good idea, for multiple reasons.

Some people would want to use the raw data, without DataFrames.

As for me, I sometimes like to rearrange / edit data before getting them to a DataFrame which, depending on the type of edit, is sometimes easier on raw data.

Plus, getting raw data allows people to choose what type they want in the DataFrame, which is important, especially regarding floats or decimals : some will want to use floats (with the inevitable base 2 imprecision), some will prefer getting accurate decimal type. If you commit the data directly in the framework, it will most probably be as floats, and you'll lose the ability to keep accurate decimal data. Or the developer will have to implement both, letting the user choose, which leads me to...

...As a final note, I don't think that such a framework is the right place to do any transform on data. This belongs to the logic of the program using the data, not to the framework used to retrieve them.

ppeloton commented 3 years ago

I get your point. What about introducing a boolean parameter and the service returns raw data when given the one value and a dataframe when given the other value? It is your decision, I am just thinking that for some people it is not that easy to work with the raw data. If you have a dataframe, you can export it to Excel with a one-liner, etc. This might increase adoption of the package.

TheFou commented 3 years ago

It is not my decision, I'm not the the author of the package at all ^^ I was just sharing my thoughts on this matter, as a python learning guy. Honestly, I don't think it would make things a lot easier. Pandas itself comes with its own challenges, and it may be one-liner to export to Excel for some as you say, but for others it wouldn't :

issues with float / decimals, as explained earlier,
localization issues (commas for decimals in Europe, date / time formats, etc...).

People still have to know basics of python to use this. So, as said, better to leave that to the program / script using this framework IMHO.

man-c commented 3 years ago

Hi! Since this package is a wrapper around the API, in my opinion should return raw responses with the data. Lists and dictionaries formats are in agreement with the returns of the API. No extra processing of the responses is implemented by the wrapper. You can check how CoinGecko returns data for its endpoints in the API documentation (https://www.coingecko.com/en/api).

I think the best solution is to always keep data in raw format, as returned from the API. This allows developers to choose how to handle them best, based on their needs. Also, keeps debugging simpler, both for the package and the programs that use the package.

Doing extra processing, using DataFrames, would add extra - unneeded, in my opinion - requirements for the package, and would make it more demanding to maintain and use (e.g. if the API changes the responses for some endpoints, the wrapper might be needed to be updated and users will also need to update it). Since there are a lot of endpoints, the package will be required to create proper DataFrames for each one, making sure that it handles properly the additional (optional) parameters. Also, documentation would be more demanding (to include the DataFrames, etc). As it is now, users can work with the CoinGecko API documentation.

This kind of data handling seems to me as a developers' decision. It should better be decided by the developers of a program that uses the wrapper and not by the package itself, in order to be as modular as possible.

Also, converting a raw response from the API to a pandas DataFrame seems simple for most endpoints using functions that the pandas package provides itself (e.g. using pandas.DataFrame.from_dict).

So, i think working with raw data, fits best for the wrapper.

In any case, thank you for your suggestions!

ppeloton commented 3 years ago

Alright, if the aim of the package is to be a simple wrapper around API, that's the way to go. If you change your mind and want to get pandas Dataframes returned, let me know if you need any help (happy to contribute). As already mentioned there are other packages such as yfinance with a similar technical setup (yahoo finance - API returns json/dict format) and the package converts everything into DataFrames.

Closing the ticket for now.

man-c / pycoingecko

Change output format to pandas dataframe #32