sfbrigade / datasci-sba

Solving problems with the Small Business Administration

adding script to export table and column comments to excel #24

Closed: gregboyer closed this pull request 7 years ago

gregboyer commented 7 years ago

Exports column and table definitions to Excel. This gives a better and faster end product than querying Postgres every time you need info. It also highlights overall progress on definitions.
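A minimal sketch of the kind of query such an export could run, assuming the data lives in PostgreSQL's `public` schema. `COMMENTS_SQL` and `definition_progress` are hypothetical names, not the script in this PR. The query uses PostgreSQL's built-in `obj_description` and `col_description` comment functions, and the helper computes a per-table "progress on definitions" count from the rows it returns.

```python
from collections import OrderedDict

# Hypothetical query (not the PR's script): one row per column in the
# 'public' schema, with the table comment and the column comment read from
# the PostgreSQL catalog. Missing comments come back as NULL.
COMMENTS_SQL = """
SELECT c.table_name,
       obj_description((quote_ident(c.table_schema) || '.' || quote_ident(c.table_name))::regclass,
                       'pg_class') AS table_comment,
       c.column_name,
       col_description((quote_ident(c.table_schema) || '.' || quote_ident(c.table_name))::regclass,
                       c.ordinal_position::int) AS column_comment
FROM information_schema.columns AS c
WHERE c.table_schema = 'public'
ORDER BY c.table_name, c.ordinal_position;
"""


def definition_progress(rows):
    """Return {table_name: (documented_columns, total_columns)}.

    `rows` is an iterable of (table_name, table_comment, column_name,
    column_comment) tuples, i.e. the shape COMMENTS_SQL returns.
    A column counts as documented only if its comment is a non-blank string
    (so None and pandas' NaN both count as undocumented).
    """
    progress = OrderedDict()
    for table, _table_comment, _column, column_comment in rows:
        documented, total = progress.get(table, (0, 0))
        has_comment = isinstance(column_comment, str) and column_comment.strip() != ""
        progress[table] = (documented + int(has_comment), total + 1)
    return progress
```

With a live connection, the rows could come from `pandas.read_sql(COMMENTS_SQL, engine)` followed by `df.itertuples(index=False)`, and the same DataFrame could then be written out with `DataFrame.to_excel`.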

VincentLa14 commented 7 years ago

Very interesting idea! Couple comments:

  1. I would put this code in datasci-sba/notebooks instead of in the pipeline directory, because I think of pipeline tasks as tasks that generally interact with or write directly to the database. However, I could see this fitting into a pipeline task if you just convert it from a Jupyter notebook to a Python file (it's easier to execute Python files than Jupyter notebooks on the command line, although I'm sure it's still possible to execute Jupyter notebooks).

  2. Let's keep this code, but another consideration is that not everyone has Excel. In particular, Macs don't come with Excel installed. Even though there are free spreadsheet programs that can open Excel files, it's still one additional step to viewing awesome documentation. One alternative I can think of off the top of my head is to write the pandas DataFrame to a Markdown file (maybe called "data_dictionary.md"). We can then put the Markdown file in the root folder of the repository, and GitHub will render it automatically. See the possible solutions and screenshots posted here for what that would look like: https://stackoverflow.com/questions/33181846/programmatically-convert-pandas-dataframe-to-markdown-table
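The Markdown idea in (2) can be done directly in pandas with `DataFrame.to_markdown()` (pandas 1.0 or later, which needs the optional `tabulate` package). As a dependency-free sketch of what that output looks like, the hypothetical helper below renders data-dictionary rows as a GitHub-flavored Markdown pipe table that could be saved as `data_dictionary.md`. Pipes inside cell text are escaped and newlines flattened so a single cell can't break the table.

```python
def to_markdown_table(headers, rows):
    """Render headers and rows as a GitHub-flavored Markdown pipe table."""

    def cell(value):
        # Missing values become empty cells; escape pipes and flatten
        # newlines so each data row stays on one line of the table.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")

    lines = [
        "| " + " | ".join(cell(h) for h in headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    print(to_markdown_table(
        ["table", "column", "definition"],
        [("example_table", "id", "Primary key"), ("example_table", "notes", None)],
    ))
```

Because the result is plain text, the file can be committed next to the README and reviewed like any other change.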

I'm personally OK with the merge as long as we do one of the two options listed in (1) above. For (2), I'm not 100% sure my suggestion would add substantial value, so I'm happy to hear what you and others think.
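For option (1), the standard tool is `jupyter nbconvert --to script some_notebook.ipynb`, which writes a notebook's code to a `.py` file. As a stdlib-only illustration of what that conversion amounts to (an `.ipynb` file is JSON whose `cells` list holds `code` and `markdown` cells), the hypothetical helper below concatenates the code cells. It ignores Markdown cells and does not translate IPython magics such as `%matplotlib inline`, so for real use nbconvert is the better choice.

```python
import json


def notebook_to_script(notebook_json):
    """Concatenate the code cells of an .ipynb document into Python source.

    `notebook_json` is the notebook file's text. In nbformat v4 each cell's
    "source" is either a single string or a list of line strings.
    """
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # Markdown and raw cells are skipped in this sketch
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    # Separate cells with a blank line and end the file with a newline.
    return "\n\n".join(chunks) + "\n"
```

Reading a notebook with `open(path, encoding="utf-8").read()` and writing the returned string to a `.py` file gives something a pipeline task can run with plain `python`.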

@jpspeng, do you have thoughts? As one of our newest members, you are probably closest to the experience of reading the documentation. What would help you get up to speed faster?

gregboyer commented 7 years ago

Thanks, Vince. I was going to ask you about location and format. Markdown is fine since it can provide diffs if needed; I'm just worried about readability. I'll see what I can do, and if needed I can export to CSV as well.
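If a CSV export is added too, one way to keep it diff-friendly is to write one row per column in a fixed sort order, so a changed definition shows up as a single changed line. A minimal sketch using Python's standard `csv` module; `write_dictionary_csv` and the three-column layout are hypothetical.

```python
import csv
import io


def write_dictionary_csv(rows, out):
    """Write (table, column, definition) rows to `out` as CSV.

    Rows are sorted by table and column name so that re-exporting an
    unchanged database produces identical output, which keeps
    version-control diffs limited to definitions that actually changed.
    Missing definitions are written as empty strings.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["table", "column", "definition"])
    for table, column, definition in sorted(rows, key=lambda r: (r[0], r[1])):
        writer.writerow([table, column, definition or ""])


# Made-up rows, for illustration only.
buffer = io.StringIO()
write_dictionary_csv([("t2", "b", "Second"), ("t1", "a", None)], buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a `StringIO`, open it with `newline=""`, as the `csv` module documentation recommends.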

On Jul 18, 2017, at 9:34 AM, VincentLa14 notifications@github.com wrote:


VincentLa14 commented 7 years ago

@fiascojazz, what do you mean by "provide diffs if needed"?