rsyi / whale

🐳 The stupidly simple CLI workspace for your data warehouse.
https://rsyi.gitbook.io/whale
GNU General Public License v3.0
725 stars 39 forks

Generation of Html Documentation #82

Open rubenssoto opened 3 years ago

rubenssoto commented 3 years ago

Hello :)

There is a great piece of software I use at my company called Great Expectations; it's a tool for checking data quality. It has a feature called Data Docs, which is HTML documentation of the data quality checks. I host all of the HTML in an S3 bucket so the whole company can access it.

https://greatexpectations.io/

Whale could have a feature like this: simple HTML with all of the table documentation and a few simple fields for searching the data.

thank you
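For anyone wanting to prototype the request above, here is a minimal sketch of the idea, not anything whale provides. It assumes the table documentation is available as plain text or markdown files under some directory; the function names `load_docs` and `build_docs_page` and the directory layout are hypothetical. It renders every document into one static HTML page with a search box that filters tables by name, which could then be uploaded to an S3 bucket.

```python
import html
from pathlib import Path

# Client-side filter: hide every section whose name does not contain the query.
_SEARCH_SCRIPT = """
<script>
document.getElementById("search").addEventListener("input", function (event) {
  var query = event.target.value.toLowerCase();
  document.querySelectorAll("section.table-doc").forEach(function (section) {
    var match = section.dataset.name.indexOf(query) !== -1;
    section.style.display = match ? "" : "none";
  });
});
</script>
"""


def load_docs(root: Path) -> dict[str, str]:
    """Read every *.md file under root (assumed layout) into {relative_name: text}."""
    return {
        path.relative_to(root).with_suffix("").as_posix(): path.read_text(encoding="utf-8")
        for path in root.rglob("*.md")
    }


def build_docs_page(docs: dict[str, str]) -> str:
    """Render {table_name: documentation_text} as a single searchable HTML page."""
    sections = []
    for name in sorted(docs):
        sections.append(
            '<section class="table-doc" data-name="{key}">'
            "<h2>{title}</h2><pre>{body}</pre></section>".format(
                key=html.escape(name.lower(), quote=True),
                title=html.escape(name),
                body=html.escape(docs[name]),  # shown as escaped raw text, not rendered markdown
            )
        )
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        "<title>Table documentation</title></head><body>"
        '<input id="search" type="search" placeholder="Filter tables by name">'
        + "".join(sections)
        + _SEARCH_SCRIPT
        + "</body></html>"
    )
```

Usage would look something like `Path("index.html").write_text(build_docs_page(load_docs(Path("path/to/docs"))), encoding="utf-8")`, where the docs path is a placeholder. The page shows each document as escaped raw text rather than rendered markdown, which keeps the sketch dependency-free.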

rubenssoto commented 3 years ago

https://docs.greatexpectations.io/en/latest/reference/core_concepts/data_docs.html

rsyi commented 3 years ago

Hm I'll look into how feasible this might be in a low-effort way!

If the goal is just to make a basic interface available to others, I recently discovered gotty, which allows you to serve terminal apps on the web. It basically just lets users access the whale CLI from their browser (and it seems to support concurrent usage). I did some basic tests and it seems to work pretty nicely. If this sort of thing is sufficient, I can write up some quick docs. 😛

I'll look into rendering options as well, but until I/someone can get around to this, here are a few other options (@rubenssoto I think I mentioned these to you, so I'm guessing they're probably not satisfactory, but listing them here in case others are interested 😉 ):

(I'll start learning react in the meantime 😄 )

rubenssoto commented 3 years ago

No problem @rsyi, I will try to use Git for now until the data catalog interface is ready 👍 I like Amundsen, but it is a lot to take care of. My team is only 3 people, and our goal is to keep things simple and automatic.

I think you already registered me on the beta list, rubens.soto@ze.delivery.

I have some suggestions. If they make sense, please tell me and I will create issues for them.

1 - Today all tables live in the same directory, so I think it would be more organized if there were an option to create one directory per database.

2 - I don't know if other sources have it, but Glue has location information, which is useful, for example, for letting people know where a table is located in the data lake.

rsyi commented 3 years ago

Ah didn't know glue had additional info! Yeah both of those suggestions sound feasible. Open some issues and I'll take a look :)