ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

docs: Create and Populate How-To Guides #4385

Closed p-a-a-a-trick closed 11 months ago

p-a-a-a-trick commented 2 years ago

Create+Populate How-To Guides

There is a lack of how-to guides on the website. Let's make a list of them and start populating a useful section for code schematics.

I hope to keep this issue up as a rolling list of how-to guides, so if you stumble on this and your need isn't covered in the docs or this list, comment your request here!

v3.2

v4.0

v4.1+

Let me know if you have any ideas and I'll add them to the list.

cpcloud commented 2 years ago

I think the things in the wiki are useful bits of information to put in the how-tos.

ogrisel commented 2 years ago

Here are some more ideas:

For the first item about CSV loading, I don't know if there is a standard way to use the COPY statements of popular databases from ibis. Personally I used a raw_sql command with duckdb to be able to specify a custom date format in a recent experiment.

The second is about event-based data and should probably cross-reference #4402 (about ffill).

ogrisel commented 2 years ago

@p-a-a-a-trick I made a typo in the word "sessionize".

p-a-a-a-trick commented 2 years ago

Got it; thanks @ogrisel! I'll probably knock a few more of these out in the next couple of months. Let me know if you have any more ideas—I can't guarantee they'll get in before the next release but I hope to have most (if not all) of these in before then.

I hope to keep this issue as a rolling list of requested how-to guides; I'm not sure it'll ever be closed.

ogrisel commented 2 years ago

For reference, the 3 how-to ideas above stem from my first-time experience with ibis and duckdb, which has been quite pleasant so far:

https://github.com/soda-inria/survival-analysis-benchmark/tree/main/datasets/kkbox_churn

jcmkk3 commented 2 years ago

Below is a list of some resources that I think are good examples to better learn tools in the data ecosystem. Maybe some parts of these could be translated to use Ibis. Sometimes one of the hardest parts is to just figure out a good narrative flow or interesting example use case when writing tutorials.

Many of these examples have other elements outside of just data manipulation like visualization or maybe some stats/machine learning. It could get a little bit tricky to choose which other libraries to feature, but I do think that it is useful to see how different tools interact.

ogrisel commented 2 years ago

Another good related post on how to sessionize data (using polars):

The dataset is good (not too big, not too small, and realistic), and the multistep task is a good opportunity to show how to factor code into functions chained via the .pipe method.
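To make the idea concrete, here is a minimal, library-agnostic sketch in plain Python rather than actual ibis or polars code. Everything in it is an assumption made for illustration: the toy event log, the 30-minute inactivity gap, and the helper names (`pipe`, `sort_events`, `add_session_ids`, `count_sessions`). The `pipe` helper only mimics what a dataframe `.pipe` method does, namely passing the current result to the next function.

```python
from datetime import datetime, timedelta
from functools import reduce

# Hypothetical event log for illustration only; not the dataset from the post.
EVENTS = [
    {"user": "a", "ts": datetime(2022, 1, 1, 9, 0)},
    {"user": "a", "ts": datetime(2022, 1, 1, 11, 0)},
    {"user": "b", "ts": datetime(2022, 1, 1, 9, 5)},
    {"user": "a", "ts": datetime(2022, 1, 1, 9, 10)},
]


def pipe(data, *steps):
    """Feed the result of each step into the next, like chained .pipe calls."""
    return reduce(lambda acc, step: step(acc), steps, data)


def sort_events(events):
    """Order events by user, then by timestamp."""
    return sorted(events, key=lambda e: (e["user"], e["ts"]))


def add_session_ids(gap):
    """Build a step that opens a new session after `gap` of inactivity.

    Expects events already sorted by user and timestamp.
    """
    def step(events):
        out, prev, session = [], None, 0
        for e in events:
            if prev is None or e["user"] != prev["user"]:
                session = 0  # first event of a user starts session 0
            elif e["ts"] - prev["ts"] > gap:
                session += 1  # idle for too long: start a new session
            out.append({**e, "session": session})
            prev = e
        return out
    return step


def count_sessions(events):
    """Count distinct sessions per user."""
    seen = {}
    for e in events:
        seen.setdefault(e["user"], set()).add(e["session"])
    return {user: len(ids) for user, ids in seen.items()}


result = pipe(
    EVENTS,
    sort_events,
    add_session_ids(timedelta(minutes=30)),  # assumed gap threshold
    count_sessions,
)
print(result)
```

Each step is a small function that can be tested on its own, which is the main appeal of the `.pipe` style; in a dataframe library the same steps would take and return table expressions instead of lists of dicts.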

NickCrews commented 1 year ago

I'll throw in data import/export. I find guidance on this topic currently pretty limited in the docs, but this is a step that every single user is going to do. Here is a nice example from vaex. There are other good examples of How-tos there as well.

lostmygithubaccount commented 11 months ago

closing this per recent docs updates, how-tos available on the site: https://ibis-project.org/how-to/configure/basics