Closed p-a-a-a-trick closed 11 months ago
I think the things in the wiki are useful bits of information to put in the how-tos.
Here are some more ideas:
How to create a table with a user-defined schema and efficiently load a large CSV file into it (handling common annoying things like the choice of the separator (and maybe quotes) and custom date / time formats).
How to sessionize a log of events based on the elapsed time between successive transactions for a given user identifier.
How to efficiently sample random rows from a large table (or expr) to quickly compute approximate descriptive statistics such as quantiles.
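To make the sampling idea concrete, here is a minimal standard-library sketch. It does not use any ibis API: it swaps in plain reservoir sampling (Algorithm R) over a Python iterable as a stand-in for "sample rows from a large table", then estimates quartiles from the sample alone. The names `reservoir_sample`, `values`, and the sizes used are hypothetical and chosen only for illustration.

```python
import random
import statistics
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k rows chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Hypothetical "large table": a single numeric column with 100_000 values.
values = range(100_000)
sample = reservoir_sample(values, k=1_000, seed=42)

# Approximate quartiles computed from the sample only.
q1, median, q3 = statistics.quantiles(sample, n=4)
print(q1, median, q3)
```

The point of the sketch is the trade-off: the estimate only touches 1,000 values instead of the full column, at the cost of some sampling error. A how-to guide would presumably show whatever sampling mechanism the chosen backend offers instead.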
For the first item about CSV loading, I don't know if there is a standard way to use the COPY statements of popular databases from ibis. Personally I used a raw_sql command with duckdb to be able to specify a custom date format in a recent experiment.
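A rough sketch of what that workaround could look like: a small helper that assembles a DuckDB `COPY ... FROM` statement with an explicit separator, quote character, and date format. The helper name `build_copy_statement`, the table name, and the file path are hypothetical. The option names (`FORMAT`, `HEADER`, `DELIMITER`, `QUOTE`, `DATEFORMAT`) are DuckDB CSV options. The table name is interpolated as-is, so this assumes a trusted identifier.

```python
def build_copy_statement(
    table: str,
    path: str,
    delimiter: str = ",",
    quote: str = '"',
    dateformat: str = "%Y-%m-%d",
    header: bool = True,
) -> str:
    """Build a DuckDB COPY statement for loading a CSV file into `table`."""

    def lit(value: str) -> str:
        # SQL string literal: escape single quotes by doubling them.
        return "'" + value.replace("'", "''") + "'"

    options = [
        "FORMAT CSV",
        f"HEADER {'true' if header else 'false'}",
        f"DELIMITER {lit(delimiter)}",
        f"QUOTE {lit(quote)}",
        f"DATEFORMAT {lit(dateformat)}",
    ]
    return f"COPY {table} FROM {lit(path)} ({', '.join(options)})"


stmt = build_copy_statement(
    "transactions",          # hypothetical table, created beforehand
    "transactions.csv",      # hypothetical file
    delimiter=";",
    dateformat="%d/%m/%Y",
)
print(stmt)

# Assumed usage with the ibis DuckDB backend (not executed here):
#   con = ibis.duckdb.connect()
#   con.raw_sql(stmt)
```

This assumes the target table already exists with the user-defined schema, so that the `COPY` only has to parse and append rows.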
The second is about event-based data and should probably cross-reference #4402 (about ffil).
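For anyone unsure what "sessionize" means here, this is a minimal pure-Python sketch of the logic, not an ibis recipe: events are sorted per user, and a new session starts whenever the gap since that user's previous event exceeds a threshold. The function name `sessionize`, the 30-minute threshold, and the sample events are all made up for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Event = Tuple[str, datetime]  # (user_id, timestamp)


def sessionize(
    events: List[Event], max_gap: timedelta
) -> List[Tuple[str, datetime, int]]:
    """Attach a per-user session number to every event."""
    out: List[Tuple[str, datetime, int]] = []
    prev_user: Optional[str] = None
    prev_ts: Optional[datetime] = None
    session = 0
    # Sorting by (user_id, timestamp) puts each user's events in time order.
    for user, ts in sorted(events):
        if user != prev_user:
            session = 0  # first event of a new user
        elif ts - prev_ts > max_gap:
            session += 1  # gap too long: start a new session
        out.append((user, ts, session))
        prev_user, prev_ts = user, ts
    return out


events = [
    ("alice", datetime(2023, 1, 1, 9, 0)),
    ("alice", datetime(2023, 1, 1, 9, 10)),
    ("alice", datetime(2023, 1, 1, 11, 0)),
    ("bob", datetime(2023, 1, 1, 9, 5)),
    ("bob", datetime(2023, 1, 1, 9, 50)),
]
sessions = sessionize(events, max_gap=timedelta(minutes=30))
for row in sessions:
    print(row)
```

In a query engine the same idea is usually written as a lagged timestamp within each user partition, a boolean "gap exceeded" flag, and a cumulative sum of that flag, which is presumably what a how-to guide would demonstrate.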
@p-a-a-a-trick I made a typo in the word "sessionize".
Got it; thanks @ogrisel! I'll probably knock a few more of these out in the next couple of months. Let me know if you have any more ideas—I can't guarantee they'll get in before the next release but I hope to have most (if not all) of these in before then.
I hope to keep this issue as a rolling list of requested How To guides, not sure if it'll ever be closed.
For reference, the 3 how-to ideas above stem from my first-time experience with ibis and duckdb, which has been quite pleasant so far:
https://github.com/soda-inria/survival-analysis-benchmark/tree/main/datasets/kkbox_churn
Below is a list of some resources that I think are good examples to better learn tools in the data ecosystem. Maybe some parts of these could be translated to use Ibis. Sometimes one of the hardest parts is to just figure out a good narrative flow or interesting example use case when writing tutorials.
Many of these examples have other elements outside of just data manipulation like visualization or maybe some stats/machine learning. It could get a little bit tricky to choose which other libraries to feature, but I do think that it is useful to see how different tools interact.
Another good related post on how to sessionize data (using polars):
The dataset is good (not too big, not too small and realistic) and the multistep task is a good opportunity to show how to factorize code with functions chained via the .pipe method.
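As a small illustration of that chaining style, here is a self-contained sketch with a toy `Rows` class standing in for a real table object. The class and the two step functions are hypothetical. The only thing borrowed is the pandas-style convention that `obj.pipe(f, *args)` calls `f(obj, *args)`.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class Rows:
    """Tiny stand-in for a table object that offers a .pipe method."""

    def __init__(self, data: List[Row]) -> None:
        self.data = data

    def pipe(self, func: Callable[..., "Rows"], *args: Any, **kwargs: Any) -> "Rows":
        # Pass self as the first argument so steps read top to bottom.
        return func(self, *args, **kwargs)


def drop_missing(rows: Rows, column: str) -> Rows:
    """Keep only rows where `column` is present and not None."""
    return Rows([r for r in rows.data if r.get(column) is not None])


def add_total(rows: Rows, price: str, qty: str) -> Rows:
    """Add a `total` field computed as price * quantity."""
    return Rows([{**r, "total": r[price] * r[qty]} for r in rows.data])


raw = Rows(
    [
        {"price": 2.0, "qty": 3},
        {"price": None, "qty": 1},
        {"price": 1.5, "qty": 4},
    ]
)

# Each step is a plain function, so it can be tested and reused on its own.
result = raw.pipe(drop_missing, "price").pipe(add_total, "price", "qty")
print(result.data)
```

Each step returns a new object instead of mutating its input, which is what makes the chain easy to reorder or extend.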
I'll throw in data import/export. I find guidance on this topic currently pretty limited in the docs, but this is a step that every single user is going to do. Here is a nice example from vaex. There are other good examples of How-tos there as well.
closing this per recent docs updates, how-tos available on the site: https://ibis-project.org/how-to/configure/basics
Create+Populate How-To Guides
There is a lack of how-to guides on the website. Let's make a list of them and start populating a useful section for code schematics.
I hope to keep this issue up as a rolling list of How to Guides, so if you stumble on this and your need isn't included in the docs or this list, then comment your request here!
v3.2
v4.0
(_ API) (#4914)
(unpack)
v4.1+
(asof_join) (source)
Let me know if you have any ideas and I'll add them to the list.