rts-gordon closed this issue 3 years ago.
Forgive me if I am wrong, but isn't OHLC just an aggregation of `first`, `max`, `min`, and `last`?
If so, you can do:
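use polars::frame::resample::SampleRule;
use polars::prelude::*;

// Downsample to 6-minute buckets, then take open/high/low/close of "foo_ohlm".
fn example(df: &DataFrame) -> Result<DataFrame> {
    df.downsample("datetime", SampleRule::Minute(6))?
        .agg(&[("foo_ohlm", &["first", "max", "min", "last"])])?
        .sort("datetime", false)
}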
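As a cross-check of the first/max/min/last equivalence, here is a dependency-free sketch of what a time-based downsample followed by that aggregation computes. It is not Polars' implementation: it assumes ticks given as `(epoch_ms, price)` pairs already sorted by time, and the names `Bar` and `ohlc` are hypothetical.

```rust
/// One OHLC bar: open = first, high = max, low = min, close = last price.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bar {
    pub start_ms: i64, // bucket start, Unix epoch milliseconds
    pub open: f64,
    pub high: f64,
    pub low: f64,
    pub close: f64,
}

/// Groups time-sorted `(epoch_ms, price)` ticks into fixed-width buckets of
/// `every_ms` milliseconds and computes first/max/min/last for each bucket.
pub fn ohlc(ticks: &[(i64, f64)], every_ms: i64) -> Vec<Bar> {
    assert!(every_ms > 0, "bucket width must be positive");
    let mut bars: Vec<Bar> = Vec::new();
    for &(ts, price) in ticks {
        // Floor the timestamp to the start of its bucket.
        let start = ts.div_euclid(every_ms) * every_ms;
        if let Some(bar) = bars.last_mut() {
            if bar.start_ms == start {
                // Same bucket as the previous tick: extend high/low, move close.
                bar.high = bar.high.max(price);
                bar.low = bar.low.min(price);
                bar.close = price;
                continue;
            }
        }
        // New bucket: the first tick sets all four prices.
        bars.push(Bar { start_ms: start, open: price, high: price, low: price, close: price });
    }
    bars
}
```

For example, the seven AUDCAD ticks shown later in this thread, bucketed with `every_ms = 1_000`, give two bars whose values match the `c_first`/`c_max`/`c_min`/`c_last` columns of the Polars output up to display rounding.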
Or, in Python:
def example(df):
    return df.downsample("a", rule="minute", n=5).agg({"b": ["first", "min", "max", "last"]})
@ritchie46 Thanks for the quick reply. Yes, OHLC means first/max/min/last. But I am wondering whether SampleRule in Polars supports all the time periods we use: 1min/5min/15min/30min/1hour/2hour/4hour/1week/1month. For example, when processing a tick at 2021-03-03 10:43:15, the period starts would be:
- 1min: 2021-03-03 10:43:00
- 5min: 2021-03-03 10:40:00
- 15min: 2021-03-03 10:30:00
- 30min: 2021-03-03 10:30:00
- 1hour: 2021-03-03 10:00:00
- 2hour: 2021-03-03 10:00:00
- 4hour: 2021-03-03 08:00:00
- 1week: 2021-03-01 00:00:00
- 1month: 2021-03-01 00:00:00

Regards, CHCP
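The intraday periods in this list all divide a day evenly, so, assuming UTC timestamps, their bucket starts can be computed by flooring Unix seconds to a multiple of the period length. A minimal sketch; `floor_to_period` and `PERIODS` are hypothetical names, not part of Polars:

```rust
/// Floors a Unix timestamp in seconds to the start of its fixed-length period.
/// Periods are counted from 1970-01-01 00:00:00 UTC; because every period below
/// divides 86_400, the buckets also line up with midnight UTC.
pub fn floor_to_period(ts_secs: i64, period_secs: i64) -> i64 {
    assert!(period_secs > 0, "period must be positive");
    ts_secs.div_euclid(period_secs) * period_secs
}

/// The intraday periods from the question, in seconds (hypothetical table).
pub const PERIODS: [(&str, i64); 7] = [
    ("1min", 60),
    ("5min", 5 * 60),
    ("15min", 15 * 60),
    ("30min", 30 * 60),
    ("1hour", 60 * 60),
    ("2hour", 2 * 60 * 60),
    ("4hour", 4 * 60 * 60),
];
```

With the tick from the question (2021-03-03 10:43:15 UTC, i.e. `1_614_768_195`), this reproduces the 1min to 4hour starts listed above. Weeks and months are not fixed multiples of these periods relative to midnight, so they need calendar-aware handling.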
All the time periods you mention can be composed with `SampleRule`. For example, a week would be `SampleRule::Day(7)`.
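Two details are worth checking against the expected outputs in the question. This thread does not say how 7-day buckets are aligned; if they were floored from the Unix epoch, they would start on Thursdays (1970-01-01 was a Thursday), whereas the question expects Monday 2021-03-01. Calendar months are 28 to 31 days long, so they are not a fixed number of days. The std-only sketch below shows both alignments in UTC, using Howard Hinnant's civil-date algorithms; all function names are hypothetical and none of this is Polars API.

```rust
const SECS_PER_DAY: i64 = 86_400;

/// Days since 1970-01-01 for a proleptic Gregorian date (Howard Hinnant's algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y - era * 400; // year of era, 0..=399
    let doy = (153 * ((m + 9) % 12) + 2) / 5 + d - 1; // day of year, March-based
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Inverse of `days_from_civil`: (year, month, day) for a day count.
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z - era * 146_097;
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153; // month index with March = 0
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + if m <= 2 { 1 } else { 0 };
    (y, m, d)
}

/// Plain 7-day buckets counted from the epoch. 1970-01-01 was a Thursday,
/// so these buckets start on Thursdays.
pub fn floor_week_from_epoch(ts: i64) -> i64 {
    ts.div_euclid(7 * SECS_PER_DAY) * 7 * SECS_PER_DAY
}

/// 7-day buckets shifted to start on Monday 00:00 UTC (day 4 = 1970-01-05).
pub fn floor_week_monday(ts: i64) -> i64 {
    let days = ts.div_euclid(SECS_PER_DAY);
    (days - (days - 4).rem_euclid(7)) * SECS_PER_DAY
}

/// Calendar months vary in length, so floor via the civil date instead.
pub fn floor_month(ts: i64) -> i64 {
    let (y, m, _) = civil_from_days(ts.div_euclid(SECS_PER_DAY));
    days_from_civil(y, m, 1) * SECS_PER_DAY
}
```

For the example tick, `floor_week_from_epoch` returns Thursday 2021-02-25, while `floor_week_monday` and `floor_month` both return 2021-03-01 00:00:00, the values the question expects.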
Thanks @ritchie46, I will test SampleRule.
Hi @ritchie46,
I have a CSV file like this:
AUDCAD,20201001 23:58:49.724418,0.9545,0.95476,1
AUDCAD,20201001 23:58:49.780350,0.9545,0.95476,1
AUDCAD,20201001 23:58:49.826159,0.9545,0.95476,1
AUDCAD,20201001 23:58:49.860344,0.95449,0.95476,1
AUDCAD,20201001 23:58:50.163641,0.95449,0.9547,1
AUDCAD,20201001 23:58:50.186391,0.95447,0.95469,10
AUDCAD,20201001 23:58:50.238856,0.95449,0.95472,1
When I use CsvReader to read this file, how do I define the string datetime in the schema field? I tried Time64/Date64, but it doesn't work. Thanks a lot.
fn get_schema() -> Schema {
    Schema::new(vec![
        Field::new("s", DataType::Utf8),
        //Field::new("u", DataType::Utf8),
        //Field::new("u", DataType::Time64(TimeUnit::Millisecond)),
        Field::new("u", DataType::Date64),
        Field::new("c", DataType::Float64),
        Field::new("a", DataType::Float64),
        Field::new("v", DataType::UInt64),
    ])
}
pub async fn example() -> PolarResult<DataFrame> {
    let schema = get_schema();
    let df = CsvReader::from_path("./data/20201001.csv")?
        .with_schema(Arc::new(schema))
        .has_header(false)
        .finish()?;
    debug!("df ==== {:?}", df);
    let res = df.downsample("datetime", SampleRule::Minute(1))?
        .agg(&[("c", &["first", "max", "min", "last"])])?
        .sort("datetime", false);
    debug!("res === {:?}", res);
    res
}
There are some errors:
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Other("Unsupported data type Date64 when reading a csv")', C:\Users\Gordon\.cargo\registry\src\github.com-1ecc6299db9ec823\polars-io-0.12.1\src\csv_core\csv.rs:168:90
(The same panic is printed, interleaved, by each worker thread.)
Yes, at the moment you first have to parse the Date64 fields as Utf8 type. Later you can cast them to Date64 with your required fmt.
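To illustrate what that cast has to do with a format like `%Y%m%d %H:%M:%S%.6f`, here is a std-only sketch that parses one value of column `u` into Unix epoch milliseconds. It is not the Polars parser: it assumes UTC, truncates the microseconds to milliseconds, and `parse_tick_time_ms` is a hypothetical name.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (Howard Hinnant's algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y - era * 400; // year of era, 0..=399
    let doy = (153 * ((m + 9) % 12) + 2) / 5 + d - 1; // day of year, March-based
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Parses e.g. "20201001 23:58:49.724418" (the layout that the format
/// `%Y%m%d %H:%M:%S%.6f` describes) into Unix epoch milliseconds.
/// Assumes UTC; fractional digits beyond milliseconds are truncated.
pub fn parse_tick_time_ms(s: &str) -> Option<i64> {
    let num = |t: &str| t.parse::<i64>().ok();
    let (date, time) = s.split_once(' ')?;
    if date.len() != 8 || !date.is_ascii() {
        return None;
    }
    let (y, m, d) = (num(&date[0..4])?, num(&date[4..6])?, num(&date[6..8])?);
    if !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    // Split "HH:MM:SS.ffffff" into whole seconds and the fractional part.
    let (hms, frac) = time.split_once('.').unwrap_or((time, "0"));
    let mut fields = hms.split(':').map(num);
    let (h, mi, sec) = (fields.next()??, fields.next()??, fields.next()??);
    // Keep the first three fractional digits, right-padding with zeros.
    let ms_digits: String = frac.chars().chain("000".chars()).take(3).collect();
    let ms = num(ms_digits.as_str())?;
    let secs = days_from_civil(y, m, d) * 86_400 + h * 3_600 + mi * 60 + sec;
    Some(secs * 1_000 + ms)
}
```

For `20201001 23:58:49.724418` it returns `Some(1_601_596_729_724)`, i.e. 2020-10-01 23:58:49.724, which agrees with the `date64(ms)` value printed later in this thread.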
Hi @ritchie46, I tried using Utf8 for column 'u' (the string datetime), but the following code doesn't work. Can you please give me an example? Thank you.
let res = df.downsample("datetime", SampleRule::Minute(1))?
    .agg(&[("c", &["first", "max", "min", "last"])])?
    .sort("datetime", false);
Hi, I would like to, but I really do not understand how to parse your date column. :confused:
What date would 20201001 23:58:49.724418 be?
I have assumed a parsing fmt for convenience.
This is an OHLC downsample to seconds. I've used seconds here because it gives a more interesting result.
use polars::frame::resample::SampleRule;
use polars::prelude::*;
use std::io::Cursor;
fn get_schema() -> Schema {
    Schema::new(vec![
        Field::new("s", DataType::Utf8),
        Field::new("u", DataType::Utf8),
        Field::new("c", DataType::Float64),
        Field::new("a", DataType::Float64),
        Field::new("v", DataType::UInt64),
    ])
}
fn run() -> Result<DataFrame> {
    let data = r#"AUDCAD,20201001 23:58:49.724418,0.9545,0.95476,1
AUDCAD,20201001 23:58:49.780350,0.9545,0.95476,1
AUDCAD,20201001 23:58:49.826159,0.9545,0.95476,1
AUDCAD,20201001 23:58:49.860344,0.95449,0.95476,1
AUDCAD,20201001 23:58:50.163641,0.95449,0.9547,1
AUDCAD,20201001 23:58:50.186391,0.95447,0.95469,10
AUDCAD,20201001 23:58:50.238856,0.95449,0.95472,1
"#;
    let file = Cursor::new(data);
    let schema = get_schema();
    let mut df = CsvReader::new(file)
        .with_schema(Arc::new(schema))
        .has_header(false)
        .finish()?;
    // cast column 'u' from utf8 to date64
    // parse fmt for datetime
    let cast_fmt = Some("%Y%m%d %H:%M:%S%.6f");
    df.may_apply("u", |s| s.utf8()?.as_date64(cast_fmt))?;
    dbg!(&df);
    let res = df
        .downsample("u", SampleRule::Second(1))?
        .agg(&[("c", &["first", "max", "min", "last"])])?
        .sort("u", false)?;
    dbg!(&res);
    Ok(res)
}
pub fn main() {
    run().expect("failed");
}
This outputs:
[src/main.rs:38] &df = shape: (7, 5)
╭──────────┬─────────────────────────┬───────┬───────┬─────╮
│ s ┆ u ┆ c ┆ a ┆ v │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ date64(ms) ┆ f64 ┆ f64 ┆ u64 │
╞══════════╪═════════════════════════╪═══════╪═══════╪═════╡
│ "AUDCAD" ┆ 2020-10-01 23:58:49.724 ┆ 0.955 ┆ 0.955 ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "AUDCAD" ┆ 2020-10-01 23:58:49.780 ┆ 0.955 ┆ 0.955 ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "AUDCAD" ┆ 2020-10-01 23:58:49.826 ┆ 0.955 ┆ 0.955 ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "AUDCAD" ┆ 2020-10-01 23:58:49.860 ┆ 0.954 ┆ 0.955 ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "AUDCAD" ┆ 2020-10-01 23:58:50.163 ┆ 0.954 ┆ 0.955 ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "AUDCAD" ┆ 2020-10-01 23:58:50.186 ┆ 0.954 ┆ 0.955 ┆ 10 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "AUDCAD" ┆ 2020-10-01 23:58:50.238 ┆ 0.954 ┆ 0.955 ┆ 1 │
╰──────────┴─────────────────────────┴───────┴───────┴─────╯
[src/main.rs:45] &res = shape: (2, 5)
╭─────────────────────┬─────────┬───────┬───────┬────────╮
│ u ┆ c_first ┆ c_max ┆ c_min ┆ c_last │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ date64(ms) ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═════════════════════╪═════════╪═══════╪═══════╪════════╡
│ 2020-10-01 23:58:49 ┆ 0.955 ┆ 0.955 ┆ 0.954 ┆ 0.954 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2020-10-01 23:58:50 ┆ 0.954 ┆ 0.954 ┆ 0.954 ┆ 0.954 │
╰─────────────────────┴─────────┴───────┴───────┴────────╯
It works. Many thanks to you, @ritchie46! I will study your code carefully. Polars is a very powerful project; I need more time to learn it.
Great. If you've got any more questions, let me know.
Hi there, our team has used Python pandas to calculate OHLC in a trading system for a long time. Now we want to rebuild the system in Rust for higher performance, which is how we found Polars. However, Polars has no OHLC function. Would you consider implementing OHLC in Polars, like `ohlc` in pandas?
Thank you very much.
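For a trading system that processes ticks one at a time, the same first/max/min/last logic can also be applied incrementally, emitting each bar as soon as the first tick of the next bucket arrives. A minimal std-only sketch, assuming ticks arrive in time order with epoch-millisecond timestamps; `Bar` and `BarBuilder` are hypothetical names, not Polars or pandas API:

```rust
/// One OHLC bar: open = first, high = max, low = min, close = last price.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bar {
    pub start_ms: i64,
    pub open: f64,
    pub high: f64,
    pub low: f64,
    pub close: f64,
}

/// Builds fixed-width OHLC bars from a time-ordered tick stream.
pub struct BarBuilder {
    every_ms: i64,
    current: Option<Bar>,
}

impl BarBuilder {
    pub fn new(every_ms: i64) -> Self {
        assert!(every_ms > 0, "bar width must be positive");
        BarBuilder { every_ms, current: None }
    }

    /// Adds a tick. Returns the finished previous bar when this tick falls into
    /// a different bucket; otherwise updates the open bar and returns `None`.
    pub fn push(&mut self, ts_ms: i64, price: f64) -> Option<Bar> {
        let start = ts_ms.div_euclid(self.every_ms) * self.every_ms;
        if let Some(bar) = self.current.as_mut() {
            if bar.start_ms == start {
                bar.high = bar.high.max(price);
                bar.low = bar.low.min(price);
                bar.close = price;
                return None;
            }
        }
        // First tick of a new bucket: it opens the bar and sets all four prices.
        let fresh = Bar { start_ms: start, open: price, high: price, low: price, close: price };
        self.current.replace(fresh)
    }

    /// Returns the still-open bar, e.g. at the end of the stream.
    pub fn flush(&mut self) -> Option<Bar> {
        self.current.take()
    }
}
```

Buckets without ticks produce no bar, and a tick from an earlier bucket would also start a new bar, so out-of-order data should be sorted before it is pushed.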