pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Big integer error #17705

Open Smotrov opened 1 month ago

Smotrov commented 1 month ago

Reproducible example


use polars::prelude::*;
use std::path::Path;
use tokio::fs::File;
use tokio::io::{AsyncReadExt, BufReader};
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Init the tracing subscriber
    tracing_subscriber::fmt::Subscriber::builder()
        .with_max_level(tracing::Level::DEBUG)
        .init();

    // Path to the JSON file
    let file_path = Path::new("./data/pairs.json");

    tracing::info!("Reading the file");

    let file = File::open(file_path).await?;
    let mut reader = BufReader::new(file);

    let mut json_data = String::new();

    reader.read_to_string(&mut json_data).await?;

    // Log the bytes around offset 63740037, the position reported in the parse error.
    tracing::info!("Before error: {:?}", &json_data[63740000..63740037]);
    tracing::info!("After error: {:?}", &json_data[63740037..63740100]);

    tracing::info!("Whole line: {:?}", &json_data[63740000..63740100]);

    let cursor = std::io::Cursor::new(json_data);

    let mut schema = Schema::new();
    schema.with_column("price".into(), DataType::UInt64);

    // When using the schema:
    let _df = JsonReader::new(cursor)
        .with_json_format(JsonFormat::Json)
        .with_schema(Arc::new(schema))
        .finish()?;
    Ok(())
}

Log output

2024-07-18T17:12:48.534981Z  INFO pool_analytics: Reading the file
2024-07-18T17:12:48.628441Z  INFO pool_analytics: Before error: "dQuote\":0,\"price\":4944150486926564000"
2024-07-18T17:12:48.628487Z  INFO pool_analytics: After error: "0,\"lpPrice\":0.000006596342331390962,\"tokenAmountCoin\":2.1e-8,\"t"
2024-07-18T17:12:48.628489Z  INFO pool_analytics: Whole line: "dQuote\":0,\"price\":49441504869265640000,\"lpPrice\":0.000006596342331390962,\"tokenAmountCoin\":2.1e-8,\"t"
Error: ComputeError(ErrString("InvalidNumber at character 63740037 ('0')"))

Issue description

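It seems the JSON reader cannot parse very large integers: the value in the log, 49441504869265640000, is larger than u64::MAX (18446744073709551615). Meanwhile, serde parses the same file without errors.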
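A minimal, standard-library-only sketch of the range problem described above. The helper name `fits_in_u64` is hypothetical and only illustrates the check; it is not part of polars or simd-json.

```rust
/// Hypothetical helper: returns true when the decimal literal fits in a u64.
fn fits_in_u64(literal: &str) -> bool {
    // `str::parse::<u64>` returns an error on overflow instead of wrapping.
    literal.parse::<u64>().is_ok()
}

fn main() {
    // The value taken from the log output in this issue.
    let price = "49441504869265640000";

    println!("u64::MAX    = {}", u64::MAX);
    println!("price       = {price}");
    println!("fits in u64 = {}", fits_in_u64(price));
}
```

Since the literal is above u64::MAX, no UInt64 column can hold it exactly; any reader has to either reject it or fall back to a wider or lossy type.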

Expected behavior

It would be good if it handled UInt64 properly.

Installed versions

Replace this line with a list of feature gates

Smotrov commented 1 month ago

It works when I add the following dependency:

simd-json = { version = "0.13.10", features = ["big-int-as-float"] }

It would be good if polars re-exposed this feature.
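Going by its name, the `big-int-as-float` feature makes integers that are too large for the native integer types come back as floats. The sketch below illustrates that fallback idea with the standard library only. It is an assumption about the general strategy, not simd-json's actual implementation, and the `Number` enum and `parse_number` function are hypothetical names.

```rust
/// Hypothetical result type: an exact integer or a lossy float.
#[derive(Debug, PartialEq)]
enum Number {
    U64(u64),
    F64(f64),
}

/// Try an exact u64 first; if that fails (for example on overflow),
/// fall back to f64, which accepts the value but may round it.
/// Signed integers are left out to keep the sketch short.
fn parse_number(literal: &str) -> Option<Number> {
    if let Ok(v) = literal.parse::<u64>() {
        return Some(Number::U64(v));
    }
    literal
        .parse::<f64>()
        .ok()
        // Rust also parses "inf" and "nan"; JSON does not allow them.
        .filter(|v| v.is_finite())
        .map(Number::F64)
}

fn main() {
    // Small value: kept as an exact integer.
    println!("{:?}", parse_number("42"));
    // Value from this issue: too large for u64, so it becomes a float.
    println!("{:?}", parse_number("49441504869265640000"));
}
```

The trade-off is precision: an f64 has a 53-bit significand, so integers of this size are stored approximately rather than exactly.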