nsbuitrago / parasail-rs

SIMD pairwise sequence alignment
https://crates.io/crates/parasail-rs
BSD 3-Clause "New" or "Revised" License

Segfault when running parasail in multiple threads #7

Closed: sclamons closed this issue 3 weeks ago.

sclamons commented 1 month ago

This could be a bit annoying to reproduce.

I've been trying to build a Polars extension that applies a parasail alignment to each element of a column. This consistently leads to a segfault unless I limit Polars to one thread (using the POLARS_MAX_THREADS environment variable). With no thread limit, the extension segfaults very consistently; with two threads, it still segfaults eventually but takes more tries.

I'm not certain where the segfault is happening, but I think this is the relevant stack trace, as reported by pystack:

[Screenshot of the pystack stack trace; not reproduced here.]

My extension code (apologies for the very-much-not-minimal example!):

expressions.rs:

```rust
use polars::prelude::*;
use pyo3_polars::derive::polars_expr;
use parasail_rs::{Matrix, Aligner, Profile};
use serde::Deserialize;

// Build a striped, local aligner around a query profile.
// Scoring matrix: alphabet "ACTG", match = 1, mismatch = 0.
fn build_aligner(query: &str) -> Aligner {
    let profile = Profile::new(query.as_bytes(), false, &Matrix::create(b"ACTG", 1, 0).unwrap()).unwrap();
    Aligner::new()
        .profile(profile)
        .striped()
        .local()
        .build()
}

// Align `seq` against the profile; a failed alignment scores 0.
fn check_alignment_score(seq: &str, aligner: &Aligner) -> u8 {
    let result = aligner.align(None, seq.as_bytes()).ok();
    match result {
        Some(r) => r.get_score() as u8,
        None => 0
    }
}

#[derive(Deserialize)]
pub struct AlignKwargs {
    target: String
}

// Polars plugin entry point: scores every string in the input column
// against `kwargs.target`. A new aligner is built on every call.
#[polars_expr(output_type=UInt8)]
fn check_for_const_seq_match(inputs: &[Series], kwargs: AlignKwargs) -> PolarsResult<Series> {
    println!("Starting check_for_const_seq_match");
    let aligner = build_aligner(&kwargs.target);
    let seqs = inputs[0].str()?;
    let match_scores: ChunkedArray<UInt8Type> = seqs.apply_generic(|data: Option<&str>| -> Option<u8> {
        data.map(|s| check_alignment_score(s, &aligner))
    });
    println!("Finished check_for_const_seq_match: {:?}", match_scores);
    Ok(match_scores.into_series())
}
```

lib.rs:

```rust
mod expressions;
use pyo3::types::PyModule;
use pyo3::{pymodule, Bound, PyResult, Python};

#[pymodule]
fn _internal(_py: Python, m: &Bound<PyModule>) -> PyResult<()> {
    m.add("__version__", env!("CARGO_PKG_VERSION"))?;
    Ok(())
}
```

`__init__.py`:

```python
from pathlib import Path
from typing import TYPE_CHECKING

import polars as pl
from polars.plugins import register_plugin_function
from polars._typing import IntoExpr

def check_for_const_seq_match(expr: IntoExpr, target: str) -> pl.Expr:
    """Check for alignment with parasail."""
    return register_plugin_function(
        plugin_path=Path(__file__).parent,
        function_name="check_for_const_seq_match",
        args=expr,
        kwargs={"target": target},
        is_elementwise=True,
    )
```

Cargo.toml:

```toml
[package]
name = "polars_alignment"
version = "0.1.0"
edition = "2021"

[lib]
name = "polars_alignment"
crate-type = ["cdylib"]

[dependencies]
polars = { version = "0.41.3", features = ["dtype-u8"] }
pyo3 = { version = "*", features = ["extension-module", "abi3-py38"] }
pyo3-polars = { version = "0.15.0", features = ["derive"] }
serde = { version = "*", features = ["derive"] }
parasail-rs = "0.7.3"
libparasail-sys = "0.1.6"
target-features = "0.1.6"
memoffset = "0.9.1"

[features]
default = ["python"]
python = []

[profile.release-with-debug]
inherits = "release"
debug = true
```

pyproject.toml:

```toml
[build-system]
requires = ["maturin>=1.0,<2.0"]
build-backend = "maturin"

[project]
name = "polars_alignment"
version = "0.1.0"
dependencies = [
    "polars-lts-cpu==1.3.0"
]
requires-python = ">=3.0"
readme = "README.md"

[tool.maturin]
module-name = "polars_alignment._internal"
```

And finally, the script to reproduce the crash:

```python
import polars as pl
import polars_alignment as pla

df = pl.DataFrame({"Barcode": ["AACAAAACGAAATGTTGCTA", "GCCACTAAGAAACCCCGACC", "TTACCCATCTAATCAATTTG", "TCAGCCTCACTGCACCCTAA", "ATTACTAATAAGATGAAGGG"]})
# target_seq: the query sequence to align against (its value is not given in this report)
df.with_columns(alignscore=pla.check_for_const_seq_match("Barcode", target_seq))  # <- Run this line multiple times; it eventually segfaults
```

nsbuitrago commented 1 month ago

I am having some trouble reproducing this. Basically, I am using your working example and then just running

```python
df.with_columns(alignscore=pla.check_for_const_seq_match("Barcode", target_seq))
```

in a loop to see if it fails. If you find any other information that might be useful, just let me know. I see you are trying to do striped local alignment, so for now I am working on a potential alternative solution.

nsbuitrago commented 1 month ago

Ok, I am able to reproduce this finally. Working on a fix, but it might take some time.

sclamons commented 1 month ago

Thank you so much for putting the time into tracking down (and, fingers crossed, fixing) this issue! It's truly heroic of you.

I'm not exactly sure how Polars' internals work, but the aligner should get built either once for the whole column of data or once for each chunk that Polars decides to split it into.

(Sent by email in reply to Nicolas Buitrago's comment of Sat, Aug 10, 2024, 10:17 AM.)
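The two construction strategies sclamons describes (one aligner for the whole column, or one per chunk) can be sketched with only the standard library. This is a simplified model, not how Polars actually schedules plugin calls: `Scorer` is a hypothetical stand-in for parasail-rs's `Aligner`, its score is a toy (count of matching positions), and the column is split into chunks by hand with `std::thread::scope`. The practical difference is that strategy 1 shares one instance across threads, which requires the type to be `Sync`, while strategy 2 gives each thread its own instance and never shares it.

```rust
use std::thread;

// Hypothetical stand-in for parasail_rs::Aligner: built once, then only read.
struct Scorer {
    target: Vec<u8>,
}

impl Scorer {
    fn new(target: &str) -> Self {
        Scorer { target: target.as_bytes().to_vec() }
    }

    // Toy score: number of positions where `seq` and the target agree.
    fn score(&self, seq: &str) -> u8 {
        let hits = self
            .target
            .iter()
            .zip(seq.as_bytes())
            .filter(|(a, b)| a == b)
            .count();
        hits.min(u8::MAX as usize) as u8
    }
}

// Chunk length that splits `len` rows into at most `n_chunks` chunks (never 0).
fn chunk_len(len: usize, n_chunks: usize) -> usize {
    let n = n_chunks.max(1);
    ((len + n - 1) / n).max(1)
}

// Strategy 1: build one Scorer for the whole column and share it read-only.
fn score_shared(column: &[&str], target: &str, n_chunks: usize) -> Vec<u8> {
    let shared = Scorer::new(target);
    let scorer = &shared; // sending &Scorer to other threads needs Scorer: Sync
    thread::scope(|s| {
        let handles: Vec<_> = column
            .chunks(chunk_len(column.len(), n_chunks))
            .map(|chunk| s.spawn(move || chunk.iter().map(|seq| scorer.score(seq)).collect::<Vec<u8>>()))
            .collect();
        // Joining in spawn order keeps the output in row order.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect::<Vec<u8>>()
    })
}

// Strategy 2: build one Scorer per chunk, owned by the worker thread.
fn score_per_chunk(column: &[&str], target: &str, n_chunks: usize) -> Vec<u8> {
    thread::scope(|s| {
        let handles: Vec<_> = column
            .chunks(chunk_len(column.len(), n_chunks))
            .map(|chunk| {
                s.spawn(move || {
                    let scorer = Scorer::new(target); // never leaves this thread
                    chunk.iter().map(|seq| scorer.score(seq)).collect::<Vec<u8>>()
                })
            })
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect::<Vec<u8>>()
    })
}

fn main() {
    let column = ["AACAAAACGAAATGTTGCTA", "GCCACTAAGAAACCCCGACC", "TTACCCATCTAATCAATTTG"];
    let target = "AACAAAACGAAATGTTGCTA";
    println!("{:?}", score_shared(&column, target, 2));
    println!("{:?}", score_per_chunk(&column, target, 2));
}
```

Both functions return one score per row in the original order, since the join handles are collected in chunk order.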

nsbuitrago commented 1 month ago

I've double-checked parasail-rs for memory issues but haven't been able to find anything specific. This problem may be related to a stack overflow in Polars. A couple of things could help. One is setting RUST_MIN_STACK (as noted in https://github.com/pola-rs/polars/issues/12268). There may also be some additional optimization that can be done on the extension side: with check_for_const_seq_match as is, I still get a segfault at around 100,000 iterations even with RUST_MIN_STACK adjusted. This issue is solved if we cache the aligner so that it is only rebuilt when the target sequence changes, instead of on every call (every iteration, in my example):

```rust
use once_cell::sync::Lazy;
use parking_lot::Mutex;

//... rest of implementation

// Most recently built aligner, together with the target it was built for.
struct CachedAligner {
    aligner: Aligner,
    target: String,
}

// Process-wide cache shared by every thread that calls the plugin.
static ALIGNER_CACHE: Lazy<Mutex<Option<CachedAligner>>> = Lazy::new(|| Mutex::new(None));

#[polars_expr(output_type=UInt8)]
pub fn check_for_const_seq_match(inputs: &[Series], kwargs: AlignKwargs) -> PolarsResult<Series> {
    // Reuse the cached aligner if it was built for the same target; otherwise
    // build a new one and store a clone in the cache. The lock is released
    // when this block ends.
    let aligner = {
        let mut cache = ALIGNER_CACHE.lock();
        match cache.as_ref() {
            Some(cached) if cached.target == kwargs.target => cached.aligner.clone(),
            _ => {
                let new_aligner: Aligner = build_aligner(&kwargs.target);
                *cache = Some(CachedAligner {
                    aligner: new_aligner.clone(),
                    target: kwargs.target.clone(),
                });
                new_aligner
            }
        }
    };

    let seqs = inputs[0].str()?;
    let match_scores: ChunkedArray<UInt8Type> =
        seqs.try_apply_generic(|data: Option<&str>| -> PolarsResult<Option<u8>> {
            match data {
                Some(s) => Ok(Some(check_alignment_score(s, &aligner))),
                None => Ok(None),
            }
        })?;

    println!("Finished check_for_const_seq_match: {:?}", match_scores);
    Ok(match_scores.into_series())
}
```

This leads me to think this is an issue with the Polars extension itself rather than with parasail-rs. Of course, this is just a workaround I found to avoid the issue, and it may not be the best one. Note that the example above relies on being able to clone the aligner, which is not possible in 0.7.3; I've added this for the next release, 0.7.4.
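If cloning is not available, one alternative is to keep the aligner behind an `Arc` in the cache and hand out reference-counted handles instead of clones. A minimal std-only sketch follows; `Aligner` and `build_aligner` here are hypothetical stand-ins for the real parasail-rs type and the report's `build_aligner`. It assumes the real `Aligner` is `Send + Sync` (needed to keep an `Arc<Aligner>` in a `static Mutex`), which has not been checked here. It also uses `std::sync::Mutex` directly, since `Mutex::new` is `const` and needs no `Lazy` wrapper.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for parasail_rs::Aligner; deliberately NOT Clone.
struct Aligner {
    target: String,
}

// Hypothetical stand-in for the report's build_aligner (the expensive step).
fn build_aligner(target: &str) -> Aligner {
    Aligner { target: target.to_owned() }
}

// Most recently built aligner and the target it was built for.
struct CachedAligner {
    target: String,
    aligner: Arc<Aligner>,
}

// Mutex::new is const, so the static needs no Lazy/once_cell wrapper.
static ALIGNER_CACHE: Mutex<Option<CachedAligner>> = Mutex::new(None);

// Return a shared handle to the aligner for `target`, rebuilding it only
// when the cache is empty or was built for a different target.
fn cached_aligner(target: &str) -> Arc<Aligner> {
    let mut cache = ALIGNER_CACHE.lock().expect("aligner cache lock poisoned");
    if let Some(cached) = cache.as_ref() {
        if cached.target == target {
            return Arc::clone(&cached.aligner); // cheap: only bumps a refcount
        }
    }
    let aligner = Arc::new(build_aligner(target));
    *cache = Some(CachedAligner {
        target: target.to_owned(),
        aligner: Arc::clone(&aligner),
    });
    aligner
}

fn main() {
    let first = cached_aligner("ACGT");
    let again = cached_aligner("ACGT");
    let other = cached_aligner("GGCC");
    println!("same target reuses the aligner: {}", Arc::ptr_eq(&first, &again));
    println!("new target rebuilds it: {}", !Arc::ptr_eq(&first, &other));
    println!("current target: {}", other.target);
}
```

The trade-off: with `Arc`, every caller shares a single aligner instance, so this is only safe if one aligner can be used from several threads at once. The `clone()` version above gives each call its own copy, although how independent that copy is depends on the `Clone` implementation in 0.7.4.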