wheretrue / biobear

Work with bioinformatic files using Arrow, Polars, and/or DuckDB
https://www.wheretrue.dev/docs/exon/biobear/
MIT License

read bam error #171

Status: Closed (Liripo closed this 2 months ago)

Liripo commented 3 months ago

Error:

ArrowInvalid: C Data interface error: External error: Arrow error: External error: Io error: invalid data"

Code:

import biobear as bb

session = bb.connect()

session.sql("""
    CREATE EXTERNAL TABLE experiment STORED AS BAM LOCATION 'Aligned.sortedByCoord.out.bam'
""")

result = session.sql("""
    SELECT start FROM experiment
""")

result.to_polars()

Aligned.sortedByCoord.out.bam was generated by STAR.

Liripo commented 3 months ago

? @PiaoR

tshauck commented 3 months ago

@Liripo Thanks for posting this, I'll have a look and follow up.

If you have a snippet of the file that reproduces the error that'd be helpful.

Liripo commented 2 months ago

https://drive.google.com/file/d/10zXT2zc1iQJ9y5jY_CeInGKF2LD_o51N/view?usp=sharing This is an example of a 6M bam file.

tshauck commented 2 months ago

Thanks! It looks like this file has duplicate CR tags. Is that expected with STAR? (I'm not that familiar with it as a tool.)

According to the BAM file standard, there shouldn't be duplicate tags:

[image: excerpt from the SAM specification] https://samtools.github.io/hts-specs/SAMv1.pdf
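As a rough illustration of that rule (a sketch only, not part of biobear or exon), a file can be pre-checked for repeated tags by scanning its records in SAM text form, for example as printed by `samtools view`. The function name `duplicate_tags` and the sample record below are made up for this example.

```python
from collections import Counter


def duplicate_tags(sam_line: str) -> list[str]:
    """Return tag names that occur more than once in one SAM text record."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional fields follow as TAG:TYPE:VALUE.
    tags = [field.split(":", 1)[0] for field in fields[11:]]
    counts = Counter(tags)
    return [tag for tag, n in counts.items() if n > 1]


# Made-up record that carries the CR tag twice.
record = "read1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tCR:Z:AAAC\tCR:Z:AAAC"
print(duplicate_tags(record))
```

A record with no repeated tags yields an empty list, so the check can be run over every line and stopped at the first non-empty result.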

I'll try to make that error bubble up in the meantime.

Edit: here's what the error message will look like in the next release in this case...

> CREATE EXTERNAL TABLE bam STORED AS BAM LOCATION 'exon/exon-core/test-data/Aligned.sortedByCoord.out.bam';
0 row(s) fetched. 
Elapsed 0.017 seconds.

> SELECT * FROM bam LIMIT 1;
Arrow error: External error: Error: Custom { kind: InvalidData, error: InvalidData(DuplicateTag(Tag("CR"))) }

A little more verbose, but clearer as well. https://github.com/wheretrue/exon/pull/615

Liripo commented 2 months ago

Thanks for your help. I typed an extra CR tag in the STAR command line option. A clearer error message would be better.

STAR --outSAMattributes CR CR
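
For anyone building the STAR command from a script, one possible guard against this kind of typo is to drop repeated attribute names before assembling the arguments. This is only a sketch: the helper name `star_sam_attribute_args` is hypothetical, and it does not check that the names are attributes STAR accepts.

```python
def star_sam_attribute_args(attributes: list[str]) -> list[str]:
    """Build the --outSAMattributes arguments with repeated names removed."""
    # dict.fromkeys keeps the first occurrence of each name, in order.
    unique = list(dict.fromkeys(attributes))
    return ["--outSAMattributes", *unique]


print(star_sam_attribute_args(["CR", "CR"]))
```

The returned list can be appended to the rest of the STAR argument list before the command is run.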

tshauck commented 2 months ago

Great, I'll go ahead and close this then. Please feel free to open more issues if you run into other problems or have questions.