wu116 opened this issue 1 year ago
I'm not familiar with PacBio's toolset, so I may be wrong...
From their documentation for SAM/BAM files, their tools seem to expect CIGAR strings to use `=` and `X` instead of the usual `M`, and to quit with an error when `M` is encountered. If this is the cause, there should be a simple solution: `sam-dump` has an option to emit `=` and `X`, `-c | --cigar-long`.
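For readers unfamiliar with the two CIGAR styles: `M` only says a base is aligned, while `=` and `X` also say whether it matches the reference. Here is a minimal Python sketch of that expansion. It is an illustration only, not sam-dump's actual code, and it assumes the read sequence and the reference sequence starting at the alignment position are both available as strings.

```python
import re


def cigar_ops(cigar):
    """Split a CIGAR string such as '3M1I5M' into (length, op) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def to_long_cigar(cigar, read, ref, ref_start=0):
    """Rewrite every M run as =/X runs by comparing read bases with reference bases.

    Assumption: `ref` holds the reference sequence and `ref_start` is the
    0-based offset in `ref` where the alignment begins.
    """
    out = []
    r, g = 0, ref_start  # current positions in the read and the reference

    def emit(n, op):
        # Merge with the previous operation when the op type repeats.
        if out and out[-1][1] == op:
            out[-1] = (out[-1][0] + n, op)
        else:
            out.append((n, op))

    for n, op in cigar_ops(cigar):
        if op in "M=X":
            # Aligned bases: compare case-insensitively, one base at a time.
            for i in range(n):
                same = read[r + i].upper() == ref[g + i].upper()
                emit(1, "=" if same else "X")
            r += n
            g += n
        elif op in "IS":
            # Insertions and soft clips consume read bases only.
            emit(n, op)
            r += n
        elif op in "DN":
            # Deletions and skips consume reference bases only.
            emit(n, op)
            g += n
        else:
            # H and P consume neither sequence.
            emit(n, op)
    return "".join(f"{n}{op}" for n, op in out)
```

For example, `to_long_cigar("10M", "ACGAACGTAC", "ACGTACGTAC")` gives `3=1X6=`, because only the fourth base differs.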
Thanks for replying!
But it may not be the cause in my case. Here is the first line of the SAM file that `sam-dump` generates, with or without `-c`:
1 4 * 0 0 * * 0 0 AGTTGTGGGAAGGAAGTTTTGATTGGTGAGGATGTGTTTGGTTTTGATTTTAATGATGTTATTAATTGATTTGTGAGTGTTTGATTAAGTAAGTTAAGTATAGTTGGTTGATGGAGTTGTTTGGGTTGAGATTTATAAAGAGTGAGTGGTGTAGCGATTGGGTAAAGAGGAGAAGATTTCGATTGTGTGGTTTTACAAGAGAACAATAACATGGAGTAGGATGTGCATATTAGTGCGAGTGGTAG !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! RG:Z:1c1c6cfd
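The fields of the record above help explain why `-c` makes no difference. A small Python sketch decodes the FLAG bits and the Phred+33 quality string. It assumes tab-separated SAM input (the pasted line shows spaces, but real SAM uses tabs). FLAG 4 marks the read as unmapped, so RNAME and CIGAR are `*` and there is no CIGAR for `--cigar-long` to rewrite. Every `!` in QUAL decodes to quality 0.

```python
# SAM FLAG bits as defined in the SAM format specification.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped", 0x8: "mate_unmapped",
    0x10: "reverse", 0x20: "mate_reverse", 0x40: "read1", 0x80: "read2",
    0x100: "secondary", 0x200: "qc_fail", 0x400: "duplicate", 0x800: "supplementary",
}


def describe_record(line):
    """Decode the mandatory fields of one tab-separated SAM record."""
    f = line.rstrip("\n").split("\t")
    qname, flag, rname = f[0], int(f[1]), f[2]
    pos, mapq, cigar = int(f[3]), int(f[4]), f[5]
    seq, qual, tags = f[9], f[10], f[11:]
    # QUAL is Phred+33: '!' (ASCII 33) means quality 0.
    phred = [ord(c) - 33 for c in qual] if qual != "*" else []
    return {
        "qname": qname,
        "flags": [name for bit, name in FLAG_BITS.items() if flag & bit],
        "rname": rname,
        "pos": pos,
        "mapq": mapq,
        "cigar": cigar,
        "length": len(seq),
        "max_qv": max(phred) if phred else None,
        "tags": tags,
    }
```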
There may be some important information in the header of the PacBio BAM file, but the header is not retained even though I add `-r|--header`.
I checked the SRA file and found that the PacBio BAM header is there:
$BAM_HEADER@HD VN:1.5 SO:unknown pb:3.0.7
@RG ID:1c1c6cfd PL:PACBIO DS:READTYPE=SUBREAD;Ipd:CodecV1=ip;PulseWidth:CodecV1=pw;BINDINGKIT=101-789-500;SEQUENCINGKIT=101-826-100;BASECALLERVERSION=5.0.0;FRAMERATEHZ=100.000000;BarcodeFile=/share02/bioCloud/compute/cloudpub/u4359N4/dataProcess_RUN293_D07_20210111180029/split_xml_1610359240111/outputs/barcode.fa;BarcodeHash=2ad43f747b13dbca24d0688e8dff8ab2;BarcodeCount=11;BarcodeMode=Symmetric;BarcodeQuality=Score LB:AF031-AD159-AC883 -ISO PU:m64087_210109_061940 SM:ISO PM:SEQUELII
@PG ID:baz2bam PN:baz2bam VN:9.0.0.92233 CL:/opt/pacbio/ppa-9.0.0/bin/baz2bam /data/pa/m64087_210109_061940.baz -o /data/pa/m64087_210109_061940 --metadata /data/pa/.m64087_210109_061940.metadata.xml -j 32 -b 8 --inlinePbi --progress --silent --maxInputQueueMB 70000 --zmwBatchMB 50000 --zmwHeaderBatchMB 30000
@PG ID:bazFormat PN:bazformat VN:1.6.0
@PG ID:bazwriter PN:bazwriter VN:9.0.0
@PG ID:lima VN:1.9.0 (commit 7727b1f)
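The record's `RG:Z:1c1c6cfd` tag refers to `ID:1c1c6cfd` in this header, and settings such as `READTYPE` live in the `DS` field of `@RG`. The following Python sketch shows one possible workaround. It parses the saved header and prepends it to headerless `sam-dump` output. The helper names are hypothetical, and the sketch assumes the header was saved with tab-separated fields. It only restores the header. Any per-read tags that were never stored stay missing.

```python
def parse_header_line(line):
    """Split '@RG\tID:x\tPL:PACBIO' into ('@RG', {'ID': 'x', 'PL': 'PACBIO'})."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[1:]:
        key, _, value = field.partition(":")  # keep any later ':' in the value
        tags[key] = value
    return fields[0], tags


def parse_ds(ds):
    """Split a PacBio-style DS value 'READTYPE=SUBREAD;BINDINGKIT=...' into a dict."""
    return dict(item.split("=", 1) for item in ds.split(";") if "=" in item)


def read_groups(header_text):
    """Map each @RG ID to its parsed DS description."""
    groups = {}
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            _, tags = parse_header_line(line)
            groups[tags["ID"]] = parse_ds(tags.get("DS", ""))
    return groups


def attach_header(header_text, sam_lines):
    """Prepend a saved header to SAM records, checking each RG:Z tag is declared."""
    groups = read_groups(header_text)
    out = header_text.rstrip("\n").splitlines()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # drop any header that sam-dump reconstructed itself
        fields = line.rstrip("\n").split("\t")
        rg = [t[5:] for t in fields[11:] if t.startswith("RG:Z:")]
        if rg and rg[0] not in groups:
            raise ValueError(f"read group {rg[0]} not declared in header")
        out.append(line.rstrip("\n"))
    return "\n".join(out) + "\n"
```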
Is there any way to make `sam-dump` keep this header?
What is the accession you are working with?
SRR16979014 in the project PRJNA774118.
I found that the PacBio BAM file has some additional columns (optional tags) after the standard columns in the body, so the header I showed may be irrelevant. Losing those columns may be the actual reason why isoseq3 gives an error.
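One way to check which per-read information is missing is to compare each record against a checklist. In this Python sketch, the tag list and the `<movie>/<zmw>/<qStart>_<qEnd>` name pattern come from my reading of the PacBio BAM specification. Nothing in this thread or in isoseq3's documentation confirms them, so treat both as assumptions and adjust them to what your tool actually requires. Applied to the record above (QNAME `1`, only an `RG` tag), the sketch reports a non-PacBio name and every listed tag missing.

```python
import re

# Assumed checklist of per-read tags that PacBio subread BAMs normally carry
# (zm = ZMW hole number, qs/qe = subread start/end, np = passes,
# rq = read quality, sn = SNR, cx = local context). Adjust as needed.
EXPECTED_TAGS = ("zm", "qs", "qe", "np", "rq", "sn", "cx")

# Assumed PacBio subread naming convention: <movie>/<zmw>/<qStart>_<qEnd>
SUBREAD_NAME = re.compile(r"^[^/]+/\d+/\d+_\d+$")


def pacbio_gaps(line, expected=EXPECTED_TAGS):
    """Report whether a SAM record has a PacBio-style name and which tags it lacks."""
    f = line.rstrip("\n").split("\t")
    present = {t.split(":", 1)[0] for t in f[11:]}  # e.g. 'RG:Z:...' -> 'RG'
    return {
        "name_ok": bool(SUBREAD_NAME.match(f[0])),
        "missing_tags": [t for t in expected if t not in present],
    }
```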
The best way to solve this problem may be to make the original BAM files submitted to SRA by the uploaders available through AWS or GCP.
But I still wonder whether, after some future update of sra-tools, it will be possible to convert the SRA file back into a PacBio BAM file.
Thanks again for your kind help. :)
Experiencing a similar problem: I would love to work with some official PacBio tools that require PacBio BAM headers, but these are not preserved when using `sam-dump` to write SAM (and converting to BAM with `samtools view`) from .sra files. Is the original header information kept in the .sra?
I think the header I gave above might just be the header for the entire BAM file; the per-read information has not been kept in the SRA file. So I gave up and tried to access the raw data as BAM files instead.
I have the same problem. I downloaded the SRA files from the database and converted them to SAM or BAM files using `sam-dump`, but the resulting files cannot be used for downstream analysis.
Dear developers, hello! Some projects use PacBio single-molecule long-read sequencing to analyze the full-length transcriptome, but their raw data are PacBio BAM files, which have to be converted to SRA format for upload to the SRA database. I want to analyze the data with the official software isoseq3, which requires a PacBio-specific BAM file, but `sam-dump` cannot correctly convert the SRA file back into that BAM file. It seems that some information is lost when the uploader's PacBio BAM file is converted into an SRA file. Could you please give some advice?