mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Error: malformatted header: no ':' in field #172

Closed by fanghe0720 4 months ago

fanghe0720 commented 5 months ago

Hi,

I'm using TEcount with the .bam file produced by STAR. The file is unsorted. The details are below. Are there any specific requirements for STAR mapping when using TEtranscripts?

TEcount --format BAM --mode multi -b Aligned.out.bam  -i 100 --project NAME --GTF ./Mus_musculus.GRCm38.100.chr.gtf --TE ./GRCm38_Ensembl_rmsk_TE.gtf

# multi-mapper mode = multi
# stranded = no
# number of iteration = 100
# Alignments grouped by read ID = True

INFO  @ Sat, 06 Jan 2024 15:44:16: Processing GTF files ...

INFO  @ Sat, 06 Jan 2024 15:44:16: Building gene index .......

100000 GTF lines processed.
200000 GTF lines processed.
300000 GTF lines processed.
400000 GTF lines processed.
500000 GTF lines processed.
600000 GTF lines processed.
700000 GTF lines processed.
800000 GTF lines processed.
INFO  @ Sat, 06 Jan 2024 15:52:07: Done building gene index ......

INFO  @ Sat, 06 Jan 2024 15:53:46: Building TE index .......

INFO  @ Sat, 06 Jan 2024 15:59:13: Done building TE index ......

INFO  @ Sat, 06 Jan 2024 15:59:13:
Reading sample file ...

Error occurred when reading first line of sample file Aligned.out.
Error: malformatted header: no ':' in field
[Exception type: ValueError, raised in libcalignmentfile.pyx:446]

olivertam commented 5 months ago

Hi,

It looks like the error is coming from pysam.

Could you post the output of the following?

$ samtools quickcheck Aligned.out.bam
$ samtools view -H Aligned.out.bam
$ samtools view Aligned.out.bam | head -n 10

Thanks
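As a complement to the samtools commands above, here is a minimal sketch for looking at the raw header text without going through pysam or samtools, both of which refuse to parse a malformed header. It relies only on the BAM layout from the SAM/BAM specification (BGZF-compressed stream, magic bytes `BAM\1`, then a 32-bit header length and the header text). The function name `read_bam_header_text` is hypothetical and is not part of TEtranscripts or pysam.

```python
import gzip
import struct


def read_bam_header_text(path):
    """Return the plain-text SAM header stored at the start of a BAM file.

    Hypothetical helper: it does not validate the header, it only extracts it.
    """
    # BGZF blocks are valid gzip members, so the gzip module can decompress them.
    with gzip.open(path, "rb") as handle:
        magic = handle.read(4)
        if magic != b"BAM\x01":
            raise ValueError("not a BAM file: bad magic bytes")
        # l_text: little-endian 32-bit length of the header text that follows.
        (l_text,) = struct.unpack("<i", handle.read(4))
        text = handle.read(l_text)
    # The header text may be NUL-padded; strip the padding before decoding.
    return text.rstrip(b"\x00").decode("utf-8", errors="replace")
```

Calling `print(read_bam_header_text("Aligned.out.bam"))` would then show the `@HD`, `@SQ`, and `@PG` lines exactly as they were written, which makes a misplaced field easier to spot.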

fanghe0720 commented 5 months ago

Thanks for the quick response. Please see below.

1. Nothing showed up with:
$ samtools quickcheck Aligned.out.bam

2. $ samtools view -H Aligned.out.bam
[E::sam_hrecs_error] Malformed key:value pair at line 1: "@HD VN:1.4 –sjdbGTF file ./Mus_musculus.GRCm38.100.chr.gtf"
samtools view: failed to add PG line to the header

3. [hefang@n028 AFD_input]$ samtools view Aligned.out.bam | head -n 10
[E::sam_hrecs_error] Malformed key:value pair at line 1: "@HD VN:1.4 –sjdbGTF file ./Mus_musculus.GRCm38.100.chr.gtf"
samtools view: failed to add PG line to the header

With the .bam files STAR produced previously, I could view the header without any problem. The only parameters I changed for this batch of mapping are --sjdbOverhang 149 --winAnchorMultimapNmax 200 --outFilterMultimapNmax 100, to keep repeats.

olivertam commented 5 months ago

Hi,

Thanks for your feedback. It really looks like there's an issue with the header, as the @HD line should not contain the --sjdbGTFfile line (which should be in the @PG line instead). I wonder if there was an issue with STAR writing the file header (potentially just for those files?). Could you try re-running the STAR command and see if it replicates (or if it was a one-time thing)? I can't see any issue with the extra parameters that you provided.

Thanks.
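To illustrate the kind of check that fails here, the sketch below scans SAM header text and reports any field that is not a TAG:VALUE pair, which is the condition behind the "no ':' in field" message. It follows the SAM specification's rule that every field after the record type on `@HD`, `@SQ`, `@RG`, and `@PG` lines is a two-character tag, a colon, and a value, with `@CO` comment lines exempt. The function name `find_malformed_header_fields` and the example header are hypothetical, and the check is deliberately simpler than what samtools or pysam actually perform.

```python
import re

# A well-formed header field: two-character tag, a colon, then a non-empty value.
_FIELD = re.compile(r"^[A-Za-z][A-Za-z0-9]:.+$")


def find_malformed_header_fields(header_text):
    """Return (line_number, field) pairs for fields that are not TAG:VALUE.

    Hypothetical helper: it only checks the shape of each tab-separated field.
    """
    problems = []
    for line_no, line in enumerate(header_text.splitlines(), start=1):
        # Skip non-header lines and free-text @CO comment lines.
        if not line.startswith("@") or line.startswith("@CO"):
            continue
        # The first tab-separated item is the record type (e.g. @HD).
        for field in line.split("\t")[1:]:
            if not _FIELD.match(field):
                problems.append((line_no, field))
    return problems
```

For an illustrative header whose first line is `@HD`, `VN:1.4`, `--sjdbGTFfile`, and `./annotation.gtf` separated by tabs, the function flags the last two fields on line 1, since neither has the TAG:VALUE shape.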

fanghe0720 commented 5 months ago

Thank you for the suggestion! I will re-run and see. Thanks

fanghe0720 commented 5 months ago

Re-running STAR solved the problem. Thanks!
