lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/
Other
476 stars 131 forks source link

bamrenamechr [SEVERE][ConvertBamChromosomes]Illegal state mapper==null or dict-out=null #223

Closed sagitaninta closed 1 year ago

sagitaninta commented 1 year ago

Cannot make bamrenamechr work

I just want to rename the chromosomes in my BAM file from only number to having "chr" prefix, but it does not work. Possibly just because I do not know how to run it based from the documentation. I do not know where to get that toy.sam as I have my samtools from conda environment.

Basically, this is what I run with my BAM file, and what I got:

java -jar ~/bin/jvarkit/dist/bamrenamechr.jar -f <(echo -e "1\tchr1\n2\tchr2") SRR7107779_mdup.bam | head
@HD VN:1.6  SO:coordinate
@RG ID:SRR7107779_1 length=101  PL:ILLUMINA SM:SRR7107779   LB:SRR7107779
@PG ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:/dss/dsshome1/lxc06/ra57gih/.conda/envs/bioinfo_env/bin/bwa mem -t 28 -R @RG\tID:SRR7107779_1 length=101\tPL:ILLUMINA\tSM:SRR7107779\tLB:SRR7107779 /gpfs/scratch/pn29bi/ra57gih/CANIDref/Canis_familiaris.CanFam3.1.dna.toplevel_PREFIX.fa SRR7107779_1.fastq SRR7107779_2.fastq
@PG ID:samtools PN:samtools PP:bwa  VN:1.14 CL:samtools view -Shu -
@PG ID:samtools.1   PN:samtools PP:samtools VN:1.14 CL:samtools sort -m 2G -@ 4 -o SRR7107779.bam -
@PG ID:MarkDuplicates   VN:4.2.6.1  CL:MarkDuplicates MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000 INPUT=[SRR7107779.bam] OUTPUT=SRR7107779_mdup.bam METRICS_FILE=SRR7107779_mdupMetrics.txt VALIDATION_STRINGENCY=LENIENT    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=2 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false   PN:MarkDuplicates
[SEVERE][ConvertBamChromosomes]Cannot find contig "1" in dictionary:[]
com.github.lindenb.jvarkit.lang.JvarkitException$ContigNotFoundInDictionary: Cannot find contig "1" in dictionary:[]
    at com.github.lindenb.jvarkit.tools.misc.ConvertBamChromosomes.convert(ConvertBamChromosomes.java:164)
    at com.github.lindenb.jvarkit.tools.misc.ConvertBamChromosomes.doWork(ConvertBamChromosomes.java:348)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:796)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:959)
    at com.github.lindenb.jvarkit.tools.misc.ConvertBamChromosomes.main(ConvertBamChromosomes.java:425)
[INFO][Launcher]bamrenamechr Exited with failure (-1)

Then, I tried to make a dictionary, both with or without chromosome prefix, but got that message.

java -jar ~/bin/jvarkit/dist/bamrenamechr.jar    -f <(echo -e "1\tchr1\n2\tchr2") SRR7107779_mdup.bam --dict ../../CANIDref/Canis_lupus_familiaris.CanFam3.1.dna.toplevel.dict | head
[WARN][ConvertBamChromosomes]Output dictionary doesn't contain chr1, skipping
[WARN][ConvertBamChromosomes]Output dictionary doesn't contain chr2, skipping
[SEVERE][ConvertBamChromosomes]Illegal state mapper==null or dict-out=null
[INFO][Launcher]bamrenamechr Exited with failure (-1)

java -jar ~/bin/jvarkit/dist/bamrenamechr.jar    -f <(echo -e "1\tchr1\n2\tchr2") SRR7107779_mdup.bam --dict ../../CANIDref/Canis_lupus_familiaris.CanFam3.1.dna.toplevel_PREFIX.dict | head
[SEVERE][ConvertBamChromosomes]Illegal state mapper==null or dict-out=null
[INFO][Launcher]bamrenamechr Exited with failure (-1)

Your environment

lindenb commented 1 year ago

Hi, is there a dictionary IN the BAM header (lines starting with @SQ ) ?

lindenb commented 1 year ago

also, please send a minimal bam file with the header and the first read.

sagitaninta commented 1 year ago

To answer the first question, yes, there are dictionary in the BAM header. They are these ones right?

samtools view -H SRR7107779_mdup_test.bam | grep "@SQ" | head
@SQ SN:1    LN:122678785
@SQ SN:2    LN:85426708
@SQ SN:3    LN:91889043
@SQ SN:4    LN:88276631
@SQ SN:5    LN:88915250
@SQ SN:6    LN:77573801
@SQ SN:7    LN:80974532
@SQ SN:8    LN:74330416
@SQ SN:9    LN:61074082
@SQ SN:10   LN:69331447

Attached the minimum BAM as txt; you can rename the extension as .bam and ran it with bamrenamechr.jar and it will behave the same, at least in my side. I hope that works as GitHub do not like .bam extension. SRR7107779_mdup_test.txt

lindenb commented 1 year ago

works on my machine.

cat SRR*.txt | samtools view -O BAM -o jeter.bam
samtools index jeter.bam
samtools idxstats jeter.bam | grep -v -F '*' | awk '{printf("%s\tchr%s\n",$1,$1);}' >  jeter2.txt

$ java -jar dist/bamrenamechr.jar -f jeter2.txt jeter.bam | grep chr | tail -1 && echo OK

[WARN][ConvertBamChromosomes]num ignored read:0
SRR7107779.93158088 147 chr38   14665040    60  18S71M  =   14664718    -393    TTATAGTTTCCTGGTCCCCCTTCTGGCCACCCCTTCCTCCACAGCATGTCCTTGCCATCCCTCGAGCCGTCATCTTGAAGGACACAGTG   ###############################98::995:9:8:899:8:98:88999999992899;39;;;;;;;:;;::9;9;::<;   MC:Z:21M1I79M   MD:Z:12A58  PG:Z:MarkDuplicates RG:Z:SRR7107779_1 length=101    NM:i:1  AS:i:66 XS:i:19

OK
sagitaninta commented 1 year ago

That line with samtools idxstats is the one I need to make it work! Thanks. I foolishly just do the first two chromosomes following exactly the example in the docs 🤪

I am closing this, thank you very much for your time.