grenaud / schmutzi

Maximum a posteriori estimate of contamination for ancient samples
20 stars 2 forks source link

FASTA error: invalid refId specified #13

Open cegamorim opened 3 years ago

cegamorim commented 3 years ago

Hi Gabriel,

I'm trying to run schmutzi.pl and I am getting the following error:

running cmd endoCaller  -seq b07_npred_1_endo.fa   -log b07_npred_1_endo.log  -name MT  -qual 0  -logindel 50  -deamread  -deam5p b07_npred_1_endo.5p.prof  -deam3p b07_npred_1_endo.3p.prof  -cont 0.205  -deam5pc b07_npred_1_cont.5p.prof  -deam3pc b07_npred_1_cont.3p.prof  -single   -seqc b07_npred_1_cont.fa   -logc b07_npred_1_cont.log  -namec MTc  -l 16569  ./refGenomes/MT.fa ../b07.mtDNA.sorted.calMD.bam 
Reading genome file ...
... done
Reading BAM file ...
FASTA error: invalid refId specified: 24
bamtools convert ERROR: pileup conversion - could not read reference base from FASTA file at chr 24 position 0
This can be normally solved by calling samtools' calmd on your bam file

I have already used samtools calmd. It looks like bamtools is referring to the mtDNA as "24" but in my BAM file it is "MT" and in the fasta file it is ">MT"; also the endoCaller command specifies " -name MT."

Do you know where is this refId being specified as 24? Could you please help me to fix this issue?

Thank you,

Eduardo

grenaud commented 3 years ago

Hello! Just to be sure, did you realign everything to the MT or did you just isolate the mt reads from a bigger BAM file aligned against the genome?

On Mon, Feb 15, 2021 at 4:09 PM cegamorim notifications@github.com wrote:

Hi Gabriel,

I'm trying to run schmutzi.pl and I am getting the following error:

running cmd endoCaller -seq b07_npred_1_endo.fa -log b07_npred_1_endo.log -name MT -qual 0 -logindel 50 -deamread -deam5p b07_npred_1_endo.5p.prof -deam3p b07_npred_1_endo.3p.prof -cont 0.205 -deam5pc b07_npred_1_cont.5p.prof -deam3pc b07_npred_1_cont.3p.prof -single -seqc b07_npred_1_cont.fa -logc b07_npred_1_cont.log -namec MTc -l 16569 ./refGenomes/MT.fa ../b07.mtDNA.sorted.calMD.bam Reading genome file ... ... done Reading BAM file ... FASTA error: invalid refId specified: 24 bamtools convert ERROR: pileup conversion - could not read reference base from FASTA file at chr 24 position 0 This can be normally solved by calling samtools' calmd on your bam file

I have already used samtools calmd. It looks like bamtools is referring to the mtDNA as "24" but in my BAM file it is "MT" and in the fasta file it is ">MT"; also the endoCaller command specifies " -name MT."

Do you know where is this refId being specified as 24? Could you please help me to fix this issue?

Thank you,

Eduardo

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

cegamorim commented 3 years ago

Thanks for your prompt reply! I subset the mt reads from a bigger BAm file aligned against the whole genome hg19.

grenaud commented 3 years ago

ah! that is the issue, there are 2 ways forward: 1) manually hack the header to eliminate all the other references and reconvert to bam. I do not recommend this. 2) realign everything (from your original bam file) to the MT reference. As the mt is relatively small, this is very fast. This also takes care of numts which cause dips in coverage.

On Mon, Feb 15, 2021 at 5:26 PM cegamorim notifications@github.com wrote:

Thanks for your prompt reply! I subset the mt reads from a bigger BAm file aligned against the whole genome hg19.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grenaud/schmutzi/issues/13#issuecomment-779329027, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQRNI24UK5I7QRCZ434J3DS7FDK5ANCNFSM4XU27MOQ .

cegamorim commented 3 years ago

Awesome! Thank you so much!