icebert / eccDNA_RCA_nanopore

eccDNA identification from nanopore long reads of rolling-circle amplicon
MIT License
5 stars 4 forks source link

KeyError {0} not in {1} #8

Closed alexbacita closed 6 months ago

alexbacita commented 6 months ago

Is this basically saying that it cannot deal with unmapped reads? How can I fix this? Thanks!

Traceback (most recent call last): File "/mnt/c/linuxdata/eccDNA_RCA_nanopore/./eccDNA_RCA_nanopore.py", line 577, in cs = ConsensusSequence(fastq, genome, read).callConsensus() File "/mnt/c/linuxdata/eccDNA_RCA_nanopore/./eccDNA_RCA_nanopore.py", line 512, in callConsensus reference = self._getReference() File "/mnt/c/linuxdata/eccDNA_RCA_nanopore/./eccDNA_RCA_nanopore.py", line 507, in _getReference s = self.genome[interval.chr][interval.start:interval.end+1] File "/home/abacita/miniconda3/lib/python3.10/site-packages/pyfaidx/init.py", line 1124, in getitem raise KeyError("{0} not in {1}.".format(rname, self.filename)) KeyError: '0bac1663-45c1-420a-b923-f7cbb7736d05 not in hg38.fa.'

icebert commented 6 months ago

Hi Alexandru,

Yes, unmapped reads should not be included. If you use minimap2 for the mapping, by default, minimap2 does not output unmapped reads in paf format. If you use other mappers, you need to remove the unmapped reads first.

alexbacita commented 6 months ago

Thanks! I've used: minimap2 -x map-ont -c --secondary=no -t 18 hg38_index.mmi merged_pass.fastq.gz > alignment.paf

However, my merged_pass.fastq.gz file is merged from two different sequencing runs of the same sample. Wondering if that has anything to do with it. I'll try some troubleshooting but please let me know if you spot anything wrong in the above command.

icebert commented 6 months ago

The minimap2 command is correct. I guess it's because of running eccDNA_RCA_nanopore.py --fastq should be the reads fastq file instead of the hg38.fa

alexbacita commented 6 months ago

eccDNA_RCA_nanopore.py command is the following: ./eccDNA_RCA_nanopore.py --fastq merged_pass.fastq --paf alignment.paf --info info.tsv --seq seq.fa --var var.tsv --reference hg38.fa

Tried it on test files/other datasets and worked fine. Problem seems to be input data I guess - thanks for the quick responses, much appreciated!

alexbacita commented 6 months ago

Full error below - will come back if/when fixed but any other ideas please let me know

Traceback (most recent call last): File "/home/abacita/miniconda3/lib/python3.10/site-packages/pyfaidx/init.py", line 1122, in getitem return self.records[rname] KeyError: '0bac1663-45c1-420a-b923-f7cbb7736d05'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/mnt/c/data/W3_4h_T7/W3_4h_T7/20240412_1515_MN39669_FAX91881_924419b0/./eccDNA_RCA_nanopore.py", line 577, in cs = ConsensusSequence(fastq, genome, read).callConsensus() File "/mnt/c/data/W3_4h_T7/W3_4h_T7/20240412_1515_MN39669_FAX91881_924419b0/./eccDNA_RCA_nanopore.py", line 512, in callConsensus reference = self._getReference() File "/mnt/c/data/W3_4h_T7/W3_4h_T7/20240412_1515_MN39669_FAX91881_924419b0/./eccDNA_RCA_nanopore.py", line 507, in _getReference s = self.genome[interval.chr][interval.start:interval.end+1] File "/home/abacita/miniconda3/lib/python3.10/site-packages/pyfaidx/init.py", line 1124, in getitem raise KeyError("{0} not in {1}.".format(rname, self.filename)) KeyError: '0bac1663-45c1-420a-b923-f7cbb7736d05 not in hg38.fa.'

alexbacita commented 6 months ago

Hello @icebert I have looked into the read ID causing the problem and it seems fairly normal. I have even blasted its 938 bp sequence and it maps with 98% identity to the human genome.

Can you spot anything wrong from the paf file? I thought it might be the mapping quality of 1 but I have removed those and I still get the same error. IMG_6693

icebert commented 6 months ago

Hi Alexandru, for the PAF file, the 6th column should be the chromosome name, not the read name. Only the 1st column is the read name. It seems the reads were mapped to the reads themselves, not to hg38.fa.

You may check your command for generating the hg38_index.mmi, which should use hg38.fa as input, not the reads as input.

https://github.com/lh3/miniasm/blob/master/PAF.md

alexbacita commented 6 months ago

That was it! Much appreciated, thanks!