Closed alexbacita closed 6 months ago
Hi Alexandru,
Yes, unmapped reads should not be included. If you use minimap2 for the mapping, by default, minimap2 does not output unmapped reads in paf format. If you use other mappers, you need to remove the unmapped reads first.
Thanks! I've used: minimap2 -x map-ont -c --secondary=no -t 18 hg38_index.mmi merged_pass.fastq.gz > alignment.paf
However, my merged_pass.fastq.gz file is merged from two different sequencing runs of the same sample. Wondering if that has anything to do with it. I'll try some troubleshooting but please let me know if you spot anything wrong in the above command.
The minimap2 command is correct. I guess it's because of running eccDNA_RCA_nanopore.py --fastq should be the reads fastq file instead of the hg38.fa
eccDNA_RCA_nanopore.py command is the following: ./eccDNA_RCA_nanopore.py --fastq merged_pass.fastq --paf alignment.paf --info info.tsv --seq seq.fa --var var.tsv --reference hg38.fa
Tried it on test files/other datasets and worked fine. Problem seems to be input data I guess - thanks for the quick responses, much appreciated!
Full error below - will come back if/when fixed but any other ideas please let me know
Traceback (most recent call last): File "/home/abacita/miniconda3/lib/python3.10/site-packages/pyfaidx/init.py", line 1122, in getitem return self.records[rname] KeyError: '0bac1663-45c1-420a-b923-f7cbb7736d05'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/c/data/W3_4h_T7/W3_4h_T7/20240412_1515_MN39669_FAX91881_924419b0/./eccDNA_RCA_nanopore.py", line 577, in
Hello @icebert I have looked into the read ID causing the problem and it seems fairly normal. I have even blasted its 938 bp sequence and it maps with 98% identity to the human genome.
Can you spot anything wrong from the paf file? I thought it might be the mapping quality of 1 but I have removed those and I still get the same error.
Hi Alexandru, for the PAF file, the 6th column should be the chromosome name, not the read name. Only the 1st column is the read name. It seems the reads were mapped to the reads themselves, not to hg38.fa.
You may check your command for generating the hg38_index.mmi, which should use hg38.fa as input, not the reads as input.
That was it! Much appreciated, thanks!
Is this basically saying that it cannot deal with unmapped reads? How can I fix this? Thanks!
Traceback (most recent call last): File "/mnt/c/linuxdata/eccDNA_RCA_nanopore/./eccDNA_RCA_nanopore.py", line 577, in
cs = ConsensusSequence(fastq, genome, read).callConsensus()
File "/mnt/c/linuxdata/eccDNA_RCA_nanopore/./eccDNA_RCA_nanopore.py", line 512, in callConsensus
reference = self._getReference()
File "/mnt/c/linuxdata/eccDNA_RCA_nanopore/./eccDNA_RCA_nanopore.py", line 507, in _getReference
s = self.genome[interval.chr][interval.start:interval.end+1]
File "/home/abacita/miniconda3/lib/python3.10/site-packages/pyfaidx/init.py", line 1124, in getitem
raise KeyError("{0} not in {1}.".format(rname, self.filename))
KeyError: '0bac1663-45c1-420a-b923-f7cbb7736d05 not in hg38.fa.'