Closed samir-watson closed 4 years ago
Hi @samir-watson, there must be some mismatch in the transcript IDs between the fasta file and the BED file. You can check what ref_ids are used in the SampCompDB (by loading it without passing a BED file) and compare those to the ref_ids used in the BED file (col 4). Do they match?
The way I usually do it is to convert the Gencode/Ensemble GTF to bed (with bedparse) and then convert the BED to fasta with bedtools getfasta and fix the fasta headers with a regex:
bedtools getfasta -fi ${genome_fasta} -s -split -name -bed reference_transcriptome.bed -fo - | perl -pe 's/>(.+)\\([+-]\\)/>\$1/'
As a reference, you can also check the code of our nextflow pipeline
Hi @tleonardi,
I have had a good look at what was happening with and it seems to a problem with pipes in the fasta headers that are being carried over from the fasta I used during SampComp so when I compare the SampCompDB database to the bed file there is no longer a match. I will try and redo the alignment with the pipes and extra IDs removed and see if that works.
On another note I found a tiny issue in SampCompDB.py line 421, it says "red_if" and I think its meand to be ref_id.
Cheers, Samir
Hi @samir-watson, thanks for spotting the typo. It's getting fixed
Hi, I am getting the following error when trying to use SampCompDB to save a bed file:
nanocompore.common.NanocomporeError: Some references are missing from the BED file provided
I am using gencode.v33.transcripts.fa for transcript alignment and I am making a bed file using the gft2bed tool using the gencode.v33.annotation.gtf file
Im not sure what the problem is here as I am using the same gencode release (v33) to download both the gtf and the fasta file.
here is my code:
here is the traceback:
Traceback (most recent call last): File "", line 5, in
File "/home/samirwatson/miniconda3/envs/nanocompore/lib/python3.7/site-packages/nanocompore/SampCompDB.py", line 108, in init
self.results = self.__calculate_results(adjust=True)
File "/home/samirwatson/miniconda3/envs/nanocompore/lib/python3.7/site-packages/nanocompore/SampCompDB.py", line 166, in __calculate_results
raise NanocomporeError("Some references are missing from the BED file provided")
nanocompore.common.NanocomporeError: Some references are missing from the BED file provided