Open vnvkotova opened 6 years ago
I think it would help with debugging if you pasted the FASTA header for that transcript here. It could be that the code that parses the FASTA file does not recognize the syntax.
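One way to pull those header lines out is sketched below. This is only an illustration using the Python standard library, not pyensembl's own parser; the helper names `fasta_headers` and `headers_mentioning` are hypothetical, and the code assumes a plain-text or gzip-compressed FASTA in which header lines start with `>`.

```python
import gzip
from itertools import islice


def _open_text(path):
    """Open a FASTA file as text, transparently handling .gz files."""
    opener = gzip.open if str(path).endswith(".gz") else open
    return opener(path, "rt")


def fasta_headers(path, limit=5):
    """Return up to `limit` header lines, without the leading '>'."""
    with _open_text(path) as handle:
        headers = (line[1:].rstrip("\n") for line in handle if line.startswith(">"))
        # Consume the generator while the file is still open.
        return list(islice(headers, limit))


def headers_mentioning(path, transcript_id):
    """Return every header line that contains `transcript_id`."""
    with _open_text(path) as handle:
        return [
            line[1:].rstrip("\n")
            for line in handle
            if line.startswith(">") and transcript_id in line
        ]
```

Calling `fasta_headers` on the FASTA file shows what kind of identifiers the headers carry, and `headers_mentioning(path, 'ENST00000624155')` returns an empty list if the transcript ID never appears in any header.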
Hi, I just tested the command you posted, and it works in my case:
pyensembl install --release 93 --species human
python3
>>> from pyensembl import EnsemblRelease
>>> data = EnsemblRelease(93)
>>> print(data.transcript_by_id('ENST00000624155').sequence)
INFO:pyensembl.sequence_data:Loaded sequence dictionary from /home/*****/.cache/pyensembl/GRCh38/ensembl93/Homo_sapiens.GRCh38.cdna.all.fa.gz.pickle
INFO:pyensembl.sequence_data:Loaded sequence dictionary from /home/*****/.cache/pyensembl/GRCh38/ensembl93/Homo_sapiens.GRCh38.ncrna.fa.gz.pickle
ATGGTGGTGGCAACAGAGATGGCAGCGCGGCTGGAGTGTTAGGAGGGTGGCCTGAGCAGTAGGATTGGGGCTGGAGCAGTAAGATGGCAGCCGGAGCGGTTTTTCTGGCATTGTCTGCCCAGCTGCTCCAAGCCAGACTGATGAAGGAGGAGTCCCCAGTGGTGAGCTGGAGGTTGGAGCCTGAAGATGGCACAGCTCTGTGATTCATCTTCTGCGGTTGTGGCAGCCACGGTGATGGAGACGGCAGCTCAACAGGAGCAATAGGAGGGTACCCATGGAGGCCAAGTG
Dear pyensembl developers,
I want to get the sequence for a given transcript from a FASTA file. I used the following code to define and initialize my genome:
import pyensembl
data = pyensembl.Genome(reference_name='hg38', annotation_name='hg38_chr22', gtf_path_or_url='...GRCh38.83.gtf', transcript_fasta_path_or_url='.../hg38.fa')
data.index()
After running it, I can get basic information about the genes and transcripts in the files, e.g. if I run:
print(data.transcript_ids(22, '+'))
it gives me a list of IDs. But I can't get the sequence for a given transcript. Running this script:
print(data.transcript_by_id('ENST00000624155').sequence)
gives me "None". I checked several different combinations of GTF and FASTA files. The result was the same for all of them, so I'm certain that the problem is not caused by the files.
I'd really appreciate your reply!
Best regards, Nika